Computer Science

Measuring the Effects of Energy Intermittency in Distributed Systems

Thomas J. Gaidus ’13

Sorting benchmarks have been developed for researchers to study and compare the performance of different hardware/software configurations using a set of predefined metrics. While these metrics and benchmarks do measure many useful characteristics of data centers, their use cases do not reflect the conditions that production data centers may face when performing actual sorting operations. GreenSort is a new sorting benchmark that accounts for the fact that many data centers are moving towards powering themselves with renewable energy sources, which often provide an intermittent supply. When power is no longer assumed to be constant and plentiful, the cluster must perform different scheduling operations to keep the energy demand always below the supply. In the GreenSort benchmark, the system's performance while being forced to stay under the power curve is compared to its unencumbered sorting performance to determine the system's effectiveness in dealing with the intermittency. The more efficient a system is in dealing with the new, changing power supply, the more competitive that system is in the GreenSort benchmark competition.
This thesis describes the details and parameters of the GreenSort benchmark, and provides a sample sorting implementation, called NapSort, to act as a reference for future competitors. The performance of NapSort using four power-saving algorithms is evaluated in the context of GreenSort to highlight the intended use of the benchmark.
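The idea of keeping demand under an intermittent supply curve, and of comparing constrained against unconstrained performance, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define GreenSort's scoring formula or NapSort's algorithms, so the function names, the fixed per-node power draw, and the time ratio used as a score are all hypothetical.

```python
def max_active_nodes(supply_watts, node_watts, total_nodes):
    """Largest node count whose combined draw stays at or below the supply.

    Assumes (hypothetically) that every active node draws a fixed node_watts.
    """
    return min(total_nodes, int(supply_watts // node_watts))


def plan_schedule(supply_curve, node_watts, total_nodes):
    """Map each interval of the supply curve to a permitted node count."""
    return [max_active_nodes(s, node_watts, total_nodes) for s in supply_curve]


def relative_performance(unconstrained_seconds, constrained_seconds):
    """Hypothetical score: fraction of unencumbered speed that is retained.

    1.0 means intermittency cost nothing; values near 0 mean a large slowdown.
    """
    return unconstrained_seconds / constrained_seconds


# Supply in watts per interval, 200 W per node, a cluster of 10 nodes.
schedule = plan_schedule([1000, 450, 0, 2500], node_watts=200, total_nodes=10)
score = relative_performance(unconstrained_seconds=100.0, constrained_seconds=250.0)
```

Under these assumptions the cluster idles completely when the supply drops to zero and is capped at its full size when supply exceeds what all nodes can draw.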


Home Occupancy Detection and Prediction via Energy Disaggregation

Jennifer M. Gossels ’13

Excessive energy use is a serious problem. Although the topic is well-publicized and most people agree that “something needs to be done,” few homeowners are willing to do this “something” by making lasting changes to their behavior. Thus, any effective solution to reduce energy consumption must require minimal homeowner involvement. We would like to design automated energy-saving systems that can, for example, turn off the television and lower the heat set point when the homeowner is away. However, such systems rely on occupancy detection and prediction information. Hence, before we can implement these technologies, we need to develop techniques for detecting and predicting home occupancy.
We present a four-step process to detect and then predict home occupancy using only the home’s aggregate power data. Because overall power data are readily available, our work will allow homeowners who cannot be bothered to install complex sensor systems to take advantage of energy-saving mechanisms that depend on occupancy information. First, using circuit-level data from one home, we identify 30 sets of appliances that commonly run together. Second, we train a classifier to assign unseen instances to one of these 30 classes without any circuit-level information. Third, we complete the detection stage of our project by mapping occupancy to each of the 30 classes based on their characteristic appliances. Finally, we use these occupancy detection data as inputs for an occupancy prediction algorithm. In the best case, we predict occupancy with 97.53% accuracy.
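The third step, mapping appliance classes to occupancy, might look roughly like the sketch below. It is not the thesis's method: the rule that a class implies occupancy when it contains an appliance someone must operate, the appliance names, and the class identifiers are all hypothetical choices made for illustration.

```python
# Hypothetical set of appliances that usually need a person present to run.
INTERACTIVE_APPLIANCES = {"television", "microwave", "stove"}


def class_implies_occupancy(appliances):
    """Assumed rule: a class signals occupancy if it has an interactive appliance."""
    return bool(set(appliances) & INTERACTIVE_APPLIANCES)


def detect_occupancy(predicted_classes, class_appliances):
    """Turn a sequence of predicted class ids into occupied/unoccupied flags."""
    return [class_implies_occupancy(class_appliances[c]) for c in predicted_classes]


# Made-up classes; the thesis identifies 30 such appliance sets.
class_appliances = {
    0: ["refrigerator"],
    1: ["refrigerator", "television"],
    2: ["furnace", "refrigerator"],
}
occupied = detect_occupancy([0, 1, 1, 2], class_appliances)
```

The resulting flags would then serve as the input series for whatever prediction algorithm is used in the fourth step.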


Implementing Online GreedyFuture

Donny Huang ’13

In this thesis, we provide a practical implementation of the theoretical algorithm online GreedyFuture, and test its empirical performance.


Prototype Support Vector Machines: Supervised Classification in Complex Datasets

April T. Shen ’13

Real-world machine learning datasets may be highly complex. Data of a single class may be distributed irregularly throughout the feature space, and measures of distance as a proxy for similarity can be unreliable. Classification learning algorithms for such datasets typically require model selection, which in practice is often an ad hoc and time-consuming process that depends on assumptions about the structure of the data. To avoid this, I introduce the ensemble of prototype support vector machines (PSVMs). This algorithm trains an ensemble of linear SVMs that are tuned to different regions of the feature space and thus are able to separate the space arbitrarily, reducing the need to decide what model to use for each dataset. I also present experimental results demonstrating the efficacy of PSVMs in both noiseless and noisy datasets.
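To illustrate how several linear SVMs tuned to different regions can separate data that no single linear model can, here is a minimal sketch. It is not the PSVM algorithm itself, whose details the abstract does not give: the fixed prototype points, the nearest-prototype routing, and the plain hinge-loss subgradient training are all assumptions made for this example.

```python
def train_linear_svm(points, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit (w, b) by subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * reg) for wi in w]  # regularization shrinkage
            if margin < 1.0:  # point violates the margin: move toward it
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda i: sum((p - xi) ** 2 for p, xi in zip(prototypes[i], x)))


def train_ensemble(prototypes, points, labels):
    """Train one linear SVM per prototype on the points nearest to it.

    Assumes every prototype attracts at least one training point.
    """
    groups = [([], []) for _ in prototypes]
    for x, y in zip(points, labels):
        pts, lbs = groups[nearest(prototypes, x)]
        pts.append(x)
        lbs.append(y)
    return [train_linear_svm(pts, lbs) for pts, lbs in groups]


def predict(prototypes, models, x):
    """Classify x with the linear SVM owned by its nearest prototype."""
    w, b = models[nearest(prototypes, x)]
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# XOR-style data: not linearly separable as a whole, but separable per region.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [-1, 1, 1, -1]
prototypes = [(0.0, 0.5), (1.0, 0.5)]  # hypothetical, hand-picked regions
models = train_ensemble(prototypes, points, labels)
predictions = [predict(prototypes, models, x) for x in points]
```

Each local model only has to separate the two points nearest its prototype, so the ensemble as a whole draws a piecewise-linear boundary.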


ShrinkWrap: Efficient Dynamic Race Detection for Array-Intensive Programs

James R. Wilcox ’13

We explore a new technique for efficient dynamic race detection in programs that use arrays intensively. Standard techniques lead to redundant operations and redundant representations in many common cases. For these common cases, we design dynamic compression methods that eliminate this redundancy. We implement our techniques in a prototype tool called ShrinkWrap, which is built as an extension to a state-of-the-art precise dynamic race detector, and we evaluate the performance and precision of ShrinkWrap on a suite of benchmark programs.
We show that our prototype can improve performance dramatically when the target program accesses arrays in a pattern we recognize. The vast majority of the accesses that must be checked by the underlying race detector can be eliminated on almost half of our benchmark programs. However, we also find that our prototype is not always as time-efficient as one might expect given the number of accesses eliminated.
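One way to picture a compressed representation of per-element analysis state is sketched below. It does not describe ShrinkWrap's actual data structures, which the abstract leaves unspecified: the class name, its methods, and the choice of a single shared entry that expands on the first divergent update are hypothetical.

```python
class CompressedShadow:
    """Shadow state for an array, stored once while all elements agree."""

    def __init__(self, length, initial=None):
        self.length = length
        self.shared = initial    # one entry standing in for every element
        self.per_element = None  # allocated only once elements diverge

    def is_compressed(self):
        return self.per_element is None

    def _check(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)

    def get(self, index):
        self._check(index)
        if self.per_element is None:
            return self.shared
        return self.per_element[index]

    def set_all(self, value):
        """A whole-array update keeps (or restores) the compact form."""
        self.shared = value
        self.per_element = None

    def set(self, index, value):
        """A single-element update expands the state only if it differs."""
        self._check(index)
        if self.per_element is None:
            if value == self.shared:
                return
            self.per_element = [self.shared] * self.length
        self.per_element[index] = value


shadow = CompressedShadow(1000, initial="t0")
shadow.set_all("t1")   # e.g. one thread sweeps the entire array
compact_after_sweep = shadow.is_compressed()
shadow.set(3, "t2")    # a divergent access forces per-element state
compact_after_divergence = shadow.is_compressed()
```

In this sketch a uniform sweep costs one update rather than one per element, while an irregular access pattern falls back to the uncompressed form, which loosely mirrors the abstract's observation that the benefit depends on recognizing the access pattern.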