Publications


DoubleChecker: Efficient Sound and Precise Atomicity Checking

Swarnendu Biswas, Jipeng Huang, Aritra Sengupta, and Michael D. Bond

ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2014), Edinburgh, June 2014

Paper:   PDF

Source code:   Available for download from the Jikes RVM Research Archive

Abstract:

Atomicity is a key correctness property that allows programmers to reason about code regions in isolation. However, programs often fail to enforce atomicity correctly, leading to atomicity violations that are difficult to detect. Dynamic program analysis can detect atomicity violations based on an atomicity specification, but existing approaches slow programs substantially.

This paper presents DoubleChecker, a novel sound and precise atomicity checker whose key insight lies in its use of two new cooperating dynamic analyses. Its imprecise analysis tracks cross-thread dependences soundly but imprecisely with significantly better performance than a fully precise analysis. Its precise analysis is more expensive but only needs to process a subset of the execution identified as potentially involved in atomicity violations by the imprecise analysis. If DoubleChecker operates in single-run mode, the two analyses execute in the same program run, which guarantees soundness and precision but requires logging program accesses to pass from the imprecise to the precise analysis. In multi-run mode, the first program run executes only the imprecise analysis, and a second run executes both analyses. Multi-run mode trades accuracy for performance; each run of multi-run mode outperforms single-run mode, but can potentially miss violations.

We have implemented DoubleChecker and an existing state-of-the-art atomicity checker called Velodrome in a high-performance Java virtual machine. DoubleChecker's single-run mode significantly outperforms Velodrome, while still providing full soundness and precision. DoubleChecker's multi-run mode improves performance further, without significantly impacting soundness in practice. These results suggest that DoubleChecker's approach is a promising direction for improving the performance of dynamic atomicity checking over prior work.
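
As a rough illustration of the two-stage structure (hypothetical types and method names, not DoubleChecker's implementation), a cheap, conservative pass narrows the work that the expensive, precise pass must do:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative two-stage checker: a cheap, conservative first pass narrows
    // the work the expensive, precise second pass must do.
    class TwoStageChecker {
        interface Region {
            boolean mightViolateAtomicity();       // imprecise but sound: may over-approximate
            boolean definitelyViolatesAtomicity(); // precise but expensive
        }

        List<Region> check(List<Region> executedRegions) {
            List<Region> suspects = new ArrayList<>();
            for (Region r : executedRegions) {          // stage 1: flag potential violations
                if (r.mightViolateAtomicity()) {
                    suspects.add(r);
                }
            }
            List<Region> violations = new ArrayList<>();
            for (Region r : suspects) {                 // stage 2: precise check on suspects only
                if (r.definitelyViolatesAtomicity()) {
                    violations.add(r);
                }
            }
            return violations;
        }
    }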



Drinking from Both Glasses: Adaptively Combining Pessimistic and Optimistic Synchronization for Efficient Parallel Runtime Support

Man Cao, Minjia Zhang, and Michael D. Bond

Workshop on Determinism and Correctness in Parallel Programming (WoDet 2014), Salt Lake City, March 2014

Paper:   PDF

Talk (by Man Cao):   PPTX   PDF

Abstract:

It is notoriously challenging to achieve parallel software systems that are both scalable and reliable. Parallel runtime support—such as multithreaded record & replay, data race and atomicity violation detectors, transactional memory, and support for stronger memory models—helps achieve these goals, but existing commodity solutions slow programs substantially in order to capture (track or control) the program's cross-thread dependences accurately. Capturing cross-thread dependences using "pessimistic" synchronization slows every program access, while "optimistic" synchronization allows for lightweight instrumentation of most accesses but dramatically slows accesses involved in cross-thread dependences.

This paper introduces (1) a hybrid of pessimistic and optimistic synchronization and (2) an adaptive policy that enables fine-grained switching between pessimistic and optimistic synchronization based on program behavior. The adaptive policy uses online profiling and a cost–benefit model to inform its decisions. We design a dependence recorder on top of our approach to demonstrate its feasibility as a framework for efficient parallel runtime support.

We have implemented our approach in a high-performance Java virtual machine and show that it outperforms parallel runtime support based solely on pessimistic or optimistic synchronization. These results show the potential for adaptive, hybrid synchronization for efficient parallel runtime support in commodity systems.
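
A minimal sketch of the adaptive idea, assuming a simple conflict counter in place of the paper's online profiling and cost–benefit model (all names hypothetical): objects start in an optimistic, ownership-biased mode and fall back to pessimistic locking once conflicts appear to outweigh the fast path. The unsynchronized fast path shown here would need the kind of careful coordination protocol described in the paper to be safe.

    import java.util.concurrent.locks.ReentrantLock;

    // Sketch of per-object adaptive synchronization. Optimistic mode is biased
    // toward a single owning thread (no locking while the owner keeps accessing
    // the object); pessimistic mode locks on every access. A conflict counter
    // stands in for the paper's online cost-benefit model.
    class AdaptiveSyncState {
        private final ReentrantLock lock = new ReentrantLock();
        private volatile Thread owner = Thread.currentThread();
        private volatile boolean pessimistic = false;
        private int conflicts = 0;
        private static final int CONFLICT_THRESHOLD = 16;   // hypothetical tuning knob

        void access(Runnable body) {
            Thread me = Thread.currentThread();
            if (!pessimistic && owner == me) {
                body.run();                                  // optimistic fast path: no synchronization
                return;
            }
            lock.lock();                                     // conflicting or pessimistic access
            try {
                if (!pessimistic && owner != me) {
                    owner = me;                              // ownership transfer = observed conflict
                    if (++conflicts >= CONFLICT_THRESHOLD) {
                        pessimistic = true;                  // conflicts now outweigh the fast path
                    }
                }
                body.run();
            } finally {
                lock.unlock();
            }
        }
    }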



Octet: Capturing and Controlling Cross-Thread Dependences Efficiently

Michael D. Bond, Milind Kulkarni, Man Cao, Minjia Zhang, Meisam Fathi Salmi, Swarnendu Biswas, Aritra Sengupta, and Jipeng Huang

ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2013), Indianapolis, IN, USA, October 2013

Paper:   PDF

Talk:   PPTX   PDF

Source code:

Our implementation in a JVM is available for download from the Jikes RVM Research Archive

Chris Stone from Harvey Mudd has implemented Octet for C++11 and made it available here

Abstract:

Parallel programming is essential for reaping the benefits of parallel hardware, but it is notoriously difficult to develop and debug reliable, scalable software systems. One key challenge is that modern languages and systems provide poor support for ensuring concurrency correctness properties—atomicity, sequential consistency, and multithreaded determinism—because all existing approaches are impractical. Dynamic, software-based approaches slow programs by up to an order of magnitude because capturing and controlling cross-thread dependences (i.e., conflicting accesses to shared memory) requires synchronization at virtually every access to potentially shared memory.

This paper introduces a new software-based concurrency control mechanism called Octet that soundly captures cross-thread dependences and can be used to build dynamic analyses for concurrency correctness. Octet achieves low overheads by tracking the locality state of each potentially shared object. Non-conflicting accesses conform to the locality state and require no synchronization; only conflicting accesses require a state change and heavyweight synchronization. This optimistic tradeoff leads to significant efficiency gains in capturing cross-thread dependences: a prototype implementation of Octet in a high-performance Java virtual machine slows real-world concurrent programs by only 26% on average. A dependence recorder, suitable for record & replay, built on top of Octet adds an additional 5% overhead on average. These results suggest that Octet can provide a foundation for developing low-overhead analyses that check and enforce concurrency correctness.
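
A minimal sketch of the barrier structure, with the state machine collapsed to a single exclusive owner (real Octet has more states, e.g. read-shared, and a coordination protocol with the owning thread; the names here are hypothetical):

    // Simplified sketch of an Octet-style access barrier. Only the
    // fast-path/slow-path structure is kept: a thread whose access conforms to
    // the object's locality state proceeds without synchronization, while a
    // conflicting access must change the state using heavier synchronization.
    class TrackedObject {
        private volatile Thread owner = null;   // locality state: which thread has exclusive access

        void onAccess() {
            Thread me = Thread.currentThread();
            if (owner == me) {
                return;                         // fast path: non-conflicting access, no synchronization
            }
            changeOwner(me);                    // slow path: conflicting access triggers a state change
        }

        private synchronized void changeOwner(Thread me) {
            // In the real system this is where a cross-thread dependence is
            // observed and where analyses built on Octet hook in.
            owner = me;
        }
    }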



Efficient Context Sensitivity for Dynamic Analyses via Calling Context Uptrees and Customized Memory Management

Jipeng Huang and Michael D. Bond

ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2013), Indianapolis, IN, USA, October 2013

Paper:   PDF

Talk (by Jipeng Huang):   PPTX

Source code:   Available for download from the Jikes RVM Research Archive

Abstract:

State-of-the-art dynamic bug detectors such as data race and memory leak detectors report program locations that are likely causes of bugs. However, programmers need more than static program locations to understand the behavior of increasingly complex and concurrent software. Dynamic calling context provides additional information, but it is expensive to record calling context frequently, e.g., at every read and write. Context-sensitive dynamic analyses can build and maintain a calling context tree (CCT) to track calling context—but in order to reuse existing nodes, CCT-based approaches require an expensive lookup.

This paper introduces a new approach for context sensitivity that avoids this expensive lookup. The approach uses a new data structure called the calling context uptree (CCU) that adds low overhead by avoiding the lookup and instead allocating a new node for each context. A key contribution is that the approach can mitigate the costs of allocating many nodes by extending tracing garbage collection (GC): GC collects unused CCU nodes naturally and efficiently, and we extend GC to merge duplicate nodes lazily.

We implement our CCU-based approach in a high-performance Java virtual machine and integrate it with a staleness-based memory leak detector and happens-before data race detector, so they can report context-sensitive program locations that cause bugs. We show that the CCU-based approach, in concert with an extended GC, provides a compelling alternative to CCT-based approaches for adding context sensitivity to dynamic analyses.
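
The core data-structure idea can be sketched in a few lines (field and method names are hypothetical): CCU nodes point up toward callers, so recording the current context is a single allocation with no lookup, and duplicate chains are merged later during GC.

    // Sketch of a calling-context-uptree node: nodes point "up" to their
    // callers, so extending the current context is one allocation, not a tree
    // lookup. Duplicate chains are reclaimed or merged by the garbage collector.
    final class CCUNode {
        final int callSiteId;     // identifies the call site for this frame
        final CCUNode caller;     // parent in the uptree (null for the root/main)

        CCUNode(int callSiteId, CCUNode caller) {
            this.callSiteId = callSiteId;
            this.caller = caller;
        }

        // Inserted at method entry by the instrumented program.
        static CCUNode enterMethod(CCUNode current, int callSiteId) {
            return new CCUNode(callSiteId, current);
        }
    }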



LarkTM: Efficient, Strongly Atomic Software Transactional Memory

Minjia Zhang, Jipeng Huang, Man Cao, and Michael D. Bond

Technical report (November 2013):   PDF

Abstract:

Software transactional memory provides an appealing alternative to locks by improving programmability, reliability, and scalability without relying on custom hardware. Existing STMs are impractical because they add high overhead and/or provide weak semantics.

This paper introduces a novel, strongly atomic STM called LarkTM that adds low overhead and scales well on low-contention workloads. LarkTM provides these features efficiently by building on a concurrency control mechanism that avoids expensive concurrency control except at conflicting memory accesses. To preserve soundness, conflicting accesses require a coordination protocol to detect transactional conflicts and resolve them.

An implementation and evaluation in a Java virtual machine demonstrate LarkTM adds low single-thread overhead and often scales well enough to outperform single-thread execution not using STM. Furthermore, we implement an existing high-performance, strongly atomic STM and find that LarkTM compares favorably. Finally, we design and implement a hybrid of LarkTM and the prior STM that provides the best overall performance.



Hybrid Static–Dynamic Analysis for Region Serializability

Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond, and Milind Kulkarni

Technical report (November 2013):   PDF

Abstract:

Shared-memory parallel programs have many possible behaviors due to the many ways in which conflicting memory accesses can interleave. This plethora of behaviors complicates programmers' understanding of software, makes it more difficult to test and verify software, and leads to atomicity and sequential consistency (SC) violations. As prior work argues, providing region serializability (RS) makes software more reliable and simplifies programming and debugging. However, existing approaches that provide RS have serious limitations such as relying on custom hardware support or adding high overhead.

This paper introduces an approach called EnfoRSer that enforces RS of statically bounded regions. EnfoRSer is a hybrid static–dynamic analysis that performs static compiler analysis to transform regions, and dynamic analysis to detect and resolve conflicts at run time. We design, implement, and evaluate three distinct approaches for making regions execute atomically: one enforces mutual exclusion of regions, another executes idempotent regions, and the third executes regions speculatively. EnfoRSer provides RS of statically bounded regions completely in software without precluding compiler optimizations within regions. By demonstrating commodity support for RS with reasonable overheads, we demonstrate its potential as an always-on execution model.
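
To illustrate just one of the three strategies, the mutual-exclusion approach can be pictured roughly as follows (hypothetical helper, not the actual compiler transformation, which must also deal with lock acquisition order, retries, and inserted instrumentation):

    import java.util.concurrent.locks.ReentrantLock;

    // Rough sketch of the mutual-exclusion strategy for a statically bounded
    // region: acquire the per-object locks the region will touch, run the
    // region body with exclusive access, then release.
    class RegionRunner {
        static void runAtomically(Runnable regionBody, ReentrantLock... objectLocks) {
            for (ReentrantLock l : objectLocks) {
                l.lock();
            }
            try {
                regionBody.run();               // region executes atomically
            } finally {
                for (int i = objectLocks.length - 1; i >= 0; i--) {
                    objectLocks[i].unlock();    // release in reverse order
                }
            }
        }
    }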



Tracking Conflicting Accesses Efficiently for Software Record and Replay

Michael D. Bond and Milind Kulkarni

Technical report (February 2012):   PDF

Abstract:

Record and replay, which records a multithreaded program's execution in one run and reproduces it deterministically in a second run, is useful for program debugging, fault detection and analysis. The key challenge in multithreaded record and replay is ensuring that conflicting, cross-thread accesses to shared variables are properly detected, recorded and reproduced. Numerous solutions have been proposed in both hardware and software to track these cross-thread accesses, but to date all general-purpose software solutions suffer from high overhead or have serious limitations. This paper introduces Roctet, an approach for performing software-only deterministic record and replay. Roctet is built on top of a novel dynamic analysis infrastructure, Octet, which can detect at low overhead any cross-thread dependences during execution by exploiting the fact that most accesses, even of shared objects, are not part of cross-thread dependences. We implement Octet and Roctet in a JVM and show they add low overhead for several multithreaded applications. We also show that our implementation of Roctet can successfully replay recorded executions when it can successfully control or ignore extraneous sources of nondeterminism in the VM, libraries, and system.
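
The recorded information can be pictured as a log of cross-thread edges observed when an object changes hands (hypothetical log format; the real recorder hooks into Octet's state-change protocol): each edge says that one thread's n-th access must replay after another thread's m-th access.

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Sketch of dependence recording: whenever ownership of an object moves from
    // one thread to another, log a happens-before edge between per-thread access
    // counters. Replay can then delay the destination access until the source
    // thread has reached the logged count.
    class DependenceLog {
        static final class Edge {
            final long fromThread, fromCount, toThread, toCount;
            Edge(long fromThread, long fromCount, long toThread, long toCount) {
                this.fromThread = fromThread; this.fromCount = fromCount;
                this.toThread = toThread; this.toCount = toCount;
            }
        }

        private final Queue<Edge> log = new ConcurrentLinkedQueue<>();

        // Called from the slow path of the state-change protocol.
        void recordEdge(long fromThread, long fromCount, long toThread, long toCount) {
            log.add(new Edge(fromThread, fromCount, toThread, toCount));
        }
    }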


LeakChaser: Helping Programmers Narrow Down Causes of Memory Leaks

Guoqing "Harry" Xu, Michael D. Bond, Feng Qin, and Atanas Rountev

ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2011), San Jose, CA, USA, June 2011

Paper:   PDF

Source code:   Available for download from the Jikes RVM Research Archive

Abstract:

In large programs written in managed languages such as Java and C#, holding unnecessary references often results in memory leaks and bloat, significantly degrading their run-time performance and scalability. Despite the existence of many leak detectors for such languages, these detectors often target low-level objects; as a result, their reports contain many false warnings and lack sufficient semantic information to help diagnose problems. This paper introduces a specification-based technique called LeakChaser that can not only precisely capture the unnecessary references leading to leaks, but also explain, with high-level semantics, why these references become unnecessary.

At the heart of LeakChaser is a three-tier approach that uses varying levels of abstraction to assist programmers with different skill levels and code familiarity to find leaks. At the highest tier of the approach, the programmer only needs to specify the boundaries of coarse-grained activities, referred to as transactions. The tool automatically infers liveness properties of these transactions, by monitoring the execution, in order to find unnecessary references. Diagnosis at this tier can be performed by any programmer after inspecting the APIs and basic modules of a program, without understanding the detailed implementation of these APIs. At the middle tier, the programmer can introduce application-specific semantic information by specifying properties for the transactions. At the lowest tier of the approach is a liveness checker that does not rely on higher-level semantic information, but rather allows a programmer to assert lifetime relationships for pairs of objects. This task can only be performed by skilled programmers who have a clear understanding of the data structures and algorithms in the program.

We have implemented LeakChaser in Jikes RVM and used it to help us diagnose several real-world leaks. The implementation incurs a reasonable overhead for debugging and tuning. Our case studies indicate that the implementation is powerful in guiding programmers with varying code familiarity to find the root causes of several memory leaks – even someone who had not studied a leaking program can quickly find the cause after using LeakChaser's iterative process that infers and checks properties with different levels of semantic information.
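
Purely as an illustration of what the tiers ask of the programmer (the class and method names below are hypothetical, not LeakChaser's actual API), the highest tier only marks transaction boundaries, while the lowest tier asserts lifetime relationships between specific objects:

    // Hypothetical illustration of the specification tiers; the real tool's API
    // differs, but the division of labor is the same: the higher the tier, the
    // less the programmer needs to know about the implementation.
    class LeakSpecExample {
        void handleRequest(Object request, Object cache) {
            LeakSpec.beginTransaction("handle-request");   // highest tier: mark a coarse activity
            // Lowest tier: a skilled programmer asserts that 'request' should
            // not outlive the longer-lived structure it is associated with.
            LeakSpec.assertDiesBefore(request, cache);
            // ... application work ...
            LeakSpec.endTransaction();
        }
    }

    // Stubs so the sketch compiles; in a real setting these calls would be
    // intercepted by the VM-based checker.
    class LeakSpec {
        static void beginTransaction(String name) { }
        static void endTransaction() { }
        static void assertDiesBefore(Object shortLived, Object longLived) { }
    }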



A Security Policy Oracle: Detecting Security Holes Using Multiple API Implementations

Varun Srivastava, Michael D. Bond, Kathryn S. McKinley, and Vitaly Shmatikov

ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2011), San Jose, CA, USA, June 2011

Paper:   PDF

Abstract:

Even experienced developers struggle to implement security policies correctly. For example, despite 15 years of development, standard Java libraries still suffer from missing and incorrectly applied permission checks, which enable untrusted applications to execute native calls or modify private class variables without authorization. Previous techniques for static verification of authorization enforcement rely on manually specified policies or attempt to infer the policy by code-mining. Neither approach guarantees that the policy used for verification is correct.

In this paper, we exploit the fact that many modern APIs have multiple, independent implementations. Our flow- and context-sensitive analysis takes as input an API, multiple implementations thereof, and the definitions of security checks and security-sensitive events. For each API entry point, the analysis computes the security policies enforced by the checks before security-sensitive events such as native method calls and API returns, compares these policies across implementations, and reports the differences. Unlike code-mining, this technique finds missing checks even if they are part of a rare pattern. Security-policy differencing has no intrinsic false positives: implementations of the same API must enforce the same policy, or at least one of them is wrong!

Our analysis finds 20 new, confirmed security vulnerabilities and 11 interoperability bugs in the Sun, Harmony, and Classpath implementations of the Java Class Library, many of which were missed by prior analyses. These problems manifest in 499 entry points in these mature, well-studied libraries. Multiple API implementations are proliferating due to cloud-based software services and standardization of library interfaces. Comparing software implementations for consistency is a new approach to discovering "deep" bugs in them.
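
Once each implementation's policy has been extracted, the differencing step itself is simple; a sketch, assuming a policy is represented as the set of checks guarding each security-sensitive event (representation and names are hypothetical):

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Sketch of policy differencing: for one API entry point, compare the set of
    // security checks each implementation performs before the same
    // security-sensitive event, and report any difference.
    class PolicyDiff {
        // policy: sensitive event -> set of checks that guard it
        static void diff(String entryPoint,
                         Map<String, Set<String>> policyA,
                         Map<String, Set<String>> policyB) {
            Set<String> events = new HashSet<>(policyA.keySet());
            events.addAll(policyB.keySet());
            for (String event : events) {
                Set<String> a = policyA.getOrDefault(event, Set.of());
                Set<String> b = policyB.getOrDefault(event, Set.of());
                if (!a.equals(b)) {
                    System.out.printf("%s / %s: implementation A checks %s, B checks %s%n",
                                      entryPoint, event, a, b);
                }
            }
        }
    }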



Pacer: Proportional Detection of Data Races

Michael D. Bond, Katherine E. Coons, and Kathryn S. McKinley

ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2010), Toronto, June 2010

Paper:   PDF   (or extended tech report with additional proof details)

Talk:   PPTX (version 2007)   PPT (versions 97-2003)

Source code:

Available for download from the Jikes RVM Research Archive

Li, Srisa-an, and Dwyer built on our implementation for their OOPSLA 2011 paper.

Xie and Xue used our implementation for their CGO 2011 paper.

Abstract:

Data races indicate serious concurrency bugs such as order, atomicity, and sequential consistency violations. Races are difficult to find and fix, often manifesting only after deployment. The frequency and unpredictability of these bugs will only increase as software adds parallelism to exploit multicore hardware. Unfortunately, sound and precise race detectors slow programs by factors of eight or more and do not scale to large numbers of threads.

This paper presents a precise, low-overhead sampling-based data race detector called Pacer. Pacer makes a proportionality guarantee: it detects any race at a rate equal to the sampling rate, by finding races whose first access occurs during a global sampling period. During sampling, Pacer tracks all accesses using the dynamically sound and precise FastTrack algorithm. In non-sampling periods, Pacer discards sampled access information that cannot be part of a reported race, and Pacer simplifies tracking of the happens-before relationship, yielding near-constant, instead of linear, overheads. Experimental results confirm our theoretical guarantees. Pacer reports races in proportion to the sampling rate. Its time and space overheads scale with the sampling rate, and sampling rates of 1-3% yield overheads low enough to consider in production software. The resulting system provides a "get what you pay for" approach that is suitable for identifying real, hard-to-reproduce races in deployed systems.
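
A rough sketch of the sampling structure (hypothetical names; FastTrack's vector-clock metadata and Pacer's simplified non-sampling bookkeeping are omitted): during a global sampling period every access gets full happens-before tracking, and outside a sampling period the analysis does only the minimal work needed to preserve the proportionality guarantee.

    import java.util.Random;

    // Sketch of proportional sampling: a global flag decides whether the current
    // period applies full race-detection instrumentation or a reduced form.
    class SamplingRaceDetector {
        private static volatile boolean samplingPeriod = false;
        private static final Random rng = new Random();

        // Invoked periodically (e.g., at GC points) so that roughly a fraction
        // 'samplingRate' of execution is fully tracked.
        static void maybeStartPeriod(double samplingRate) {
            samplingPeriod = rng.nextDouble() < samplingRate;
        }

        static void onAccess(Object obj, boolean isWrite) {
            if (samplingPeriod) {
                fullHappensBeforeCheck(obj, isWrite);   // FastTrack-style precise tracking
            } else {
                cheapMaintenance(obj);                  // discard/simplify metadata only
            }
        }

        private static void fullHappensBeforeCheck(Object obj, boolean isWrite) { /* ... */ }
        private static void cheapMaintenance(Object obj) { /* ... */ }
    }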



Breadcrumbs: Efficient Context Sensitivity for Dynamic Bug Detection Analyses

Michael D. Bond, Graham Z. Baker, and Samuel Z. Guyer

ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2010), Toronto, June 2010

Paper:   PDF

Talk (by Samuel Guyer):   PPTX (version 2007)   PPT (versions 97-2003)

Source code:

Available for download from the Jikes RVM Research Archive

Abstract:

Calling context – the set of active methods on the stack – is critical for understanding the dynamic behavior of large programs. Dynamic program analysis tools, however, are almost exclusively context insensitive because of the prohibitive cost of representing calling contexts at run time. Deployable dynamic analyses, in particular, have been limited to reporting only static program locations.

This paper presents Breadcrumbs, an efficient technique for recording and reporting dynamic calling contexts. It builds on an existing technique for computing a compact (one word) encoding of each calling context that client analyses can use in place of a program location. The key feature of our system is a search algorithm that can reconstruct a calling context from its encoding using only a static call graph and a small amount of dynamic information collected at cold (infrequently executed) callsites. Breadcrumbs requires no offline training or program modifications, and handles all language features, including dynamic class loading.

We use Breadcrumbs to add context sensitivity to two dynamic analyses: a data-race detector and an analysis for diagnosing null pointer exceptions. On average, it adds 10% to 20% runtime overhead, depending on a tunable parameter that controls how much dynamic information is collected. Collecting less information lowers the overhead, but can result in a search space explosion. In some cases this causes reconstruction to fail, but in most cases Breadcrumbs produces non-trivial calling contexts that have the potential to significantly improve both the precision of the analyses and the quality of the bug reports.
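
The reconstruction step can be pictured as a depth-limited search over the static call graph for a chain whose encoding matches the reported value (a sketch with hypothetical data structures and an illustrative mixing function; the actual search also uses the dynamic hints collected at cold callsites):

    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;
    import java.util.Map;

    // Sketch of context reconstruction: depth-limited search over a static call
    // graph for a call chain whose one-word encoding equals the reported value.
    class ContextSearch {
        static final class CallSite {
            final int id; final String callee;
            CallSite(int id, String callee) { this.id = id; this.callee = callee; }
        }

        static int extend(int contextValue, int callSiteId) {
            return 3 * contextValue + callSiteId;     // illustrative mixing function
        }

        // callGraph: method name -> call sites inside that method
        static List<Integer> search(Map<String, List<CallSite>> callGraph, String method,
                                    int value, int target, Deque<Integer> chain, int depthLimit) {
            if (value == target) {
                return new ArrayList<>(chain);        // found a context matching the encoding
            }
            if (depthLimit == 0) {
                return null;
            }
            for (CallSite cs : callGraph.getOrDefault(method, List.of())) {
                chain.addLast(cs.id);
                List<Integer> found = search(callGraph, cs.callee, extend(value, cs.id),
                                             target, chain, depthLimit - 1);
                chain.removeLast();
                if (found != null) {
                    return found;
                }
            }
            return null;
        }
    }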



Efficient, Context-Sensitive Detection of Real-World Semantic Attacks

Michael D. Bond, Varun Srivastava, Kathryn S. McKinley, and Vitaly Shmatikov

ACM SIGPLAN Workshop on Programming Languages and Analysis for Security (PLAS 2010), Toronto, June 2010

Paper:   PDF

Talk:   PPTX (version 2007)   PPT (versions 97-2003)

Source code:

Available for download from the Jikes RVM Research Archive

Abstract:

Software developers are increasingly choosing memory-safe languages. As a result, semantic vulnerabilities – omitted security checks, misconfigured security policies, and other software design errors – are supplanting memory-corruption exploits as the primary cause of security violations. Semantic attacks are difficult to detect because they violate program semantics, rather than language semantics. This paper presents Pecan, a new dynamic anomaly detector. Pecan identifies unusual program behavior using history sensitivity and depth-limited context sensitivity. Prior work on context-sensitive anomaly detection relied on stack-walking, which incurs overheads of 50% to over 200%. By contrast, the average overhead of Pecan is 5%, which is low enough for practical deployment. We evaluate Pecan on four representative real-world attacks from security vulnerability reports. These attacks exploit subtle bugs in Java applications and libraries, using legal program executions that nevertheless violate programmers' expectations. Anomaly detection must balance precision and sensitivity: high sensitivity leads to many benign behaviors appearing anomalous (false positives), while low sensitivity may miss attacks. With application-specific tuning, Pecan efficiently tracks depth-limited context and history and reports few false positives.



Leak Pruning

Michael D. Bond and Kathryn S. McKinley

14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2009), Washington, DC, March 2009

Paper:   PDF   PS

Talk:   PPT (version 2007)   PPT (versions 97-2003)   PDF

Source code:

Available for download from the Jikes RVM Research Archive

Abstract:

Managed languages improve programmer productivity with type safety and garbage collection, which eliminate memory errors such as dangling pointers, double frees, and buffer overflows. However, because garbage collection uses reachability to over-approximate live objects, programs may still leak memory if programmers forget to eliminate the last reference to an object that will not be used again. Leaks slow programs by increasing collector workload and frequency. Growing leaks eventually crash programs.

This paper introduces leak pruning, which keeps programs running by predicting and reclaiming leaked objects at run time. It predicts dead objects and reclaims them based on observing data structure usage patterns. Leak pruning preserves semantics because it waits for heap exhaustion before reclaiming objects and poisons references to objects it reclaims. If the program later tries to access a poisoned reference, the virtual machine (VM) throws an error. We show leak pruning has low overhead in a Java VM and evaluate it on 10 leaking programs. Leak pruning does not help two programs, executes five substantial programs 1.6-81X longer, and executes three programs, including a leak in Eclipse, for at least 24 hours. In the worst case, leak pruning defers fatal errors. In the best case, it keeps leaky programs running with preserved semantics and consistent throughput.
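
The poisoned-reference behavior can be sketched with an ordinary Java stand-in (the real mechanism operates on references inside the VM): once an object has been pruned, any later use of it through its old reference becomes an error rather than silent corruption.

    // Sketch of reference poisoning: a pruned object's reference is cleared and
    // marked, and the access barrier turns any later use into an error. In the
    // real system the VM reclaims the object and intercepts the access directly.
    class PoisonedRef<T> {
        private T target;
        private boolean poisoned = false;

        PoisonedRef(T target) { this.target = target; }

        void prune() {            // called at heap exhaustion when the object is predicted dead
            target = null;        // let the collector reclaim it
            poisoned = true;
        }

        T get() {                 // stand-in for the VM's read barrier
            if (poisoned) {
                throw new InternalError("access to reclaimed (leak-pruned) object");
            }
            return target;
        }
    }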



Laminar: Practical Fine-Grained Decentralized Information Flow Control

Indrajit Roy, Donald E. Porter, Michael D. Bond, Kathryn S. McKinley, and Emmett Witchel

ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2009), Dublin, June 2009

Paper:   PDF   PS

Talk (by Indrajit Roy):   PPT

Source code:

The JVM and OS source are available for download from the Jikes RVM Research Archive.

Abstract:

Decentralized information flow control (DIFC) is a promising model for writing programs with powerful, end-to-end security guarantees. Current DIFC systems that run on commodity hardware can be broadly categorized into two types: language-level and operating system-level DIFC. Language-level solutions provide no guarantees against security violations on system resources, like files and sockets. Operating system solutions can mediate accesses to system resources, but are inefficient at monitoring the flow of information through fine-grained program data structures.

This paper describes Laminar, the first system to implement decentralized information flow control using a single set of abstractions for OS resources and heap-allocated objects. Programmers express security policies by labeling data with secrecy and integrity labels, and then access the labeled data in lexically scoped security regions. Laminar enforces the security policies specified by the labels at runtime. Laminar is implemented using a modified Java virtual machine and a new Linux security module. This paper shows that security regions ease incremental deployment and limit dynamic security checks, allowing us to retrofit DIFC policies on four application case studies. Replacing the applications' ad-hoc security policies changes less than 10% of the code, and incurs performance overheads from 1% to 56%. Whereas prior DIFC systems only support limited types of multithreaded programs, Laminar supports a more general class of multithreaded DIFC programs that can access heterogeneously labeled data.
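
As a purely illustrative sketch of the programming model (class and method names are hypothetical, not Laminar's actual API), sensitive data carries labels and is touched only inside a lexically scoped security region that the runtime checks:

    // Hypothetical illustration of labels plus a lexically scoped security
    // region; Laminar's real API and enforcement (modified JVM plus Linux
    // security module) differ, but the shape of the programming model is similar.
    class DifcExample {
        void payrollQuery(LabeledData salaries) {
            SecurityRegion.run(salaries.secrecyLabel(), () -> {
                // Inside the region the thread may read 'salaries'; the runtime
                // blocks attempts to leak the data to unlabeled channels.
                process(salaries);
            });
        }
        private void process(LabeledData d) { }
    }

    class LabeledData {
        private final String secrecyLabel;
        LabeledData(String secrecyLabel) { this.secrecyLabel = secrecyLabel; }
        String secrecyLabel() { return secrecyLabel; }
    }

    class SecurityRegion {
        static void run(String requiredLabel, Runnable body) { body.run(); } // stub for the sketch
    }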



Diagnosing and Tolerating Bugs in Deployed Systems

Michael David Bond

Doctoral Dissertation, Department of Computer Sciences, The University of Texas at Austin, December 2008

ACM SIGPLAN Outstanding Doctoral Dissertation Award (citation)

Document:

Official PDF
PDF with 2 pages on a page
PDF with 4 pages on a page

Abstract:

Deployed software is never free of bugs. These bugs cause software to fail, wasting billions of dollars and sometimes causing injury or death. Bugs are pervasive in modern software, which is increasingly complex due to demand for features, extensibility, and integration of components. Complete validation and exhaustive testing are infeasible for substantial software systems, and therefore deployed software exhibits untested and unanalyzed behaviors.

Software behaves differently after deployment due to different environments and inputs, so developers cannot find and fix all bugs before deploying software, and they cannot easily reproduce post-deployment bugs outside of the deployed setting. This dissertation argues that post-deployment is a compelling environment for diagnosing and tolerating bugs, and it introduces a general approach called post-deployment debugging. Techniques in this class are efficient enough to go unnoticed by users and accurate enough to find and report the sources of errors to developers. We demonstrate that they help developers find and fix bugs and help users get more functionality out of failing software.

To diagnose post-deployment failures, programmers need to understand the program operations – control and data flow – responsible for failures. Prior approaches for widespread tracking of control and data flow often slow programs by two times or more and increase memory usage significantly, making them impractical for online use. We present novel techniques for representing control and data flow that add modest overhead while still providing diagnostic information directly useful for fixing bugs. The first technique, probabilistic calling context (PCC), provides low-overhead context sensitivity to dynamic analyses that detect new or anomalous deployed behavior. Second, Bell statistically correlates control flow with data, and it reconstructs program locations associated with data. We apply Bell to leak detection, where it tracks and reports program locations responsible for real memory leaks. The third technique, origin tracking, tracks the originating program locations of unusable values such as null references, by storing origins in place of unusable values. These origins are cheap to track and are directly useful for diagnosing real-world null pointer exceptions.

Post-deployment diagnosis helps developers find and fix bugs, but in the meantime, users need help with failing software. We present techniques that tolerate memory leaks, which are particularly difficult to diagnose since they have no immediate symptoms and may take days or longer to materialize. Our techniques effectively narrow the gap between reachability and liveness by providing the illusion that dead but reachable objects do not consume resources. The techniques identify stale objects not used in a while and remove them from the application and garbage collector's working set. The first technique, Melt, relocates stale memory to disk, so it can restore objects if the program uses them later. Growing leaks exhaust the disk eventually, and some embedded systems have no disk. Our second technique, leak pruning, addresses these limitations by automatically reclaiming likely leaked memory. It preserves semantics by waiting until heap exhaustion to reclaim memory – then intercepting program attempts to access reclaimed memory.

We demonstrate the utility and efficiency of post-deployment debugging on large, real-world programs, where these techniques pinpoint bug causes and improve software availability. Post-deployment debugging efficiently exposes and exploits programming language semantics and opens up a promising direction for improving software robustness.



Tolerating Memory Leaks

Michael D. Bond and Kathryn S. McKinley

23rd Annual International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2008), Nashville, October 2008

Paper:   PDF   PS

Talk:   PPT (version 2007)   PPT (versions 97-2003)   PDF

Source code:

Available for download from the Jikes RVM Research Archive

Abstract:

Type safety and garbage collection in managed languages eliminate memory errors such as dangling pointers, double frees, and leaks of unreachable objects. Unfortunately, a program still leaks memory if it maintains references to objects it will never use again. Leaked objects decrease program locality and increase garbage collection frequency and workload. A growing leak will eventually exhaust memory and crash the program.

This paper introduces a leak tolerance approach called Melt that safely eliminates performance degradations and crashes due to leaks of dead but reachable objects in managed languages, given sufficient disk space to hold leaking objects. Melt (1) identifies stale objects that the program is not accessing; (2) segregates in-use and stale objects by storing stale objects to disk; and (3) preserves safety by activating stale objects if the program subsequently accesses them. We design and build a prototype implementation of Melt in a Java VM and show it adds overhead low enough for production systems. Whereas existing VMs grind to a halt and then crash on programs with leaks, Melt keeps many of these programs running much longer without significantly degrading performance. Melt provides users the illusion of a fixed leak and gives developers more time to fix leaky programs.
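
The stale-object protocol can be sketched as follows (hypothetical helper; the real implementation works on object headers and references inside the VM and piggybacks on GC): objects not accessed recently are written to disk, and a read barrier brings one back if the program touches it again.

    // Sketch of the stale-object protocol: mark, evict to disk, and reactivate
    // on access. The actual VM version swizzles references and cooperates with
    // the garbage collector; disk I/O is left abstract here.
    abstract class Swappable<T> {
        private T inMemory;                 // null while the object lives on disk
        private long lastAccess = System.nanoTime();

        protected Swappable(T obj) { this.inMemory = obj; }

        // Stand-in for the VM's read barrier: touching the object refreshes its
        // timestamp and activates it if it had been moved to disk.
        T access() {
            lastAccess = System.nanoTime();
            if (inMemory == null) {
                inMemory = readFromDisk();
            }
            return inMemory;
        }

        // Periodically (e.g., during GC) stale objects are written out.
        void evictIfStale(long staleThresholdNanos) {
            if (inMemory != null && System.nanoTime() - lastAccess > staleThresholdNanos) {
                writeToDisk(inMemory);
                inMemory = null;
            }
        }

        protected abstract void writeToDisk(T obj);
        protected abstract T readFromDisk();
    }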



Probabilistic Calling Context

Michael D. Bond and Kathryn S. McKinley

22nd Annual International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2007), Montreal, October 2007

Paper:   PDF   PS

Talk:   PPT   PDF

Source code:

Available for download from the Jikes RVM Research Archive

Jones and Ryder used our implementation for context-sensitive allocation sites in their ISMM 2008 paper.

Abstract:

Calling context enhances program understanding and dynamic analyses by providing a rich representation of program location. Compared to imperative programs, object-oriented programs use more interprocedural and less intraprocedural control flow, increasing the importance of context sensitivity for analysis. However, prior online methods for computing calling context, such as stack-walking or maintaining the current location in a calling context tree, are expensive in time and space. This paper introduces a new online approach called probabilistic calling context (PCC) that continuously maintains a probabilistically unique value representing the current calling context. For millions of unique contexts, a 32-bit PCC value has few conflicts. Computing the PCC value adds 3% average overhead to a Java virtual machine. PCC is well-suited to clients that detect new or anomalous behavior since PCC values from training and production runs can be compared easily to detect new context-sensitive behavior; clients that query the PCC value at every system call, Java utility call, and Java API call add 0-9% overhead on average. PCC adds space overhead proportional to the distinct contexts stored by the client (one word per context). Our results indicate PCC is efficient and accurate enough to use in deployed software for residual testing, bug detection, and intrusion detection.
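
The mechanism fits in a few lines (the mixing function below is illustrative of the cheap, non-commutative update PCC performs at each call): a single per-thread word is updated on every call with the current value and a call-site identifier, so equal values almost surely mean equal calling contexts.

    // Sketch of probabilistic calling context: a single per-thread word is
    // updated at each call with a cheap, non-commutative function of the
    // current value and the call site.
    class PCC {
        private static final ThreadLocal<int[]> context = ThreadLocal.withInitial(() -> new int[1]);

        // Inserted at each call site (callSiteId is a compile-time constant).
        static int enterCall(int callSiteId) {
            int[] v = context.get();
            int saved = v[0];
            v[0] = 3 * saved + callSiteId;    // illustrative mixing function
            return saved;                     // caller restores this on return
        }

        static void exitCall(int saved) {
            context.get()[0] = saved;         // restore the caller's context value
        }

        // A client (e.g., an anomaly detector) queries the current value here.
        static int currentContext() {
            return context.get()[0];
        }
    }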



Tracking Bad Apples: Reporting the Origin of Null and Undefined Value Errors

Michael D. Bond, Nicholas Nethercote, Stephen W. Kent, Samuel Z. Guyer, and Kathryn S. McKinley

22nd Annual International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2007), Montreal, October 2007

Paper:   PDF   PS

Talk:   PPT   PDF

Source code:

Both implementations are publicly available:

Bug suite:

We've made available 12 Java null pointer exceptions as the Bad Apples Suite.

Abstract:

Programs sometimes crash due to unusable values, for example, when Java and C# programs dereference null pointers and when C and C++ programs use undefined values to affect program behavior. A stack trace produced on such a crash identifies the effect of the unusable value, not its cause, and is often not much help to the programmer.

This paper presents efficient origin tracking of unusable values; it shows how to record where these values come into existence, correctly propagate them, and report them if they cause an error. The key idea is value piggybacking: when the original program stores an unusable value, value piggybacking instead stores origin information in the spare bits of the unusable value. Modest compiler support alters the program to propagate these modified values through operations such as assignments and comparisons. We evaluate two implementations: the first tracks null pointer origins in a JVM, and the second tracks undefined value origins in a memory-checking tool built with Valgrind. These implementations show that origin tracking via value piggybacking is fast and often useful, and in the Java case, has low enough overhead for use in a production environment.
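
The value-piggybacking idea can be illustrated with a stand-alone sketch; ordinary Java cannot make a non-null bit pattern behave as null, so the VM support is modeled here by an explicit integer encoding (all constants and names are illustrative):

    // Sketch of value piggybacking using an int stand-in for a reference: the
    // "null" value carries the id of the program location that produced it. In
    // the real JVM implementation the origin lives in spare bits of the null
    // reference itself, and ordinary loads and stores propagate it for free.
    class OriginTracking {
        static final int NULL_TAG = 0x80000000;          // high bit marks "this is a null"

        static int makeNull(int originSiteId) {
            return NULL_TAG | (originSiteId & 0xFFFF);   // pack the origin into the value
        }

        static boolean isNull(int ref) {
            return (ref & NULL_TAG) != 0;
        }

        static int originOf(int ref) {
            return ref & 0xFFFF;                          // decoded when an error is reported
        }

        public static void main(String[] args) {
            int ref = makeNull(42);                       // "null" created at site 42
            int copy = ref;                               // ordinary assignment propagates the origin
            if (isNull(copy)) {
                System.out.println("null originated at site " + originOf(copy));
            }
        }
    }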



Correcting the Dynamic Call Graph Using Control Flow Constraints

Byeongcheol Lee, Kevin Resnick, Michael D. Bond, and Kathryn S. McKinley

16th International Conference on Compiler Construction (CC 2007), Braga, Portugal, March 2007

Paper:   PDF   PS   (A longer tech report version includes proofs and algorithms.)

Talk (by Byeongcheol Lee):   PPT

Source code:

Available for download from the Jikes RVM Research Archive

Abstract:

To reason about programs, dynamic optimizers and analysis tools use sampling to collect a dynamic call graph (DCG). However, sampling has not achieved high accuracy with low runtime overhead. As object-oriented programmers compose increasingly complex programs, inaccurate call graphs will inhibit analysis and optimizations. This paper demonstrates how to use static and dynamic control flow graph (CFG) constraints to improve the accuracy of the DCG. We introduce the frequency dominator (FDOM), a novel CFG relation that extends the dominator relation to expose static relative execution frequencies of basic blocks. We combine conservation of flow and dynamic CFG basic block profiles to further improve the accuracy of the DCG. Together these approaches add minimal overhead (1%) and achieve 85% accuracy compared to a perfect call graph for SPEC JVM98 and DaCapo benchmarks. Compared to sampling alone, accuracy improves by 12 to 36%. These results demonstrate that static and dynamic control-flow information offer accurate information for efficiently improving the DCG.



Bell: Bit-Encoding Online Memory Leak Detection

Michael D. Bond and Kathryn S. McKinley

12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XII), San Jose, October 2006

Paper:   PDF   PS

Talk:   PPT   PDF

Source code:

Available for download from the Jikes RVM Research Archive

Tang, Gao, and Qin modified our implementation for their USENIX 2008 paper.

Abstract:

Memory leaks compromise availability and security by crippling performance and crashing programs. Leaks are difficult to diagnose because they have no immediate symptoms. Online leak detection tools benefit from storing and reporting per-object sites (e.g., allocation sites) for potentially leaking objects. In programs with many small objects, per-object sites add high space overhead, limiting their use in production environments.

This paper introduces Bit-Encoding Leak Location (Bell), a statistical approach that encodes per-object sites to a single bit per object. A bit loses information about a site, but given sufficient objects that use the site and a known, finite set of possible sites, Bell uses brute-force decoding to recover the site with high accuracy.

We use this approach to encode object allocation and last-use sites in Sleigh, a new leak detection tool. Sleigh detects stale objects (objects unused for a long time) and uses Bell decoding to report their allocation and last-use sites. Our implementation steals four unused bits in the object header and thus incurs no per-object space overhead. Sleigh's instrumentation adds 29% execution time overhead, which adaptive profiling reduces to 11%. Sleigh's output is directly useful for finding and fixing leaks in SPEC JBB2000 and Eclipse, although sufficiently many objects must leak before Bell decoding can report sites with confidence. Bell is suitable for other leak detection approaches that store per-object sites, and for other problems amenable to statistical per-object metadata.
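
The encode/decode asymmetry can be sketched as follows (the one-bit hash is illustrative; the paper's encoding function and statistical confidence test are more careful): each object stores a single bit derived from its identity and its site, and decoding tries every candidate site to see which one matches the observed bits best.

    import java.util.List;

    // Sketch of single-bit site encoding and brute-force decoding. Encoding
    // loses information per object; decoding recovers the site only because
    // many objects share it and the set of candidate sites is finite.
    class Bell {
        static int encode(int objectId, int siteId) {
            return Integer.bitCount(objectId * 31 + siteId) & 1;   // illustrative 1-bit hash
        }

        // Given stale objects as (objectId, storedBit) pairs assumed to come
        // from one site, return the candidate site whose predicted bits match best.
        static int decode(List<int[]> objects, int numSites) {
            int bestSite = -1, bestMatches = -1;
            for (int site = 0; site < numSites; site++) {
                int matches = 0;
                for (int[] o : objects) {
                    if (encode(o[0], site) == o[1]) {
                        matches++;
                    }
                }
                if (matches > bestMatches) {
                    bestMatches = matches;
                    bestSite = site;
                }
            }
            return bestSite;   // a real decoder also checks statistical confidence
        }
    }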



Continuous Path and Edge Profiling

Michael D. Bond and Kathryn S. McKinley

38th International Symposium on Microarchitecture (MICRO-38), Barcelona, November 2005

Paper:   PDF   PS

Talk:   PPT   PDF

Source code:

Available for download from the Jikes RVM Research Archive

D'Elia and Demetrescu modified our path profiling implementation for their OOPSLA 2013 paper.

Abstract:

Microarchitectures increasingly rely on dynamic optimization to improve performance in ways that are difficult or impossible for ahead-of-time compilers. Dynamic optimizers in turn require continuous, portable, low cost, and accurate control-flow profiles to inform their decisions, but prior approaches have struggled to meet these goals simultaneously.

This paper presents PEP, a hybrid instrumentation and sampling approach for continuous path and edge profiling that is efficient, accurate, and portable. PEP uses a subset of Ball-Larus path profiling to identify paths with low overhead, and uses sampling to mitigate the expense of storing paths. PEP further reduces overhead by using profiling to guide instrumentation placement. PEP improves profile accuracy with a modified version of Arnold-Grove sampling. The resulting system has 1.2% average and 4.3% maximum overhead, 94% path profile accuracy, and 96% edge profile accuracy on a set of Java benchmarks.



Practical Path Profiling for Dynamic Optimizers

Michael D. Bond and Kathryn S. McKinley

3rd International Symposium on Code Generation and Optimization (CGO-3), San Jose, March 2005

Paper:   PDF   PS

Talk:   PPT   PDF

Source code:

Available as part of the Scale compiler.

Vaswani, Nori, and Chilimbi modified our path profiling implementation for their POPL 2007 and FSE 2007 papers.

Abstract:

Modern processors are hungry for instructions. To satisfy them, compilers need to find and optimize execution paths across multiple basic blocks. Path profiles provide this context, but their high overhead has so far limited their use by dynamic compilers. We present new techniques for low overhead online practical path profiling (PPP). Following targeted path profiling (TPP), PPP uses an edge profile to simplify path profile instrumentation (profile-guided profiling). PPP improves over prior work by (1) reducing the amount of profiling instrumentation on cold paths and paths that the edge profile predicts well and (2) reducing the cost of the remaining instrumentation.

Experiments in an ahead-of-time compiler perform edge profile-guided inlining and unrolling prior to path profiling instrumentation. These transformations are faithful to staged optimization, and create longer, harder to predict paths. We introduce the branch-flow metric to measure path flow as a function of branch decisions, rather than weighting all paths equally as in prior work. On SPEC2000, PPP maintains high accuracy and coverage, but has only 5% overhead on average (ranging from -3% to 13%), making it appealing for use by dynamic compilers.



Targeted Path Profiling: Lower Overhead Path Profiling for Staged Dynamic Optimization Systems

Rahul Joshi, Michael D. Bond, and Craig Zilles

2nd International Symposium on Code Generation and Optimization (CGO-2), Palo Alto, March 2004

Best student presenter

Paper:   PDF

Talk:   PPT   PDF

Source code:

Subsumed by the Practical Path Profiling source code.

Abstract:

In this paper, we present a technique for reducing the overhead of collecting path profiles in the context of a dynamic optimizer. The key idea of our approach, called Targeted Path Profiling (TPP), is to use an edge profile to simplify the collection of a path profile. This notion of profile-guided profiling is a natural fit for dynamic optimizers, which typically optimize the code in a series of stages.

TPP is an extension to the Ball-Larus Efficient Path Profiling algorithm. Its increased efficiency comes from two sources: (i) reducing the number of potential paths by not enumerating paths with cold edges, allowing array accesses to be substituted for more expensive hash table lookups, and (ii) not instrumenting regions where paths can be unambiguously derived from an edge profile. Our results suggest that on average the overhead of profile collection can be reduced by half (SPEC95) to almost two-thirds (SPEC2000) relative to the Ball-Larus algorithm with minimal impact on the information collected.