Department of Computer Science and Engineering

Guest Speaker

DBToaster: Aggressive Query Compilation for Incremental Processing in Update-Intensive Applications


Yanif Ahmad
Department of Computer Science
Cornell University

Feb 25 2010 3:30PM
480 Dreese Labs
All interested parties are invited to attend.
Refreshments will be served prior to the talk.

Abstract:

Commercial databases pose a dilemma of high total cost of ownership or limited scale-out ability, and this has started a trend for many communities to develop their own nimble, lightweight data management tools, as seen with MapReduce and key-value stores. With today's data flood, update-intensive applications clearly adhere to this trend, since databases to this day have a notoriously poor reputation for handling updates efficiently. Update-intensive applications, such as algorithmic trading on order book data, personal status feeds (e.g., Facebook, Twitter) and compute cloud management, are currently addressed in relational data management systems by incremental view maintenance and stream processing techniques. Yet these techniques are ill-suited, since they either involve significant repetition of work or exploit assumptions about updates that restrict their usefulness.

I introduce DBToaster, a novel SQL compilation framework that reconsiders the foundations and program structure of state-of-the-art query processors to generate lightweight, high-performance query engines. In this talk I will focus on query compilation to incrementally process update-intensive applications. While view maintenance and stream processing evaluate queries with highly optimized relational algebra operators, DBToaster uses map data structures, resulting in very simple, efficient query processing programs. I will present DBToaster's novel aggressive, recursive compilation technique, which determines the maps to maintain by repeatedly simplifying queries based on query input deltas. In experimental results, DBToaster outperforms both a low-footprint view maintenance algorithm and a commercial database engine by one to four orders of magnitude.
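
As a rough illustration of the map-based idea in the abstract above, the following Python sketch hand-maintains one aggregate join query under inserts. It is not DBToaster's generated code or its intermediate language. The query, SELECT SUM(r.a * s.b) FROM R r, S s WHERE r.k = s.k, the class name JoinSumView, and the map names are all assumptions chosen for this example. The delta of the query with respect to an insert into R is itself a simpler query over S alone (a per-key sum of b), so it can be kept in a map and looked up instead of re-running the join; the same holds symmetrically for inserts into S.

```python
class JoinSumView:
    """Hand-written sketch of incremental maintenance with maps.

    Maintains SUM(r.a * s.b) over R(k, a) JOIN S(k, b) ON r.k = s.k.
    All names here are hypothetical and chosen for illustration only.
    """

    def __init__(self):
        self.total = 0          # current query result
        self.sum_a_by_k = {}    # k -> SUM(a) over the R tuples seen so far
        self.sum_b_by_k = {}    # k -> SUM(b) over the S tuples seen so far

    def insert_r(self, k, a):
        # Delta of the query for a new R tuple: a * SUM(b) for the same key.
        self.total += a * self.sum_b_by_k.get(k, 0)
        self.sum_a_by_k[k] = self.sum_a_by_k.get(k, 0) + a

    def insert_s(self, k, b):
        # Delta of the query for a new S tuple: SUM(a) for the same key * b.
        self.total += self.sum_a_by_k.get(k, 0) * b
        self.sum_b_by_k[k] = self.sum_b_by_k.get(k, 0) + b


def recompute(r_rows, s_rows):
    """Reference answer: re-evaluate the join from scratch."""
    return sum(a * b for (kr, a) in r_rows for (ks, b) in s_rows if kr == ks)


if __name__ == "__main__":
    view = JoinSumView()
    view.insert_r(1, 2)
    view.insert_s(1, 10)
    view.insert_r(1, 3)
    # The maintained result matches a full re-evaluation of the join.
    print(view.total == recompute([(1, 2), (1, 3)], [(1, 10)]))
```

Each insert touches a constant number of map entries, whereas re-evaluating the join grows with the size of both relations. Deletions, other aggregates, and the automatic derivation of the maps are outside the scope of this sketch.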

I will also discuss ongoing work on Cumulus, a massive-scale online query processor based on DBToaster's extremely simple intermediate language of map maintenance, a language that is embarrassingly parallel and reflects the goal of achieving scalability through simplicity. Finally, I will conclude with near-term research directions on scalable bulk processing with DBToaster, and mid-term directions in numerical query processing.

Bio:
Yanif Ahmad is a postdoctoral associate in the Database Group at Cornell University with Prof. Christoph Koch, having received his Ph.D. from Brown University in January 2009 under the supervision of Prof. Ugur Cetintemel. His research focuses on data stream processing and distributed data management. Yanif is the recipient of an IBM Ph.D. fellowship, a Best Research Paper award at the ICDE 2008 conference, and a Best Demonstration Award at SIGMOD 2005, and has interned at both IBM Almaden and Microsoft Research.

Host: Srini Parthasarathy

* Yanif Ahmad is a CSE faculty candidate
