TR-17-2.pdf

``LSbM-tree: re-enabling buffer caching in data management for mixed reads   
and writes"

Dejun Teng, Lei Guo, Rubao Lee, Feng Chen, Siyuan Ma, Yanfeng Zhang, and  
Xiaodong Zhang

Proceedings of 37th International Conference on Distributed Computing
Systems (ICDCS'17), Atlanta, Georgia, USA, June 5-8, 2017.


Abstract

LSM-tree has been widely used in data management
production systems for write-intensive workloads. However, as
read and write workloads co-exist under LSM-tree, data accesses
can experience long latency and low throughput due to the
interferences to buffer caching from the compaction, a major and
frequent operation in LSM-tree. After a compaction, the existing
data blocks are reorganized and written to other locations on
disks. As a result, the related data blocks that have been loaded in
the buffer cache are invalidated since their referencing addresses
are changed, causing serious performance degradations.

In order to re-enable high-speed buffer caching during intensive
writes, we propose Log-Structured buffered-Merge tree
(simplified as LSbM-tree) by adding a compaction buffer on disks,
to minimize the cache invalidations on buffer cache caused by
compactions. The compaction buffer efficiently and adaptively
maintains the frequently visited data sets. In LSbM, strong
locality objects can be effectively kept in the buffer cache with
minimum or without harmful invalidations. With the help of a
small on-disk compaction buffer, LSbM achieves a high query
performance by enabling effective buffer caching, while retaining
all the merits of LSM-tree for write-intensive data processing,
and providing high bandwidth of disks for range queries. We
have implemented LSbM based on LevelDB. We show that
with a standard buffer cache and a hard disk, LSbM can
achieve 2x performance improvement over LevelDB. We have
also compared LSbM with other existing solutions to show its
strong effectiveness.