Optimizing Read Performance of HBase through Dynamic Control of Data Block Sizes and KVCache

Sangeun Chae, Wonbae Kim, Daegyu Han, Jeongmin Kim, Beomseok Nam

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

LSM-Tree-based key-value stores such as HBase, RocksDB, and Cassandra use a fixed data block size. In this study, we show that using a fixed block size can lead to unnecessary read amplification and cache pollution. To address this issue, we propose a dynamic data block size control method to store small key-values in small data blocks and large key-values in large data blocks to minimize disk I/Os. However, using small data blocks for small key-values can result in performance issues due to increased disk seeks. To mitigate this problem, we implement a two-level cache system, which involves a lower level conventional BlockCache for storing larger, coarse-grained data blocks and an upper level cache, KVCache, for storing smaller, fine-grained key-value pairs. Our experiments show that the dynamic data block size control and fine-grained KVCache help effectively reduce read amplification and improve read performance in HBase.
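The read path described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or HBase's API: the class and function names (`LRU`, `TwoLevelReader`, `pack_blocks`, `choose_block_size`), the size threshold, and the in-memory "disk" are all assumptions made for illustration. It shows a fine-grained key-value cache consulted before a coarse-grained block cache, and counts the bytes fetched from the simulated disk so the effect of block size on read amplification can be observed.

```python
from collections import OrderedDict


class LRU:
    """Byte-bounded LRU cache (hypothetical helper, not an HBase class)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()  # key -> (value, size in bytes)

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key][0]

    def put(self, key, value, size):
        if size > self.capacity:
            return  # entry can never fit, so skip caching it
        if key in self.items:
            self.used -= self.items.pop(key)[1]
        self.items[key] = (value, size)
        self.used += size
        while self.used > self.capacity:
            _, (_, evicted_size) = self.items.popitem(last=False)
            self.used -= evicted_size


def choose_block_size(avg_kv_bytes, small=4 * 1024, large=64 * 1024,
                      threshold=1024):
    """Pick a block size from the average key-value size.

    The two sizes and the threshold are assumed values for illustration.
    """
    return small if avg_kv_bytes < threshold else large


def pack_blocks(items, block_size):
    """Pack sorted key-value pairs into blocks of at most block_size bytes."""
    blocks, index, current, used = [], {}, {}, 0
    for key, value in sorted(items.items()):
        size = len(key) + len(value)
        if current and used + size > block_size:
            blocks.append(current)
            current, used = {}, 0
        current[key] = value
        index[key] = len(blocks)  # position the current block will take
        used += size
    if current:
        blocks.append(current)
    return blocks, index


class TwoLevelReader:
    """Upper-level key-value cache in front of a lower-level block cache."""

    def __init__(self, blocks, index, kv_cache_bytes, block_cache_bytes):
        self.blocks = blocks  # simulated on-disk blocks
        self.index = index    # key -> block number
        self.kv_cache = LRU(kv_cache_bytes)
        self.block_cache = LRU(block_cache_bytes)
        self.disk_bytes_read = 0

    def get(self, key):
        value = self.kv_cache.get(key)
        if value is not None:
            return value  # fine-grained hit, no block touched
        block_id = self.index[key]
        block = self.block_cache.get(block_id)
        if block is None:
            block = self.blocks[block_id]  # simulated disk read
            size = sum(len(k) + len(v) for k, v in block.items())
            self.disk_bytes_read += size
            self.block_cache.put(block_id, block, size)
        value = block[key]
        self.kv_cache.put(key, value, len(key) + len(value))
        return value
```

In this sketch, packing the same small key-values into larger blocks makes a single point lookup fetch more bytes from the simulated disk, which is the read amplification the abstract refers to. A repeated lookup of the same key is served from the key-value cache without touching any block.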

Original language: English
Title of host publication: 39th Annual ACM Symposium on Applied Computing, SAC 2024
Publisher: Association for Computing Machinery
Pages: 1495-1503
Number of pages: 9
ISBN (Electronic): 9798400702433
State: Published - 8 Apr 2024
Event: 39th Annual ACM Symposium on Applied Computing, SAC 2024 - Avila, Spain
Duration: 8 Apr 2024 to 12 Apr 2024

Publication series

Name: Proceedings of the ACM Symposium on Applied Computing

Conference

Conference: 39th Annual ACM Symposium on Applied Computing, SAC 2024
Country/Territory: Spain
City: Avila
Period: 8/04/24 to 12/04/24

Keywords

  • key-value stores
  • log-structured merge tree
