VeloxDFS: Streaming Access to Distributed Datasets to Reduce Disk Seeks

  • Sunghwan Ahn
  • Hyeongjun Park
  • V. A. Bolea Sanchez
  • Deukyeon Hwang
  • Wonbae Kim
  • Alan Sussman
  • Beomseok Nam

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this work, we design and implement VeloxDFS, a distributed file system for ETL MapReduce frameworks whose APIs are compatible with HDFS. VeloxDFS is a decentralized file system based on consistent hashing that dynamically adjusts the size of partitioned blocks. Rather than the conventional static, coarse-grained partitioning scheme, VeloxDFS employs a fine-grained logical partitioning scheme and provides an abstraction of variable-sized blocks based on the I/O consumption rate. VeloxDFS avoids I/O contention and straggler problems by employing a block stream manager that coordinates the I/O requests of multiple tasks at runtime. By reducing the I/O contention among concurrent tasks, VeloxDFS enables over-commit scheduling, which schedules more tasks than there are physical CPU cores and relies on the OS scheduler to improve the computing resource utilization of a cluster. Our extensive performance study shows that VeloxDFS with the over-commit scheduling policy achieves up to 1.7x higher job processing throughput than HDFS when running multiple concurrent workloads.
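The abstract does not describe VeloxDFS's actual data structures or APIs. The Python sketch below illustrates, under our own assumptions, two of the ideas it names: hash-based placement of small fixed-size chunks across nodes (modeled here as a consistent-hash ring with virtual nodes), and fine-grained logical partitioning in which tasks pull chunks on demand, so a faster consumer ends up with a larger logical block. `HashRing`, `ChunkDispenser`, `simulate`, the virtual-node count, the chunk naming, and the "local first, else take from the busiest node" policy are all hypothetical stand-ins, not the paper's block stream manager; over-commit scheduling is not modeled.

```python
import bisect
import hashlib
from collections import defaultdict, deque


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the hash ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical placement scheme)."""

    def __init__(self, nodes, vnodes=16):
        # Each physical node appears `vnodes` times on the ring to smooth the load.
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def owner(self, chunk_id: str) -> str:
        # First virtual node clockwise from the chunk's hash, wrapping at the end.
        i = bisect.bisect_right(self._keys, _hash(chunk_id)) % len(self._ring)
        return self._ring[i][1]


class ChunkDispenser:
    """Hands out small chunks on demand (hypothetical, not VeloxDFS's block
    stream manager): a task's logical block is whatever it manages to pull."""

    def __init__(self, file_name: str, num_chunks: int, ring: HashRing):
        self._pending = defaultdict(deque)  # node -> chunk ids stored on it
        for c in range(num_chunks):
            cid = f"{file_name}:{c}"
            self._pending[ring.owner(cid)].append(cid)

    def next_chunk(self, node: str):
        """Return a chunk for a task on `node`, or None once input is exhausted."""
        if self._pending[node]:
            return self._pending[node].popleft()  # prefer a local read
        donors = [n for n, q in self._pending.items() if q]
        if not donors:
            return None
        # Otherwise take from the tail of the node with the most work left.
        donor = max(donors, key=lambda n: len(self._pending[n]))
        return self._pending[donor].pop()


def simulate(num_chunks, rates, nodes=("node-a", "node-b", "node-c")):
    """Round-based toy run: rates maps (task, node) -> chunks consumed per round."""
    disp = ChunkDispenser("input.txt", num_chunks, HashRing(nodes))
    blocks = {task: [] for task, _ in rates}
    progressed = True
    while progressed:
        progressed = False
        for (task, node), rate in rates.items():
            for _ in range(rate):
                cid = disp.next_chunk(node)
                if cid is not None:
                    blocks[task].append(cid)
                    progressed = True
    return blocks


blocks = simulate(60, {("task-1", "node-a"): 3, ("task-2", "node-b"): 1})
for task, chunks in blocks.items():
    print(task, len(chunks))  # prints "task-1 45" then "task-2 15"
```

With three chunks per round for `task-1` and one for `task-2`, the 60 chunks split 45/15 regardless of where the ring placed them: in this toy model the size of each task's logical block follows its consumption rate rather than a fixed block size.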

Original language: English
Title of host publication: Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022
Editors: Maria Fazio, Dhabaleswar K. Panda, Radu Prodan, Valeria Cardellini, Burak Kantarci, Omer Rana, Massimo Villari
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 31-40
Number of pages: 10
ISBN (Electronic): 9781665499569
State: Published, 2022
Event: 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022, Taormina, Italy
Duration: 16 May 2022 to 19 May 2022

Publication series

Name: Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022

Conference

Conference: 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022
Country/Territory: Italy
City: Taormina
Period: 16/05/22 to 19/05/22

Keywords

  • Distributed File System
  • IO Load Balancing
  • Storage Management
