TY - GEN
T1 - VeloxDFS
T2 - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022
AU - Ahn, Sunghwan
AU - Park, Hyeongjun
AU - Bolea Sanchez, V. A.
AU - Hwang, Deukyeon
AU - Kim, Wonbae
AU - Sussman, Alan
AU - Nam, Beomseok
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - In this work, we design and implement VeloxDFS, a distributed file system for ETL MapReduce frameworks that provides HDFS-compatible APIs. VeloxDFS is a decentralized, consistent, hash-based file system that dynamically adjusts the size of partitioned blocks. Instead of the conventional static, coarse-grained partitioning scheme, VeloxDFS employs a fine-grained logical partitioning scheme and provides an abstraction of variously sized blocks based on the I/O consumption rate. VeloxDFS avoids I/O contention and straggler problems by employing a block stream manager that coordinates the I/O requests of multiple tasks at runtime. By reducing the I/O contention of concurrent tasks, VeloxDFS enables overcommit scheduling, which schedules more tasks than there are available physical CPU cores and leverages the OS scheduler to improve the computing resource utilization of a cluster. Our extensive performance study shows that VeloxDFS with the overcommit scheduling policy achieves up to 1.7x higher job processing throughput than HDFS for multiple concurrent workloads.
KW - Distributed File System
KW - IO Load Balancing
KW - Storage Management
UR - https://www.scopus.com/pages/publications/85135742487
U2 - 10.1109/CCGrid54584.2022.00012
DO - 10.1109/CCGrid54584.2022.00012
M3 - Conference contribution
AN - SCOPUS:85135742487
T3 - Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022
SP - 31
EP - 40
BT - Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022
A2 - Fazio, Maria
A2 - Panda, Dhabaleswar K.
A2 - Prodan, Radu
A2 - Cardellini, Valeria
A2 - Kantarci, Burak
A2 - Rana, Omer
A2 - Villari, Massimo
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 May 2022 through 19 May 2022
ER -