TY - GEN
T1 - Mitigating YARN container overhead with input splits
AU - Kim, Wonbae
AU - Choi, Young Ri
AU - Nam, Beomseok
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/7/10
Y1 - 2017/7/10
N2 - We analyze YARN container overhead and present early results of reducing its overhead by dynamically adjusting the input split size. YARN is designed as a generic resource manager that decouples programming models from resource management infrastructures. We demonstrate that YARN's generic design incurs significant overhead because each container must perform various initialization steps, including authentication. To reduce container overhead without changing the existing YARN framework significantly, we propose leveraging the input split, which is the logical representation of physical HDFS blocks. With input splits, we can combine multiple HDFS blocks and increase the input size of each container, thereby enabling a single map wave and reducing the number of containers and their initialization overhead. Experimental results show that we can avoid recurring container overhead by selecting the right size for input splits and reducing the number of containers.
AB - We analyze YARN container overhead and present early results of reducing its overhead by dynamically adjusting the input split size. YARN is designed as a generic resource manager that decouples programming models from resource management infrastructures. We demonstrate that YARN's generic design incurs significant overhead because each container must perform various initialization steps, including authentication. To reduce container overhead without changing the existing YARN framework significantly, we propose leveraging the input split, which is the logical representation of physical HDFS blocks. With input splits, we can combine multiple HDFS blocks and increase the input size of each container, thereby enabling a single map wave and reducing the number of containers and their initialization overhead. Experimental results show that we can avoid recurring container overhead by selecting the right size for input splits and reducing the number of containers.
UR - https://www.scopus.com/pages/publications/85027470415
U2 - 10.1109/CCGRID.2017.106
DO - 10.1109/CCGRID.2017.106
M3 - Conference contribution
AN - SCOPUS:85027470415
T3 - Proceedings - 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017
SP - 204
EP - 207
BT - Proceedings - 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017
Y2 - 14 May 2017 through 17 May 2017
ER -