TY - GEN
T1 - CAVA
T2 - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018
AU - Hwang, Eunji
AU - Kim, Hyungoo
AU - Nam, Beomseok
AU - Choi, Young Ri
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/13
Y1 - 2018/7/13
AB - Running big data analytics frameworks in the cloud is becoming increasingly important, but their resource managers in the current form are not designed to consider virtualized environments. In this work, we investigate various levels of data locality in a virtualized environment, ranging from rack locality to memory locality. Exploiting extra fine-grained levels of data locality in a virtualized environment, our memory locality-aware scheduling algorithm effectively increases the cache hit ratio and thereby reduces network traffic and disk I/O. However, a high cache hit ratio does not necessarily imply a shorter job execution time in MapReduce applications. To resolve this issue, we develop the Cache-Affinity and Virtualization-Aware (CAVA) resource manager, which measures the cache affinity of MapReduce applications at runtime and efficiently manages distributed in-memory caches of a limited size by assigning high priority to applications that have high cache affinity. The proposed memory locality-aware scheduling algorithm is also integrated into the CAVA resource manager. Our extensive experimental study shows that CAVA exhibits overall good performance over various workloads composed of multiple big data analytics applications by considering the fine-grained data locality levels in virtualized clusters and by efficiently using scarce memory resources.
KW - Big Data Analytics
KW - Cache Affinity
KW - Cache Replacement Algorithm
KW - Hadoop Scheduling Algorithm
KW - Memory Locality
KW - Virtualized Clusters
UR - https://www.scopus.com/pages/publications/85050957815
U2 - 10.1109/CCGRID.2018.00017
DO - 10.1109/CCGRID.2018.00017
M3 - Conference contribution
AN - SCOPUS:85050957815
T3 - Proceedings - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018
SP - 21
EP - 30
BT - Proceedings - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 May 2018 through 4 May 2018
ER -