TY - JOUR
T1 - EM-KDE
T2 - A locality-aware job scheduling policy with distributed semantic caches
AU - Eom, Youngmoon
AU - Hwang, Deukyeon
AU - Lee, Junyong
AU - Moon, Jonghwan
AU - Shin, Minho
AU - Nam, Beomseok
N1 - Publisher Copyright:
© 2015 Elsevier Inc.
PY - 2015/6/23
Y1 - 2015/6/23
N2 - In modern query processing systems, caching facilities are distributed and scale with the number of servers. To maximize overall system throughput, the distributed system should balance query loads among servers while also leveraging cached results. Leveraging distributed cached data is becoming increasingly important as many systems are built by connecting many small heterogeneous machines rather than relying on a few high-performance workstations. Although many query scheduling policies exist, such as round-robin and load-monitoring, they are not sophisticated enough to both balance the load and leverage cached results. In this paper, we propose distributed query scheduling policies that take into account the dynamic contents of the distributed caching infrastructure and incorporate statistical prediction methods into query scheduling. We employ kernel density estimation derived from recent queries, together with the well-known exponential moving average (EMA), to predict the query distribution in a dynamically changing multi-dimensional problem space. Based on the estimated query distribution, the front-end scheduler assigns incoming queries so that query workloads are balanced and cached results are reused. Our experiments show that the proposed query scheduling policy outperforms existing policies in terms of both load balancing and cache hit ratio.
KW - Distributed scheduling
KW - Distributed semantic cache
KW - Locality-aware scheduling
KW - Parallel multi-dimensional range query
UR - https://www.scopus.com/pages/publications/84934963746
U2 - 10.1016/j.jpdc.2015.06.002
DO - 10.1016/j.jpdc.2015.06.002
M3 - Article
AN - SCOPUS:84934963746
SN - 0743-7315
VL - 83
SP - 119
EP - 132
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
ER -