DEMB: Cache-aware scheduling for distributed query processing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Leveraging data in distributed caches for large-scale query processing applications is becoming more important, given current trends toward building large scalable distributed systems by connecting multiple heterogeneous, less powerful machines rather than purchasing expensive, homogeneous, and very powerful machines. As more servers are added to such clusters, more memory is available for caching data objects across the distributed machines. However, the cached objects are dispersed, and traditional query scheduling policies that take into account only load balancing do not effectively utilize the increased cache space. We propose a new multi-dimensional range query scheduling policy for distributed query processing frameworks, called DEMB, that employs a probability distribution estimation derived from recent queries. DEMB accounts for both load balancing and the availability of distributed cached objects to improve the cache hit rate for queries, thereby decreasing query turnaround time and increasing throughput. We experimentally demonstrate that DEMB produces better query plans and lower query response times than other query scheduling policies.
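
The idea of scheduling from a distribution estimated over recent queries can be illustrated with a minimal sketch. This is not the DEMB algorithm from the paper; it is a simplified one-dimensional toy under stated assumptions. The class `BoundaryScheduler`, its methods, and the `window` parameter are hypothetical names introduced here. The sketch keeps a sliding window of recent query points, uses empirical quantiles as partition boundaries so that each server receives roughly an equal share of recent queries (load balancing), and routes nearby queries to the same server (which favors cache reuse). Real multi-dimensional range queries would need a mapping to one dimension that this sketch omits.

```python
from bisect import bisect_right
from collections import deque


class BoundaryScheduler:
    """Toy 1-D scheduler (hypothetical, not the paper's implementation).

    Partitions the query space into contiguous ranges that each hold
    about the same number of recently observed query points.
    """

    def __init__(self, num_servers, window=1000):
        self.num_servers = num_servers
        # Sliding window of recent query points; old points fall out.
        self.recent = deque(maxlen=window)
        # num_servers - 1 split points between adjacent server ranges.
        self.boundaries = []

    def observe(self, point):
        """Record a query point and re-estimate the boundaries."""
        self.recent.append(point)
        ordered = sorted(self.recent)
        n = len(ordered)
        # Empirical quantiles of the recent-query distribution.
        self.boundaries = [
            ordered[(i * n) // self.num_servers]
            for i in range(1, self.num_servers)
        ]

    def assign(self, point):
        """Return the index of the server whose range contains the point."""
        return bisect_right(self.boundaries, point)
```

A caller would invoke `observe` for each incoming query point and `assign` to pick a server. Because the boundaries follow the recent distribution, a hot region of the query space is split across more servers, while queries that are close together still land on the same server and can reuse its cached objects. Re-sorting the window on every observation is chosen for clarity, not efficiency.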

Original language: English
Title of host publication: Job Scheduling Strategies for Parallel Processing - 16th International Workshop, JSSPP 2012, Revised Selected Papers
Publisher: Springer Verlag
Pages: 16-35
Number of pages: 20
ISBN (Print): 9783642358661
DOIs
State: Published - 2013
Externally published: Yes
Event: 16th Workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2012 - Shanghai, China
Duration: 25 May 2012 to 25 May 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7698 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th Workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2012
Country/Territory: China
City: Shanghai
Period: 25/05/12 to 25/05/12

Keywords

  • Data intensive computing
  • Distributed query scheduling
  • Multiple query optimization
  • Spatial clustering
