
Cloud Reamer: Enabling Inference Services in Training Clusters

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

CPU cores in GPU servers are often underutilized during DNN training. Co-locating CPU-based inference tasks with DNN training offers an opportunity to utilize these idle CPU cycles. However, three technical challenges must be addressed: avoiding disruption to training workloads, meeting the different performance requirements of online and offline inference, and swiftly adjusting inference configurations based on available resources. This paper proposes Cloud Reamer, a scheme to co-locate training and inference tasks on GPU servers, exploiting unused CPU cycles without disrupting training. Cloud Reamer prioritizes training tasks to minimize interference. For online inference, it allocates cores to ensure predictable performance, while for offline inference, it uses all available cores to maximize throughput. Cloud Reamer enhances online and offline inference performance by dynamically adjusting configurations based on surplus CPU resources. Evaluations show that Cloud Reamer improves inference throughput with minimal impact on training, keeping training interference below 3.2%. It meets latency requirements for 46% more requests for online inference and achieves a 61x throughput increase for offline inference compared to conventional methods.
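The allocation policy the abstract describes — training first, a reserved share for latency-sensitive online inference, and the remainder for offline inference — can be sketched as follows. This is a hypothetical illustration under assumed semantics, not the paper's algorithm; the function and parameter names (`allocate_cores`, `training_demand`, `online_demand`) are invented for this sketch.

```python
def allocate_cores(total_cores: int, training_demand: int, online_demand: int) -> dict:
    """Hypothetical surplus-CPU partitioning in the spirit of the abstract:
    training gets its full demand first, online inference gets a reserved
    share of the surplus for predictable latency, and offline inference
    absorbs whatever is left to maximize throughput."""
    training = min(training_demand, total_cores)   # training is never displaced
    surplus = total_cores - training               # idle cycles available for inference
    online = min(online_demand, surplus)           # reserve cores for online requests
    offline = surplus - online                     # offline soaks up the remainder
    return {"training": training, "online": online, "offline": offline}

# Example: a 32-core server where training needs 20 cores and online
# inference asks for 8 leaves 4 cores for offline inference.
print(allocate_cores(32, 20, 8))  # {'training': 20, 'online': 8, 'offline': 4}
```

In this sketch, a scheduler would re-invoke the function whenever training demand changes, mirroring the dynamic reconfiguration the abstract attributes to Cloud Reamer.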

Original language: English
Title of host publication: Proceedings - 2024 IEEE 32nd International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS 2024
Publisher: IEEE Computer Society
ISBN (Electronic): 9798331531300
DOIs
State: Published - 2024
Event: 32nd IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS 2024 - Krakow, Poland
Duration: 21 Oct 2024 – 23 Oct 2024

Publication series

Name: Proceedings - IEEE Computer Society's Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, MASCOTS
ISSN (Print): 1526-7539

Conference

Conference: 32nd IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS 2024
Country/Territory: Poland
City: Krakow
Period: 21/10/24 – 23/10/24

Keywords

  • cloud computing
  • co-location
  • deep neural networks
  • inference
  • interference
  • training
