TY - GEN
T1 - Cloud Reamer
T2 - 32nd IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS 2024
AU - Khan, Osama
AU - Park, Gwanjong
AU - Yu, Junyeol
AU - Seo, Euiseong
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - CPU cores in GPU servers are often underutilized during DNN training. Co-locating CPU-based inference tasks with DNN training offers an opportunity to utilize these idle CPU cycles. However, three technical challenges must be addressed: avoiding disruption to training workloads, meeting the different performance requirements of online and offline inference, and swiftly adjusting inference configurations based on available resources. This paper proposes Cloud Reamer, a scheme that co-locates training and inference tasks on GPU servers, exploiting unused CPU cycles without disrupting training. Cloud Reamer prioritizes training tasks to minimize interference. For online inference, it allocates cores to ensure predictable performance, while for offline inference, it uses all available cores to maximize throughput. Cloud Reamer enhances both online and offline inference performance by dynamically adjusting configurations based on surplus CPU resources. Evaluations show that Cloud Reamer improves inference throughput with minimal impact on training, keeping interference with training below 3.2%. It meets latency requirements for 46% more online inference requests and achieves a 61x throughput increase for offline inference compared to conventional methods.
AB - CPU cores in GPU servers are often underutilized during DNN training. Co-locating CPU-based inference tasks with DNN training offers an opportunity to utilize these idle CPU cycles. However, three technical challenges must be addressed: avoiding disruption to training workloads, meeting the different performance requirements of online and offline inference, and swiftly adjusting inference configurations based on available resources. This paper proposes Cloud Reamer, a scheme that co-locates training and inference tasks on GPU servers, exploiting unused CPU cycles without disrupting training. Cloud Reamer prioritizes training tasks to minimize interference. For online inference, it allocates cores to ensure predictable performance, while for offline inference, it uses all available cores to maximize throughput. Cloud Reamer enhances both online and offline inference performance by dynamically adjusting configurations based on surplus CPU resources. Evaluations show that Cloud Reamer improves inference throughput with minimal impact on training, keeping interference with training below 3.2%. It meets latency requirements for 46% more online inference requests and achieves a 61x throughput increase for offline inference compared to conventional methods.
KW - cloud computing
KW - co-location
KW - deep neural networks
KW - inference
KW - interference
KW - training
UR - https://www.scopus.com/pages/publications/85215093153
U2 - 10.1109/MASCOTS64422.2024.10786549
DO - 10.1109/MASCOTS64422.2024.10786549
M3 - Conference contribution
AN - SCOPUS:85215093153
T3 - Proceedings - IEEE Computer Society's Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, MASCOTS
BT - Proceedings - 2024 IEEE 32nd International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS 2024
PB - IEEE Computer Society
Y2 - 21 October 2024 through 23 October 2024
ER -