
NLP-Fast: A Fast, Scalable, and Flexible System to Accelerate Large-Scale Heterogeneous NLP Models

  • Joonsung Kim
  • Suyeon Hur
  • Eunbok Lee
  • Seungho Lee
  • Jangwoo Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Emerging natural language processing (NLP) models have become larger and more complex in order to provide more sophisticated NLP services. Accordingly, there is a strong demand for scalable and flexible computer infrastructure to support these large-scale, complex, and diverse NLP models. However, existing proposals cannot provide enough scalability and flexibility, as they neither identify nor optimize the wide spectrum of performance-critical operations appearing in recent NLP models, focusing instead on specific operations only. In this paper, we propose NLP-Fast, a novel system solution to accelerate a wide spectrum of large-scale NLP models. NLP-Fast mainly consists of two parts: (1) NLP-Perf, an in-depth performance analysis tool to identify critical operations in emerging NLP models, and (2) NLP-Opt, three end-to-end optimization techniques to accelerate the identified performance-critical operations on various hardware platforms (e.g., CPU, GPU, FPGA). In this way, NLP-Fast can accelerate various types of NLP models on different hardware platforms by identifying their critical operations through NLP-Perf and applying NLP-Opt's holistic optimizations. We evaluate NLP-Fast on CPU, GPU, and FPGA, and the overall throughputs are increased by up to 2.92×, 1.59×, and 4.47× over each platform's baseline. We release NLP-Fast to the community so that users can easily conduct NLP-Fast's analysis and apply its optimizations to their own NLP applications.
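The abstract describes NLP-Perf as a performance analysis tool that identifies performance-critical operations in an NLP model. A minimal sketch of that general idea, independent of the paper's actual implementation: time each named operation in a model's forward pass and rank operations by accumulated wall-clock time to find hotspots. All names here (`profile_op`, the toy matmul/softmax ops) are illustrative assumptions, not NLP-Fast's real API.

```python
# Hypothetical sketch of per-operation profiling to find hotspots.
# This is NOT NLP-Fast's actual tool; it only illustrates the concept
# of identifying performance-critical operations by measurement.
import math
import time
from collections import defaultdict

op_times = defaultdict(float)  # accumulated seconds per operation name

def profile_op(name, fn, *args):
    """Run fn(*args) and accumulate its wall-clock time under `name`."""
    start = time.perf_counter()
    result = fn(*args)
    op_times[name] += time.perf_counter() - start
    return result

def matmul(a, b):
    # Naive O(n^3) dense matrix multiply (stand-in for an attention-score op).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def softmax_rows(m):
    # Row-wise softmax (stand-in for attention normalization).
    out = []
    for row in m:
        mx = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - mx) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

# Toy "model": one score computation followed by normalization.
n = 64
a = [[((i + j) % 7) * 0.1 for j in range(n)] for i in range(n)]
scores = profile_op("matmul", matmul, a, a)
probs = profile_op("softmax", softmax_rows, scores)

# Rank operations by total elapsed time; the top entry is the hotspot.
ranked = sorted(op_times.items(), key=lambda kv: kv[1], reverse=True)
print("hotspot:", ranked[0][0])
```

In this toy setup the cubic-cost matmul dominates, so a tool built on this principle would flag it as the operation worth optimizing first; NLP-Perf applies the same kind of analysis across the full spectrum of operations in real NLP models.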

Original language: English
Title of host publication: Proceedings - 30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021
Editors: Jaejin Lee, Albert Cohen
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 75-89
Number of pages: 15
ISBN (Electronic): 9781665442787
DOIs
State: Published - 2021
Externally published: Yes
Event: 30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021 - Virtual, Online, United States
Duration: 26 Sep 2021 - 29 Sep 2021

Publication series

Name: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
Volume: 2021-September
ISSN (Print): 1089-795X

Conference

Conference: 30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021
Country/Territory: United States
City: Virtual, Online
Period: 26/09/21 - 29/09/21

Keywords

  • Architecture
  • Computation/Dataflow optimization
  • Natural Language Processing (NLP)
  • Parallel algorithm

