TY - GEN
T1 - NLP-Fast
T2 - 30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021
AU - Kim, Joonsung
AU - Hur, Suyeon
AU - Lee, Eunbok
AU - Lee, Seungho
AU - Kim, Jangwoo
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Emerging natural language processing (NLP) models have become larger and more complex in order to provide more sophisticated NLP services. Accordingly, there is strong demand for scalable and flexible computer infrastructure to support these large-scale, complex, and diverse NLP models. However, existing proposals cannot provide sufficient scalability and flexibility: they focus on optimizing specific operations rather than identifying and optimizing the wide spectrum of performance-critical operations appearing in recent NLP models. In this paper, we propose NLP-Fast, a novel system solution to accelerate a wide spectrum of large-scale NLP models. NLP-Fast consists of two main parts: (1) NLP-Perf, an in-depth performance analysis tool to identify critical operations in emerging NLP models, and (2) NLP-Opt, three end-to-end optimization techniques to accelerate the identified performance-critical operations on various hardware platforms (e.g., CPU, GPU, FPGA). In this way, NLP-Fast can accelerate various types of NLP models on different hardware platforms by identifying their critical operations through NLP-Perf and applying NLP-Opt's holistic optimizations. We evaluate NLP-Fast on CPU, GPU, and FPGA, achieving overall throughput improvements of up to 2.92×, 1.59×, and 4.47× over each platform's baseline. We release NLP-Fast to the community so that users can easily conduct NLP-Fast's analysis and apply its optimizations to their own NLP applications.
AB - Emerging natural language processing (NLP) models have become larger and more complex in order to provide more sophisticated NLP services. Accordingly, there is strong demand for scalable and flexible computer infrastructure to support these large-scale, complex, and diverse NLP models. However, existing proposals cannot provide sufficient scalability and flexibility: they focus on optimizing specific operations rather than identifying and optimizing the wide spectrum of performance-critical operations appearing in recent NLP models. In this paper, we propose NLP-Fast, a novel system solution to accelerate a wide spectrum of large-scale NLP models. NLP-Fast consists of two main parts: (1) NLP-Perf, an in-depth performance analysis tool to identify critical operations in emerging NLP models, and (2) NLP-Opt, three end-to-end optimization techniques to accelerate the identified performance-critical operations on various hardware platforms (e.g., CPU, GPU, FPGA). In this way, NLP-Fast can accelerate various types of NLP models on different hardware platforms by identifying their critical operations through NLP-Perf and applying NLP-Opt's holistic optimizations. We evaluate NLP-Fast on CPU, GPU, and FPGA, achieving overall throughput improvements of up to 2.92×, 1.59×, and 4.47× over each platform's baseline. We release NLP-Fast to the community so that users can easily conduct NLP-Fast's analysis and apply its optimizations to their own NLP applications.
KW - Architecture
KW - Computation/Dataflow optimization
KW - Natural Language Processing (NLP)
KW - Parallel algorithm
UR - https://www.scopus.com/pages/publications/85125736429
U2 - 10.1109/PACT52795.2021.00013
DO - 10.1109/PACT52795.2021.00013
M3 - Conference contribution
AN - SCOPUS:85125736429
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 75
EP - 89
BT - Proceedings - 30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021
A2 - Lee, Jaejin
A2 - Cohen, Albert
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 26 September 2021 through 29 September 2021
ER -