TY - JOUR
T1 - DPIM
T2 - A 2T1C eDRAM Transformer-in-Memory Chip With Sparsity-Aware Quantization and Heterogeneous Dense–Sparse Core
AU - Kim, Donghyuk
AU - Kim, Jae Young
AU - Cho, Hyunjun
AU - Yoo, Seungjae
AU - Lee, Sukjin
AU - Yune, Sungwoong
AU - Yang, Sejeong
AU - Jeong, Hoichang
AU - Park, Keonhee
AU - Lee, Ki Soo
AU - Lee, Jongchan
AU - Han, Chanheum
AU - Koo, Gunmo
AU - Han, Yuli
AU - Kim, Jaejin
AU - Kim, Jaemin
AU - Lee, Kyuho Jason
AU - Chae, Joo Hyung
AU - Cho, Kunhee
AU - Kim, Joo Young
N1 - Publisher Copyright:
© 1966-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Transformer models have revolutionized artificial intelligence (AI) applications across various domains, but their increasing complexity poses significant challenges in terms of computational and memory demands. While processing-in-memory (PIM) paradigms have been adopted to address these limitations, existing PIM-based transformer accelerators still face hurdles such as: 1) focusing solely on optimizing attention layers; 2) lack of sparsity exploitation for transformers; and 3) limited PIM macro capacity and low cell density, which degrade on-chip data reuse and increase external memory access (EMA). This article presents DPIM, a novel 2T1C eDRAM-based transformer-in-memory chip that addresses these challenges through three key innovations: 1) a sparsity-aware quantization (SAQ) scheme that significantly increases bit-slice sparsity in both activation and weight data, achieving ratios of 83.3% and 88.4%, respectively, with minimal accuracy loss; 2) a heterogeneous PIM core capable of efficiently handling both sparse and dense matrix multiplications (MMs); and 3) a high-density 2T1C eDRAM cell with a density of 1.38 Mb/mm2, enabling large-capacity PIM macros. By integrating these features, DPIM achieves improved computational efficiency and reduced EMA with enhanced on-chip data reuse. The DPIM chip, fabricated using 28-nm CMOS technology, achieves a throughput of 3.03–12.12 TOPS and an energy efficiency of 4.84–19.36 TOPS/W for INT8 and INT4 operations, respectively. It achieves a throughput density of 0.55 TOPS/mm2 with INT8 operation. With a total macro size of 4608 kb, the chip occupies a die area of 20.25 mm2 and operates at frequencies from 50 to 285 MHz with a supply voltage of 0.85–1.0 V. The DPIM successfully executes BERT-Large on the general language understanding evaluation (GLUE) dataset. Its macro density is 1413 kb/mm2, and the resulting density figure-of-merit (FoM) (macro density × throughput density) is 1.6×–115.8× higher than previous works, representing a significant advancement in hardware design for efficient transformer processing.
KW - 2T1C cell
KW - heterogeneous processor
KW - processing-in-memory (PIM)
KW - quantization
KW - transformer
UR - https://www.scopus.com/pages/publications/105018036698
U2 - 10.1109/JSSC.2025.3607826
DO - 10.1109/JSSC.2025.3607826
M3 - Article
AN - SCOPUS:105018036698
SN - 0018-9200
JO - IEEE Journal of Solid-State Circuits
JF - IEEE Journal of Solid-State Circuits
ER -