DPIM: A 2T1C eDRAM Transformer-in-Memory Chip With Sparsity-Aware Quantization and Heterogeneous Dense–Sparse Core

Donghyuk Kim, Jae Young Kim, Hyunjun Cho, Seungjae Yoo, Sukjin Lee, Sungwoong Yune, Sejeong Yang, Hoichang Jeong, Keonhee Park, Ki Soo Lee, Jongchan Lee, Chanheum Han, Gunmo Koo, Yuli Han, Jaejin Kim, Jaemin Kim, Kyuho Jason Lee, Joo Hyung Chae, Kunhee Cho, Joo Young Kim

Research output: Contribution to journal › Article › peer-review

Abstract

Transformer models have revolutionized artificial intelligence (AI) applications across various domains, but their growing complexity imposes heavy computational and memory demands. While processing-in-memory (PIM) paradigms have been adopted to address these limitations, existing PIM-based transformer accelerators still face three hurdles: 1) they focus solely on optimizing attention layers; 2) they do not exploit sparsity in transformers; and 3) their limited PIM macro capacity and low cell density degrade on-chip data reuse and increase external memory access (EMA). This article presents DPIM, a novel 2T1C eDRAM-based transformer-in-memory chip that addresses these challenges through three key innovations: 1) a sparsity-aware quantization (SAQ) scheme that raises bit-slice sparsity in activation and weight data to 83.3% and 88.4%, respectively, with minimal accuracy loss; 2) a heterogeneous PIM core that efficiently handles both sparse and dense matrix multiplications (MMs); and 3) a high-density 2T1C eDRAM cell with a density of 1.38 Mb/mm², enabling large-capacity PIM macros. By integrating these features, DPIM improves computational efficiency and reduces EMA through enhanced on-chip data reuse. Fabricated in 28-nm CMOS technology, the DPIM chip achieves a throughput of 3.03–12.12 TOPS and an energy efficiency of 4.84–19.36 TOPS/W, where the lower and upper bounds correspond to INT8 and INT4 operation, respectively. It achieves a throughput density of 0.55 TOPS/mm² with INT8 operation. With a total macro size of 4608 kb, the chip occupies a die area of 20.25 mm² and operates at 50–285 MHz with a supply voltage of 0.85–1.0 V. DPIM successfully executes BERT-Large on the General Language Understanding Evaluation (GLUE) benchmark.
Its macro density is 1413 kb/mm², and the resulting density figure-of-merit (FoM), defined as macro density × throughput density, is 1.6×–115.8× higher than those of previous works, representing a significant advancement in hardware design for efficient transformer processing.
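Bit-slice sparsity, the metric SAQ is reported to increase, counts zero-valued bit slices rather than zero-valued numbers. The Python sketch below illustrates how such a ratio could be measured and how a heterogeneous dense–sparse core might use it to pick an execution path. It is not the authors' SAQ scheme or dispatch logic: the sign-magnitude encoding over a symmetric INT8 range [-127, 127], the 4-bit slice width, the 0.5 threshold, and the functions `bit_slices`, `bit_slice_sparsity`, and `choose_core` are hypothetical assumptions chosen for illustration.

```python
from math import ceil


def bit_slices(value: int, magnitude_bits: int = 7, slice_bits: int = 4) -> list[int]:
    """Split |value| into slice_bits-wide slices, least significant first.

    Assumption (illustrative only): values use a symmetric INT8 range
    [-127, 127] stored as sign-magnitude, so the magnitude fits in 7 bits.
    """
    limit = 1 << magnitude_bits
    if not -limit < value < limit:
        raise ValueError(f"{value} does not fit in {magnitude_bits} magnitude bits")
    mag = abs(value)
    mask = (1 << slice_bits) - 1
    n_slices = ceil(magnitude_bits / slice_bits)
    return [(mag >> (i * slice_bits)) & mask for i in range(n_slices)]


def bit_slice_sparsity(values, magnitude_bits: int = 7, slice_bits: int = 4) -> float:
    """Fraction of all bit slices (across all values) that are zero."""
    total = zeros = 0
    for v in values:
        for s in bit_slices(v, magnitude_bits, slice_bits):
            total += 1
            zeros += s == 0
    return zeros / total if total else 0.0


def choose_core(values, threshold: float = 0.5) -> str:
    """Hypothetical dispatch rule: use the sparse path when zero slices dominate."""
    return "sparse" if bit_slice_sparsity(values) >= threshold else "dense"


if __name__ == "__main__":
    acts = [0, 3, -5, 12, 0, 100, -1, 0]
    print(f"bit-slice sparsity: {bit_slice_sparsity(acts):.3f}")
    print("selected core:", choose_core(acts))
```

For the sample activations, only 3 of 8 values are zero, yet 10 of the 16 4-bit slices are zero (sparsity 0.625). This illustrates why slice-level sparsity can expose more skippable work than value-level sparsity, which is the kind of opportunity a sparse core can exploit.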

Original language: English
Journal: IEEE Journal of Solid-State Circuits
State: Accepted/In press - 2025

Keywords

  • 2T1C cell
  • heterogeneous processor
  • processing-in-memory (PIM)
  • quantization
  • transformer
