Journal Article
| Academic year | 114 |
|---|---|
| Semester | 2 |
| Publication date | 2026-03-20 |
| Title | Attention Distribution-Aware Softmax for NPU-Accelerated On-Device Inference of LLMs: An Edge-Oriented Approximation Design |
| Title (other language) | |
| Authors | Sanoop Sadheerthan; Min-Jie Hsu; Chih-Hsiang Huang; Yin-Tien Wang |
| Affiliation | |
| Publisher | |
| Journal, volume (issue), pages | Electronics 15(6), p. 1312 |
| Abstract | Low-power NPUs enable on-device LLM inference through efficient integer and fixed-point algebra, yet their lack of native exponential support makes Transformer softmax a critical performance bottleneck. Existing NPU kernels approximate e^x using uniform piecewise polynomials to enable O(1) SIMD indexing, but this wastes computation by applying high-degree arithmetic indiscriminately in every segment. Conversely, fully adaptive approaches maximize statistical fidelity but introduce pipeline stalls due to comparator-based boundary search. To bridge this gap, we propose an attention distribution-aware softmax that uses Particle Swarm Optimization (PSO) to define non-uniform segments and variable polynomial degrees, prioritizing finer granularity and lower arithmetic complexity in attention-dense regions. To ensure efficiency, we snap boundaries into a 128-bin LUT, enabling O(1) retrieval of segment parameters without branching. Inference measurements show that this favors low-degree execution, minimizing exp-kernel overhead. Using TinyLlama-1.1B-Chat as a testbed, the proposed weighted design reduces cycles per call exp kernel (CPC) by 18.5% versus an equidistant uniform Degree-4 baseline and 13.1% versus uniform Degree-3, while preserving ranking fidelity. These results show that grid-snapped, variable-degree approximation can improve softmax efficiency while largely preserving attention ranking fidelity, enabling accurate edge LLM inference. |
| Keywords | Large Language Models (LLMs); softmax approximation; Multi-Objective Optimization (MOO); Edge AI; distribution-aware optimization; Neural Processing Units (NPUs) acceleration |
| Language | en |
| ISSN | 2079-9292 |
| Journal type | International |
| Indexed in | SCI |
| Industry-academia collaboration | Domestic |
| Corresponding author | 王銀添 (Yin-Tien Wang) |
| Peer reviewed | Yes |
| Country | CHE |
| Open call for papers | |
| Publication format | Electronic |
| Related links | Institutional repository link ( http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/129056 ) |
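
The abstract describes snapping non-uniform segment boundaries onto a 128-bin LUT so that segment parameters can be fetched in O(1) without a comparator-based boundary search. The following is a minimal sketch of that lookup idea only, not the authors' implementation. Everything beyond the 128-bin grid is an assumption made for illustration: the clamp range `[-16, 0]`, the hand-picked `EDGES` and `DEGREES` (the paper derives these with PSO), the use of Taylor coefficients around each segment midpoint instead of fitted ones, floating-point rather than fixed-point arithmetic, and all function names.

```python
import math

X_MIN, X_MAX = -16.0, 0.0          # assumed clamp range for (score - max score)
N_BINS = 128                       # LUT size taken from the abstract
BIN_W = (X_MAX - X_MIN) / N_BINS   # 0.125, so every edge below lies on the bin grid

# Hypothetical segments: narrow near 0, wide in the tail (hand-picked, not PSO output).
EDGES = [-16.0, -8.0, -4.0, -2.0, -1.0, -0.5, 0.0]
DEGREES = [1, 2, 2, 2, 2, 2]       # hypothetical per-segment polynomial degree


def build_tables():
    """Return (segments, lut): per-segment (midpoint, coeffs) and a bin -> segment map."""
    segments = []
    for lo, hi, deg in zip(EDGES, EDGES[1:], DEGREES):
        mid = 0.5 * (lo + hi)
        # Taylor coefficients of exp around mid, highest power first (Horner order).
        coeffs = [math.exp(mid) / math.factorial(k) for k in range(deg, -1, -1)]
        segments.append((mid, coeffs))
    lut, seg = [], 0
    for b in range(N_BINS):
        center = X_MIN + (b + 0.5) * BIN_W
        while center >= EDGES[seg + 1]:   # advance to the segment containing this bin
            seg += 1
        lut.append(seg)
    return segments, lut


def approx_exp(x, segments, lut):
    """Approximate exp(x) for x in [X_MIN, X_MAX] using one LUT read per call."""
    x = min(max(x, X_MIN), X_MAX)
    b = min(int((x - X_MIN) / BIN_W), N_BINS - 1)   # O(1) bin index, no boundary search
    mid, coeffs = segments[lut[b]]
    t = x - mid
    y = 0.0
    for c in coeffs:                # Horner; iterations = this segment's degree + 1
        y = y * t + c
    return max(y, 0.0)              # a low-degree fit can dip slightly below zero


def approx_softmax(scores, segments, lut):
    """Softmax with the max subtracted first, so every exp argument is <= 0."""
    m = max(scores)
    e = [approx_exp(s - m, segments, lut) for s in scores]
    total = sum(e)
    return [v / total for v in e]


SEGMENTS, LUT = build_tables()
```

Calling `approx_softmax([2.0, 1.0, 0.5, -3.0], SEGMENTS, LUT)` returns probabilities in the same rank order as the exact softmax for this input. Because each segment stores only as many coefficients as its degree requires, inputs that fall in low-degree segments cost fewer multiply-adds, which is the effect the abstract attributes to the variable-degree design.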