Abstract
Low-power NPUs enable on-device LLM inference through efficient integer and fixed-point arithmetic, yet their lack of native exponential support makes Transformer softmax a critical performance bottleneck. Existing NPU kernels approximate e^x with uniform piecewise polynomials to enable O(1) SIMD indexing, but this wastes computation by applying high-degree arithmetic indiscriminately in every segment. Conversely, fully adaptive approaches maximize statistical fidelity but introduce pipeline stalls due to comparator-based boundary search. To bridge this gap, we propose an attention distribution-aware softmax that uses Particle Swarm Optimization (PSO) to select non-uniform segment boundaries and variable polynomial degrees, assigning finer granularity and lower arithmetic complexity to attention-dense regions. To keep lookup efficient, we snap the optimized boundaries onto a 128-bin LUT grid, enabling O(1) retrieval of segment parameters without branching. Inference measurements confirm that this design steers most exp calls to low-degree segments, minimizing exp-kernel overhead. Using TinyLlama-1.1B-Chat as a testbed, the proposed weighted design reduces exp-kernel cycles per call (CPC) by 18.5% versus an equidistant uniform Degree-4 baseline and by 13.1% versus uniform Degree-3, while preserving ranking fidelity. These results show that grid-snapped, variable-degree approximation can improve softmax efficiency while largely preserving attention ranking fidelity, enabling accurate edge LLM inference.
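The grid-snapped lookup described above can be illustrated with a minimal sketch. This is not the paper's implementation: the bin range, the degree assignment rule, and the least-squares fit (standing in for the PSO-chosen coefficients) are all illustrative assumptions. It shows only the core idea that once segment boundaries are snapped onto a uniform 128-bin grid, the segment index is a single arithmetic step with no comparator-based boundary search, while each bin may still carry a polynomial of a different degree.

```python
import math
import numpy as np

# Illustrative parameters: softmax feeds exp with inputs x <= 0 after the
# max-subtraction trick, so we cover a bounded negative range.
X_MIN, X_MAX, NUM_BINS = -8.0, 0.0, 128
BIN_WIDTH = (X_MAX - X_MIN) / NUM_BINS

def fit_bin(lo, hi, degree):
    """Least-squares polynomial fit of e^x on [lo, hi].

    A stand-in for the PSO-optimized coefficients in the paper; polyfit
    returns coefficients highest-degree first, ready for Horner's rule.
    """
    xs = np.linspace(lo, hi, 64)
    return np.polyfit(xs, np.exp(xs), degree)

# Build the LUT with variable degrees per bin. The rule below (cheap
# degree-2 near x = 0, degree-3 elsewhere) is a hypothetical choice,
# mimicking lower arithmetic complexity in attention-dense regions.
LUT = []
for b in range(NUM_BINS):
    lo = X_MIN + b * BIN_WIDTH
    hi = lo + BIN_WIDTH
    degree = 2 if hi > -1.0 else 3
    LUT.append(fit_bin(lo, hi, degree))

def exp_approx(x):
    """Approximate e^x via the grid-snapped piecewise-polynomial LUT."""
    # O(1) branch-free segment index: because boundaries were snapped to
    # the uniform grid, no search over segment edges is needed.
    b = min(NUM_BINS - 1, max(0, int((x - X_MIN) / BIN_WIDTH)))
    acc = 0.0
    for c in LUT[b]:  # Horner evaluation of the bin's polynomial
        acc = acc * x + c
    return acc
```

With 128 bins over a width-8 range, each segment is only 0.0625 wide, so even a degree-2 fit tracks e^x closely; the design question the paper optimizes is how to trade bin granularity against per-bin degree under a fixed-point NPU cost model.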