Improving Audio Recognition With Randomized Area Ratio Patch Masking: A Data Augmentation Perspective
Academic year: 113 (ROC calendar)
Semester: 1
Publication date: 2024-10-07
Title: Improving Audio Recognition With Randomized Area Ratio Patch Masking: A Data Augmentation Perspective
Authors: Weichun Wong; Yachun Li; Shihan Li
Source, volume, pages: IEEE Access, vol. 12, pp. 172548-172561
Abstract: In audio recognition, improving the accuracy and generalizability of Pretrained Audio Neural Networks (PANNs) remains challenging. This study introduces Randomized Area Ratio Patch Masking (RARPM), a novel data augmentation technique that applies random patches with varying transparency to log-mel spectrograms during training. By diversifying the training data, the method aims to improve what the model learns; it is optimized for the MobileNetV1 architecture. RARPM is validated on AudioSet, a dataset of over two million labeled sound clips. It achieves a mean average precision (mAP) of 0.385, surpassing the 0.366 mAP of the SpecAugment baseline (an absolute gain of 0.019). This research contributes a new data augmentation strategy for audio recognition and paves the way for more robust models applicable across diverse architectures.
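The abstract does not give the patch-sampling details, so the following is only a minimal NumPy sketch of the idea under stated assumptions: one rectangular patch per spectrogram, a patch area drawn as a uniform fraction of the spectrogram area, a random aspect ratio, and "transparency" modeled as alpha-blending toward a fill value (the spectrogram mean by default). The function name `random_area_ratio_patch_mask` and every parameter range below are hypothetical, not taken from the paper.

```python
from __future__ import annotations

import numpy as np


def random_area_ratio_patch_mask(
    log_mel: np.ndarray,
    rng: np.random.Generator,
    area_ratio_range: tuple[float, float] = (0.02, 0.2),  # assumed, not from the paper
    alpha_range: tuple[float, float] = (0.0, 1.0),  # assumed opacity range
    fill_value: float | None = None,
) -> np.ndarray:
    """Alpha-blend one random rectangular patch into a (frames, mel_bins) log-mel spectrogram.

    Hypothetical sketch of the RARPM idea; not the authors' implementation.
    """
    n_time, n_mel = log_mel.shape
    # Patch area is a random fraction of the whole spectrogram area.
    target_area = float(rng.uniform(*area_ratio_range)) * n_time * n_mel
    # Random aspect ratio (frames per mel bin), log-uniform in [1/3, 3] (assumed).
    aspect = float(np.exp(rng.uniform(np.log(1 / 3), np.log(3))))
    # Fit the patch inside the spectrogram; recompute the height after clipping
    # the width so the patch area stays close to the target.
    w = int(np.clip(round(float(np.sqrt(target_area / aspect))), 1, n_mel))
    h = int(np.clip(round(target_area / w), 1, n_time))
    # Random top-left corner such that the patch lies fully inside.
    t0 = int(rng.integers(0, n_time - h + 1))
    f0 = int(rng.integers(0, n_mel - w + 1))
    # "Transparency": alpha=0 leaves the patch unchanged, alpha=1 replaces it.
    alpha = float(rng.uniform(*alpha_range))
    fill = float(log_mel.mean()) if fill_value is None else fill_value
    out = log_mel.copy()  # never modify the caller's array
    region = out[t0 : t0 + h, f0 : f0 + w]
    out[t0 : t0 + h, f0 : f0 + w] = (1.0 - alpha) * region + alpha * fill
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.normal(size=(1000, 64))  # placeholder (frames, mel_bins) array
    augmented = random_area_ratio_patch_mask(spec, rng)
    print(augmented.shape)
```

In a training loop, such a function would typically be applied on the fly to each example so that every epoch sees different patches. Unlike SpecAugment, which masks whole time and frequency bands, this sketch masks a single rectangle with a random opacity; how closely that matches RARPM depends on details the abstract does not state.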
Language: en_US
ISSN: 2169-3536
Journal type: Foreign
Indexed in: SCI
Country: USA
Publication format: Electronic
Institutional repository link: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/127619