Research on Performance Improvement of Vision Transformer Model Based on BEiT |
---|---
Academic Year | 113 (ROC calendar)
Semester | 2
Presentation Date | 2025-06-25
Title | Research on Performance Improvement of Vision Transformer Model Based on BEiT
Title (Other Language) |
Authors | Zhe-Wei Liu; Chii-Jen Chen
Affiliated Unit |
Publisher |
Conference Name | The International Conference on Recent Advancements in Computing in AI, IoT and Computer Engineering Technology (CICET 2025)
Conference Location | New Taipei, Taiwan
Abstract | Vision Transformer (ViT) has demonstrated exceptional performance in image classification on large-scale datasets. However, its application to domain-specific or small-scale datasets remains challenging. This research explores an alternative approach to image patch generation, replacing the fixed-size patch mechanism in ViT with semantic-aware segmentation using the Segment Anything Model (SAM). We focus on applying this technique to datasets in domains such as marine biology, animals, and plants, where semantic consistency plays a more critical role. The segmented patches are compared with the conventional 16×16 patches used in ViT to evaluate their potential to enhance semantic representation. Preliminary results suggest that SAM-based patches can introduce better-localized and more meaningful features, providing a foundation for performance enhancement in downstream tasks.
Keywords | Vision Transformer; BEiT; Semantic Segmentation; Small Datasets; Self-Supervised Learning
Language | en_US
Indexed In |
Conference Type | International
On-campus Conference Venue | Tamsui Campus
Conference Dates | 2025-06-25 ~ 2025-06-27
Corresponding Author |
Country | TWN
Open Call for Papers |
Publication Format |
Source |
Related Links |
Institutional Repository Link | http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/127769
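
The abstract above describes replacing ViT's fixed 16×16 grid patches with semantic-aware patches derived from SAM segmentations. The sketch below shows one plausible way such segment-based tokenization could feed a ViT-style stem; the `SegmentPatchEmbed` class, its parameters, and the masking/resizing choices are illustrative assumptions, not the authors' implementation.

```python
# Minimal, hypothetical sketch: turn SAM-style segment masks into ViT tokens.
# Assumption: masks are precomputed (H, W) boolean arrays, e.g. the "segmentation"
# entries produced by SAM's automatic mask generator.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class SegmentPatchEmbed(nn.Module):
    """Embed variable-shaped segments into fixed-dimensional ViT-style tokens."""

    def __init__(self, patch_size: int = 16, embed_dim: int = 768, in_chans: int = 3):
        super().__init__()
        self.patch_size = patch_size
        # Same kind of linear projection a ViT applies to each flattened 16x16 patch.
        self.proj = nn.Linear(in_chans * patch_size * patch_size, embed_dim)

    def forward(self, image: torch.Tensor, masks: list) -> torch.Tensor:
        # image: (C, H, W) float tensor; masks: list of (H, W) boolean numpy arrays.
        tokens = []
        for mask in masks:
            ys, xs = np.nonzero(mask)
            if ys.size == 0:
                continue
            y0, y1 = ys.min(), ys.max() + 1
            x0, x1 = xs.min(), xs.max() + 1
            m = torch.from_numpy(np.ascontiguousarray(mask[y0:y1, x0:x1])).to(image.dtype)
            # Zero out pixels outside the segment so each token only "sees" one object.
            crop = image[:, y0:y1, x0:x1] * m
            # Resize every segment to the same resolution before the linear projection.
            crop = F.interpolate(crop.unsqueeze(0),
                                 size=(self.patch_size, self.patch_size),
                                 mode="bilinear", align_corners=False)
            tokens.append(self.proj(crop.flatten(1)))
        # (1, num_segments, embed_dim): a drop-in replacement for the grid-patch sequence.
        return torch.cat(tokens, dim=0).unsqueeze(0)
```

In such a setup, positional information would have to be supplied separately (for instance from each segment's bounding-box centre), and the resulting variable-length token sequences batch differently from the fixed grid of a standard ViT; both are open design choices rather than details stated in the abstract.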