Using large multimodal models to predict outfit compatibility
Academic Year 113
Semester 2
Publication Date 2025-07-01
Title Using large multimodal models to predict outfit compatibility
Title (Other Languages)
Authors Chia-Ling Chang, Yen-Liang Chen, Dao-Xuan Jiang
Unit
Publisher
Journal Citation (Name, Volume/Issue, Pages) Volume 194, 114457
Abstract Outfit coordination is a direct way for people to express themselves. However, judging the compatibility between tops and bottoms requires considering multiple factors such as color and style, a process that is time-consuming and prone to error. In recent years, the development of large language models and large multimodal models has transformed many application fields. This study explores how to leverage these models to achieve breakthroughs in fashion outfit recommendation. The research combines keyword response text from the large language model Gemini in a Visual Question Answering (VQA) task with the deep feature fusion of the large multimodal model BEiT-3. Users need only provide images of the clothing to evaluate the compatibility of tops and bottoms, making the process more convenient. Our proposed model, the Large Multi-modality Language Model for Outfit Recommendation (LMLMO), outperforms previously proposed models on the FashionVC and Evaluation3 datasets. Moreover, experimental results show that different types of keyword responses have varying impacts on the model, offering new directions and insights for future research.
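For orientation, below is a minimal, hypothetical sketch of the kind of fusion the abstract describes: keyword text produced by a VQA model is embedded and combined with top and bottom image embeddings from a multimodal encoder to score compatibility. The class name, embedding dimensions, and concatenation-based fusion here are illustrative assumptions only, not the LMLMO architecture reported in the paper.

# Illustrative sketch (not the authors' LMLMO implementation): fuse top/bottom
# image embeddings with an embedding of VQA keyword responses and map the
# result to a single compatibility score. All names and sizes are assumptions.
import torch
import torch.nn as nn


class CompatibilityScorer(nn.Module):
    """Scores a top/bottom pair given image embeddings and a keyword-text embedding."""

    def __init__(self, img_dim: int = 768, txt_dim: int = 768, hidden_dim: int = 512):
        super().__init__()
        # Concatenate the two image embeddings with the keyword-text embedding,
        # then map the fused vector to a single compatibility logit.
        self.fusion = nn.Sequential(
            nn.Linear(2 * img_dim + txt_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, top_emb, bottom_emb, keyword_emb):
        fused = torch.cat([top_emb, bottom_emb, keyword_emb], dim=-1)
        return torch.sigmoid(self.fusion(fused)).squeeze(-1)  # probability in [0, 1]


if __name__ == "__main__":
    # Placeholder tensors standing in for outputs of a multimodal encoder
    # (e.g., BEiT-3) and a text embedding of Gemini's keyword responses.
    top_emb = torch.randn(4, 768)
    bottom_emb = torch.randn(4, 768)
    keyword_emb = torch.randn(4, 768)

    scorer = CompatibilityScorer()
    print(scorer(top_emb, bottom_emb, keyword_emb))  # four compatibility scores

In practice the scorer would be trained with a binary objective (compatible vs. incompatible pairs); that training setup is likewise assumed here rather than taken from the paper.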
Keywords Large language models; Large multi-modal models; Outfit compatibility; Outfit recommendation
Language zh_TW
ISSN 0167-9236
Journal Type Foreign
Indexed in SCI, SSCI
Industry-Academia Collaboration
Corresponding Author
Peer Review
Country USA
Open Call for Papers
Publication Format Electronic version