Faculty Data Search | Category: Conference paper | Faculty: 宋立群 SUNG LI-CHUN

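Title: Web document classification based on tagged-region progressive analysis
Academic year:
Semester:
Publication date: 2004/12/15
Work title: Web document classification based on tagged-region progressive analysis
Work title (other language):
Authors: Sung, Li-Chun; Chen, Meng-Chang; Kuo, Chin-Hwa
Affiliation: Department of Computer Science and Information Engineering, Tamkang University (淡江大學資訊工程學系)
Publisher:
Conference name: Taipei and the International Computer Symposium 2004
Conference location: Taipei, Taiwan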
Abstract: In this paper, we propose an intelligent web document classification method called TAgged-Region Progressive Analysis (TARPA). Instead of analyzing the entire content of a web page, TARPA parses the document into finer-grained structured tagged regions and extracts only the few most important regions for analysis and classification. If these regions are not sufficient to classify the document, TARPA progressively analyzes other important regions and linked pages to improve classification performance. TARPA has two stages: a learning stage and a classification stage. The learning stage determines the relative importance of tag-pairs, and the classification stage analyzes the document in that order of importance. As a result, TARPA can classify a web document using little of its content while achieving a higher classification rate and a shorter processing time. Experiments show that 91% of the test web documents are correctly classified when the TARPA classifier is fed only 40% to 50% of the document contents.
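Keywords: Web categorization; Progressive analysis
Language: English
Indexed in:
Conference type: Domestic
On-campus conference venue:
Conference dates: 2004/12/15~2004/12/17
Corresponding author:
Country: Republic of China
Open call for papers:
Publication format:
Source: Proceedings of Taipei and the International Computer Symposium 2004, pp. 259-264
Related links:
SDGs: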