Research Report

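Academic year 100 (ROC calendar, i.e. 2011-2012)
Semester 1
Publication date 2011-08-01
Title 結合特徵排序的改良式浮動序列特徵擷取演算法 (Modified Sequential Floating Feature Selection Algorithm Combined with Feature Ranking)
Title (other language) Modified Sequential Floating Search Algorithm with a Novel Ranking Method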
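Authors 周建興; 趙于翔
Affiliation Department of Electrical Engineering, Tamkang University
Description Project number: NSC100-2221-E032-069
 Research period: 2011-08-01 to 2012-07-31
 Research funding: 462,000
Sponsor National Science Council, Executive Yuan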
Abstract Feature selection is a highly important research topic in pattern recognition, and in pattern classification the choice of a suitable feature selection method is often the key to success. A good feature selection method not only raises classification accuracy but also extracts the critical features that users care about. For instance, in DNA sequence analysis, feature selection can locate the segments of a sequence, or the types of amino acids, that may lead to certain diseases; in microarray data, it can select the genes that may lead to certain diseases. Similarly, in text categorization, feature selection can identify the keywords that actually contribute to classification. In addition, feature selection not only picks out the features users are most interested in but also reduces the time and computation needed to train and test the classifier, as well as the storage space the data requires. Among the many feature selection methods, sequential floating search (SFS) is well known and widely used. In this project, we propose a feature selection method that combines feature ranking with SFS. In the feature ranking stage, we use the idea of false features to rank features by importance; the lower-ranked, less important features are then processed with the SFS algorithm. This approach can overcome problems that the original SFS algorithm may encounter and, at the same time, extract more critical feature subsets.
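The abstract describes the method only at a high level: rank features against false features, then run a floating search over the lower-ranked ones. The Python sketch below illustrates one possible reading of that pipeline; it is not the authors' implementation, and every detail is an assumption. False features are built as random permutations of each real feature (so they carry no class information), each feature is scored by its leave-one-out 1-NN accuracy on its own (k-NN appears among the keywords), features that beat the best false feature are kept as "important", and a sequential floating forward selection (SFFS) is run only over the remaining features while the important ones stay fixed. All function names (`loo_knn_accuracy`, `rank_with_false_features`, `sffs`, `select_features`, `make_toy_data`) are hypothetical.

```python
import random


def loo_knn_accuracy(X, y, cols):
    """Leave-one-out 1-NN accuracy using only the columns in `cols`."""
    if not cols:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = None, None
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[c] - xj[c]) ** 2 for c in cols)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def rank_with_false_features(X, y, rng):
    """Assumed false-feature ranking: one permuted copy per real feature;
    real features scoring above the best false feature count as important."""
    d = len(X[0])
    false_cols = []
    for j in range(d):
        col = [row[j] for row in X]
        rng.shuffle(col)  # permuting breaks any link between values and labels
        false_cols.append(col)
    X_aug = [row + [false_cols[j][i] for j in range(d)] for i, row in enumerate(X)]
    scores = [loo_knn_accuracy(X_aug, y, [j]) for j in range(2 * d)]
    threshold = max(scores[d:])  # best score reached by pure noise
    important = [j for j in range(d) if scores[j] > threshold]
    less_important = sorted((j for j in range(d) if scores[j] <= threshold),
                            key=lambda j: -scores[j])
    return important, less_important


def sffs(candidates, base, score):
    """Sequential floating forward selection over `candidates`, always keeping `base`."""
    selected, best = [], {}  # best: subset size -> (score, subset)
    while len(selected) < len(candidates):
        # Inclusion: add the candidate that maximises the criterion.
        remaining = [f for f in candidates if f not in selected]
        f_add = max(remaining, key=lambda f: score(base + selected + [f]))
        selected = selected + [f_add]
        s = score(base + selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, selected)
        # Conditional exclusion: drop features only while that strictly beats
        # the best subset already recorded for the smaller size.
        while len(selected) > 2:
            f_rm = max(selected, key=lambda f: score(base + [g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_rm]
            s_red = score(base + reduced)
            if s_red > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (s_red, reduced)
            else:
                break
    if not best:
        return score(base), []
    # Highest score wins; ties go to the smaller subset.
    size = max(best, key=lambda k: (best[k][0], -k))
    return best[size]


def select_features(X, y, seed=0):
    """Hypothetical pipeline: false-feature ranking, then SFFS on the low-ranked features."""
    rng = random.Random(seed)
    important, less_important = rank_with_false_features(X, y, rng)

    def criterion(cols):
        return loo_knn_accuracy(X, y, cols)

    score, extra = sffs(less_important, important, criterion)
    return important, less_important, sorted(important + extra), score


def make_toy_data(n=40, seed=1):
    """Toy two-class data: feature 0 is informative, features 1-3 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([3.0 * label + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(3)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    important, less_important, subset, score = select_features(X, y)
    print("important:", important, "low-ranked:", less_important)
    print("selected:", subset, "LOO 1-NN accuracy:", score)
```

On the toy data, feature 0 is constructed to separate the classes, so it should clear the false-feature threshold; the floating search then decides which of the remaining noise features, if any, to add. Keeping the important features fixed shrinks the space the floating search has to explore, which is one plausible motivation for restricting SFS to the lower-ranked features.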
Keywords feature selection; feature ranking; false feature; machine learning; k-NN (k-nearest neighbors)
Language zh_TW
Related links

Institutional repository link (http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/76915)