周建興 Chien-hsing Chou | Modified Sequential Floating Search Algorithm with a Novel Ranking Method

Research Report

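Academic year	100
Semester	1
Publication date	2011-08-01
Title	結合特徵排序的改良式浮動序列特徵擷取演算法
Title (other language)	Modified Sequential Floating Search Algorithm with a Novel Ranking Method
Authors	周建興 (Chien-hsing Chou); 趙于翔
Department	Department of Electrical Engineering, Tamkang University
Description	Project number: NSC100-2221-E032-069; Research period: 20110801~20120731; Research funding: 462,000
Sponsor	National Science Council, Executive Yuan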
Abstract	In pattern classification, choosing a suitable feature selection method is often the key to success. Successful feature selection not only raises classification accuracy but also extracts the critical features that users care about. For instance, in DNA sequence analysis, feature selection can locate the segments of a sequence, or the types of amino acids, that may lead to certain diseases, and in microarray data it can select the genes that may lead to a given disease. Likewise, in text categorization, feature selection extracts the keywords that actually contribute to classification. Feature selection also reduces the time and computation spent training and testing the classifier, as well as the memory needed for data storage. Among the many feature selection methods, sequential floating search (SFS) is well known and widely adopted. In this project, we propose a feature selection method that combines feature ranking with SFS. In the feature ranking stage, we use the new idea of a false feature to rank features by importance, and we then apply SFS to the features that rank lower in importance. This approach not only overcomes problems that the original SFS may encounter but also extracts more critical feature subsets.
Keywords	feature selection; feature ranking; false feature; machine learning; k-NN (k-nearest neighbors)
Language	zh_TW
Related links	Institutional repository link ( http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/76915 )
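
The abstract describes applying a floating search to a subset of features handed over by a ranking stage. The following is a minimal sketch of a generic sequential floating forward search only, not the project's actual implementation. It makes several assumptions that do not come from the record: the function names `loo_1nn_accuracy` and `floating_forward_search` are hypothetical, the selection criterion is leave-one-out accuracy of a 1-nearest-neighbour classifier (chosen because the keywords mention k-NN), and `candidates` simply stands in for whatever feature indices the ranking stage passes on. The false-feature ranking itself is not sketched, because the abstract does not specify how it works.

```python
def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the given feature indices (assumed criterion)."""
    if not features:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = None, None
        for j, xj in enumerate(X):
            if i == j:
                continue  # leave sample i out
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def floating_forward_search(X, y, candidates, k, score=loo_1nn_accuracy):
    """Generic sequential floating forward search over `candidates`.

    Returns (score, subset) for the best subset of size k that was found.
    """
    if not 1 <= k <= len(candidates):
        raise ValueError("k must be between 1 and len(candidates)")
    selected = []
    best_at_size = {}  # subset size -> (best score, subset)
    while len(selected) < k:
        # Forward step: add the single feature that helps the most.
        remaining = [f for f in candidates if f not in selected]
        add = max(remaining, key=lambda f: score(X, y, selected + [f]))
        selected = selected + [add]
        s = score(X, y, selected)
        if s > best_at_size.get(len(selected), (-1.0, None))[0]:
            best_at_size[len(selected)] = (s, list(selected))
        # Floating step: drop features while doing so strictly beats the
        # best subset recorded so far at the smaller size.
        while len(selected) > 2:
            drop = max(
                selected,
                key=lambda f: score(X, y, [g for g in selected if g != f]),
            )
            reduced = [g for g in selected if g != drop]
            s_red = score(X, y, reduced)
            if s_red > best_at_size[len(reduced)][0]:
                selected = reduced
                best_at_size[len(reduced)] = (s_red, list(reduced))
            else:
                break
    return best_at_size[k]


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, features 1 and 2 do not.
    labels = [i % 2 for i in range(12)]
    data = [[labels[i] * 10 + (i % 3) * 0.1, i * 7 % 5, i * 3 % 4]
            for i in range(12)]
    print(floating_forward_search(data, labels, candidates=[0, 1, 2], k=2))
```

Passing the search a `candidates` list rather than all feature indices mirrors the idea in the abstract that the floating search runs only on part of the ranked features; which part, and how the rank is computed, is left to the caller in this sketch.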