Feature Selection for Identifying Protein Disordered Regions
Academic year 98
Semester 2
Publication date 2010-04-01
Title Feature Selection for Identifying Protein Disordered Regions
Title (other language)
Authors Hsu, Hui-Huang; Hsieh, Cheng-Wei
Affiliation Department of Computer Science and Information Engineering, Tamkang University
Publisher Singapore: World Scientific Publishing Co. Pte. Ltd.
Source title, volume (issue), pages Biomedical Engineering: Applications, Basis and Communications 22(2), pp.119-125
Abstract Determining the structure of a protein is not an easy task; it usually involves a time-consuming and costly process in the wet lab. Using computational methods to predict a protein's tertiary structure from its primary structure (the amino acid sequence) is therefore desirable. Disordered regions are segments of a protein that do not have a fixed conformation, which makes structure prediction harder. These disordered regions are also functionally important for a protein. This research aims to identify such regions, with a focus on selecting a proper feature set. Three methods, namely F-score, information gain (IG), and k-medoids clustering, are used for feature selection. The support vector machine (SVM) is then used for classification. The results show that classification accuracy can be raised with a smaller feature set. Feature selection by k-medoids clustering reduces the number of features from 440 to 150 and improves the accuracy from 84.66% to 86.81% in five-fold cross-validation. It also performs more stably than F-score and IG.
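Keywords Disordered protein region; k-Medoids clustering; Feature selection; Proteomics
Language en
ISSN 1016-2372 1793-7132
Journal type International
Indexed in SCI EI
Industry-academia collaboration
Corresponding author Hsu, Hui-Huang
Peer review system
Country SGP
Open call for papers
Publication format Print
Related links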

Institutional repository link ( http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/59913 )

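The abstract names F-score as one of the three feature selection methods but does not define it. As a rough illustration only, the sketch below ranks features with the commonly used two-class F-score (between-class separation of the means divided by the sum of the within-class sample variances). That definition, the function names `f_score` and `rank_features`, and the toy data are all assumptions made for illustration; this is not the authors' implementation, and the SVM classification step is omitted.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of one feature (assumed definition, see lead-in).

    pos / neg hold the feature's values in the positive and negative class;
    each class needs at least two samples for the sample variance.
    """
    overall = mean(pos + neg)
    # Numerator: how far each class mean lies from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Denominator: sample variances (n - 1 in the denominator) of each class.
    within = variance(pos) + variance(neg)
    return between / within if within > 0 else 0.0


def rank_features(X, y, top_k):
    """Return the indices of the top_k features by F-score, plus all scores."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:top_k], scores


# Hypothetical toy data: 4 samples, 3 features, label 1 = disordered.
X = [
    [0.9, 0.5, 0.10],
    [0.8, 0.4, 0.20],
    [0.1, 0.5, 0.15],
    [0.2, 0.4, 0.10],
]
y = [1, 1, 0, 0]

selected, scores = rank_features(X, y, top_k=1)
print(selected, scores)
```

In this toy set only the first feature separates the two classes, so it receives the highest score. In a setting like the one the abstract describes, the retained feature columns would then be passed to a classifier and evaluated by cross-validation.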