Using Image Recognition to Process Unbalanced Data in Genetic Diseases From Biobanks
Academic year 110 (ROC calendar)
Semester 2
Publication date 2022-02-07
Title Using Image Recognition to Process Unbalanced Data in Genetic Diseases From Biobanks
Authors Ai-Ru Hsieh; Yi-Mei Aimee Li
Source, volume, article number Frontiers in Genetics 13, 822117
Abstract With precision medicine as the goal, each country's human biobank should be analyzed to obtain comprehensive research results on genetic diseases. In addition, as medical imaging data grow, automatic image processing with image recognition has been widely studied and applied in biomedicine. However, human biobanks often have imbalanced case–control data, which is usually handled with the statistical method SAIGE. Because human biobanks contain huge amounts of genetic data, applying SAIGE directly often runs into insufficient computer memory and excessive computation time. An alternative is to rebalance the case–control ratio by resampling, using the Synthetic Minority Oversampling Technique (SMOTE). Our study used Manhattan plots and genetic disease information from the Taiwan Biobank and adjusted the case–control imbalance with SMOTE, an approach we call “TW-SMOTE.” We then used a deep learning image recognition system to identify the TW-SMOTE results. We found that TW-SMOTE achieves the same results as SAIGE and the UK Biobank (UKB). Processing the data with this technique can be equivalent to using plots from the relatively large UKB sample, and it matches SAIGE in addressing data imbalance.
Keywords Manhattan plot; imbalanced data; genome-wide association analyses; biobank; deep learning; image identification
Language en_US
ISSN 1664-8021
Journal type Foreign
Indexed in SCI
Country CHE (Switzerland)
Publication format Electronic

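The abstract names SMOTE as the resampling step behind TW-SMOTE. As a hedged illustration only, the Python sketch below implements the textbook SMOTE interpolation on generic numeric feature vectors: each synthetic case is placed at a random point on the segment between a minority (case) sample and one of its k nearest minority neighbours. The function name `smote`, the random choice of base sample per synthetic point, Euclidean distance, and the toy case/control data are all assumptions; this is not the authors' TW-SMOTE pipeline, whose details on Taiwan Biobank data are not given in the abstract.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: generate n_synthetic points by interpolating
    between minority samples and their k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a base minority sample at random (with replacement).
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy example with made-up features: 3 cases vs 10 controls.
cases = [[0, 1, 2], [1, 1, 2], [0, 2, 1]]
controls = [[0, 0, 0]] * 10
new_cases = smote(cases, n_synthetic=len(controls) - len(cases), k=2)
balanced_cases = cases + new_cases
print(len(balanced_cases), len(controls))  # 10 10
```

Oversampling only the minority class leaves the controls untouched, so the case–control ratio becomes 1:1 without discarding data. Because genotype dosages are discrete (0, 1, 2), the interpolated values here are fractional; how TW-SMOTE treats such features is not stated in the abstract.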