Abstract
Gene expression profiles are widely used to identify phenotype-specific biomarkers in clinical cancer research. By examining important genes expressed in different phenotypes, patients can be classified into different treatment groups. Microarray and RNA-seq are the two leading technologies for measuring gene expression. However, because the two platforms are heterogeneous, the genes selected from each differ. In this project, we systematically compared the breast cancer subtype classification accuracies achieved with genes selected by four popular multiclass feature selection algorithms, and discussed the strengths and weaknesses of the selected genes across different platforms and cohorts. Our results showed that classification with the selected genes performs best within the same platform across different cohorts. This suggests that merging datasets from the same platform will increase statistical power and improve the prediction accuracy of the selected genes in multiclass classification analysis.
Original language | English |
---|---|
Pages (from-to) | 128-142 |
Number of pages | 15 |
Journal | International Journal of Computational Biology and Drug Design |
Volume | 12 |
Issue number | 2 |
State | Published - 2019 |
Externally published | Yes |
Keywords
- Breast cancer
- Cancer subtypes
- Feature selection
- Functional analysis
- Integration analysis
- Machine learning
- Systems biology