TY - GEN
T1 - A supervised approach for predicting patient survival with gene expression data
AU - Devarajan, Karthik
AU - Zhou, Yan
AU - Chachra, Neeraj
AU - Ebrahimi, Nader
PY - 2010
Y1 - 2010
AB - Rapid development in genomics in recent years has allowed the simultaneous measurement of the expression levels of thousands of genes using DNA microarrays. This has offered tremendous potential for growth in our understanding of the pathophysiology of many diseases. When microarray studies also contain information about an outcome variable such as time to an event or death, one of the goals of an investigator is to understand how the expression levels of genes (covariates) relate to the time-to-event (referred to as survival time) in the course of a disease. In this article, we examine the problem of predicting the survival probability of patients when the number of covariates exceeds the number of observations, a setting typical of microarray gene expression data. This is an ill-conditioned problem further compounded by the presence of possibly censored survival times. We propose a model that combines the partial least squares approach for dimensionality reduction with the accelerated failure time model, a widely used log-linear model for linking censored survival time to covariates. We develop parametric methods to account for censoring as well as for predicting patient survival probabilities. We illustrate the applicability of our methods using cancer microarray data and explore the biological relevance of our results using pathway analysis. Finally, we evaluate the performance of our methods using extensive simulation studies.
KW - Accelerated failure time
KW - Censored survival data
KW - Gene expression
KW - High-dimensional data
KW - Partial least squares
UR - http://www.scopus.com/inward/record.url?scp=77956149315&partnerID=8YFLogxK
U2 - 10.1109/BIBE.2010.14
DO - 10.1109/BIBE.2010.14
M3 - Conference contribution
C2 - 20865131
AN - SCOPUS:77956149315
SN - 9780769540832
VL - 2010
T3 - 10th IEEE International Conference on Bioinformatics and Bioengineering 2010, BIBE 2010
SP - 26
EP - 31
BT - 10th IEEE International Conference on Bioinformatics and Bioengineering 2010, BIBE 2010
T2 - 10th IEEE International Conference on Bioinformatics and Bioengineering, BIBE-2010
Y2 - 31 May 2010 through 3 June 2010
ER -