TY - JOUR
T1 - Gene expression study of breast cancer using Welch Satterthwaite t-test, Kaplan-Meier estimator plot and Huber loss robust regression model
AU - Karim, Sajjad
AU - Iqbal, Md Shahid
AU - Ahmad, Nesar
AU - Ansari, Md Shahid
AU - Mirza, Zeenat
AU - Merdad, Adnan
AU - Jastaniah, Saddig D.
AU - Kumar, Sudhir
N1 - Publisher Copyright: © 2022 The Authors
PY - 2023/1
Y1 - 2023/1
N2 - Objective: Breast cancer (BC) is one of the deadliest diseases in women, causing thousands of deaths annually despite the advent of high-throughput genomic platforms in the recent past. Microarray-based gene expression profiling with different statistical methods has been extensively used to understand the disease at the molecular level. We aimed to apply the Welch Satterthwaite t-test, Kaplan-Meier estimator plot and Huber loss robust regression model to microarray data to improve the analysis and find biomarkers for future diagnosis, prognosis, and treatment. Methods: We retrieved microarray data (GSE10810 dataset) for 31 breast tumor samples and 27 normal breast samples from the Gene Expression Omnibus (GEO, NCBI). The Welch Satterthwaite t-test was applied to identify the most statistically significant genes, the Huber loss robust regression model was applied to investigate the mathematical relationship between tumor and control variables, and Kaplan-Meier Plotter was used to confirm their association with overall metastatic relapse-free survival of BC patients. Results: We identified 1837 differentially expressed genes (DEGs) passing the threshold (fold change ± 2 and p value < 0.001): 638 overexpressed genes (such as COL11A1, KIAA0101, S100P, GJB2, TOP2A, LINC01614, RRM2, INHBA, C15orf48 and CKS2) and 1199 underexpressed genes (such as LEP, ADIPOQ, PLIN1, PCK1, PCOLCE2, ADH1B, LYVE1, FABP4, ABCA8, and CHRDL1). KM analysis revealed 12 out of 20 DEGs (log-rank p value < 0.05) as potential prognostic and therapeutic biomarkers. Conclusion: The Huber loss robust regression model was found to be one of the best-performing algorithms for modeling the mathematical relationship between the control and breast tumor samples, with a correlation coefficient of 0.4398 and a mean absolute error of 1.069 ± 0.020. Using the Welch t-test and Kaplan-Meier plot, which have minimal underlying assumptions, we detected, with high mathematical confidence, DEGs with high potential to be BC biomarkers.
AB - Objective: Breast cancer (BC) is one of the deadliest diseases in women, causing thousands of deaths annually despite the advent of high-throughput genomic platforms in the recent past. Microarray-based gene expression profiling with different statistical methods has been extensively used to understand the disease at the molecular level. We aimed to apply the Welch Satterthwaite t-test, Kaplan-Meier estimator plot and Huber loss robust regression model to microarray data to improve the analysis and find biomarkers for future diagnosis, prognosis, and treatment. Methods: We retrieved microarray data (GSE10810 dataset) for 31 breast tumor samples and 27 normal breast samples from the Gene Expression Omnibus (GEO, NCBI). The Welch Satterthwaite t-test was applied to identify the most statistically significant genes, the Huber loss robust regression model was applied to investigate the mathematical relationship between tumor and control variables, and Kaplan-Meier Plotter was used to confirm their association with overall metastatic relapse-free survival of BC patients. Results: We identified 1837 differentially expressed genes (DEGs) passing the threshold (fold change ± 2 and p value < 0.001): 638 overexpressed genes (such as COL11A1, KIAA0101, S100P, GJB2, TOP2A, LINC01614, RRM2, INHBA, C15orf48 and CKS2) and 1199 underexpressed genes (such as LEP, ADIPOQ, PLIN1, PCK1, PCOLCE2, ADH1B, LYVE1, FABP4, ABCA8, and CHRDL1). KM analysis revealed 12 out of 20 DEGs (log-rank p value < 0.05) as potential prognostic and therapeutic biomarkers. Conclusion: The Huber loss robust regression model was found to be one of the best-performing algorithms for modeling the mathematical relationship between the control and breast tumor samples, with a correlation coefficient of 0.4398 and a mean absolute error of 1.069 ± 0.020. Using the Welch t-test and Kaplan-Meier plot, which have minimal underlying assumptions, we detected, with high mathematical confidence, DEGs with high potential to be BC biomarkers.
KW - Breast cancer
KW - Gene expression
KW - Huber loss robust regression
KW - Kaplan-Meier plot
KW - Microarray
KW - Welch Satterthwaite t-test
UR - http://www.scopus.com/inward/record.url?scp=85145550356&partnerID=8YFLogxK
U2 - 10.1016/j.jksus.2022.102447
DO - 10.1016/j.jksus.2022.102447
M3 - Article
AN - SCOPUS:85145550356
SN - 1018-3647
VL - 35
JO - Journal of King Saud University - Science
JF - Journal of King Saud University - Science
IS - 1
M1 - 102447
ER -