Role of Big Data & Chronic Obstructive Pulmonary Disease (COPD) phenotypes and ML cluster analyses – potential topics for PhD Scholars
In-Brief:
Chronic obstructive pulmonary disease (COPD), a leading cause of death worldwide, is a heterogeneous and multisystemic condition. Physicians classify patients into phenotypes to understand it better, and the growth and application of machine learning (ML) algorithms in medical research can potentially help advance this classification. ML algorithms have been explored as a way to identify the heterogeneity of such conditions, and mathematical models are being developed using genomic, transcriptomic, and proteomic data to predict or differentiate disease phenotypes.
Introduction:
Chronic obstructive pulmonary disease (COPD), a leading cause of death worldwide, is a heterogeneous and multisystemic condition. It includes diseases like asthma, emphysema and chronic bronchitis (Nikolaou 2020). It is marked by persistent respiratory symptoms and restricted airflow caused by airway and/or alveolar abnormalities. Significant exposure to harmful particles or fumes is usually the cause of these abnormalities (Corlateanu 2020). To understand this condition better, physicians have classified patients into phenotypes based on symptomatic features, including symptom severity and history of exacerbations. The growth and application of machine learning (ML) algorithms in medical research can potentially help advance this classification procedure (Nikolaou 2020). This review summarizes the use of machine learning algorithms and cluster analyses in COPD phenotyping.
Application of machine learning – Recent research:
The last decade has seen substantial growth in the use of machine learning in medicine and research. ML algorithms have been explored as a means of identifying the heterogeneity of certain conditions. Mathematical models are being developed using genomic, transcriptomic, and proteomic data to predict or differentiate disease phenotypes (Tang 2020).
COPD phenotypic classification has progressed from the classic phenotypes of emphysema, chronic bronchitis, and asthma to a plethora of phenotypes that represent the disease’s heterogeneity. Over the last 10 years, new imaging modalities, high-performance systems for protein, gene, and metabolite assessment, and integrative approaches to disease classification have contributed to the identification of a variety of phenotypes (O’Brien 2020).
Disease/Condition | Machine Learning Algorithm used | Outcome | Reference |
Asthma | Support vector machine (SVM); recursive feature elimination (RFE) | RFE identified 8 and 12 predictors for the ‘Childhood asthma prediction in early life’ (CAPE) and ‘Childhood asthma prediction at preschool age’ (CAPP) models, respectively. SVM gave the best predictive performance for both models. | Kothalawala et al. (2021) |
Nocturnal COPD | Random forest (RF) | Demonstrated for the first time the feasibility of COPD diagnosis from nocturnal oximetry time series in a population sample at risk of sleep-disordered breathing. No severe cases of misdiagnosis using this method. | Levy et al. (2021) |
COPD in Chinese population | Logistic regression (LR); multilayer perceptron (MLP) artificial neural network; decision tree (DT); XGBoost; SVM; k-nearest neighbors classifier (KNN) | The KNN, LR and XGBoost models showed excellent overall predictive power. Machine learning tools combining both clinical and SNP features were suitable for predicting the risk of COPD development. | Ma et al. (2020) |
Table 1: Recent research on application of machine learning in COPD
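To make one of the classifiers in Table 1 more concrete, the following is a minimal sketch of a k-nearest neighbors classifier that combines clinical and SNP-style features. It is not the pipeline used in any of the cited studies. The function name `knn_predict`, the feature layout, and all data values are hypothetical and chosen only for illustration, and the features are assumed to be pre-scaled to a comparable range.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, nearest first
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, pre-scaled features per person (illustrative values only):
# [smoking exposure, age, risk-allele count / 2]
train_X = [
    [0.9, 0.8, 1.0],
    [0.8, 0.7, 0.5],
    [0.7, 0.9, 1.0],
    [0.1, 0.2, 0.0],
    [0.2, 0.1, 0.5],
    [0.0, 0.3, 0.0],
]
train_y = ["COPD", "COPD", "COPD", "control", "control", "control"]

high_risk = knn_predict(train_X, train_y, [0.85, 0.75, 1.0])
low_risk = knn_predict(train_X, train_y, [0.10, 0.15, 0.0])
print(high_risk, low_risk)
```

In practice, a study would also need feature scaling fitted on training data only, a held-out test set, and a tuned value of k. The sketch leaves these out for brevity.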
Bodduluri et al. conducted a deep learning and machine learning based analysis using spirometry data to identify the structural phenotypes of COPD. The study was conducted on 8980 patients and applied techniques like random forest and a fully convolutional network (FCN). They demonstrated the potential of machine learning approaches to identify patients for targeted therapies (Bodduluri 2020). In another study, researchers evaluated the possible clinical clusters in COPD patients at two study centres in Brazil. A total of 301 patients were included in this study, and clustering methods such as Ward's method and k-means were applied. They were able to identify four different clinical clusters in the COPD population (Zucchi 2020).
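The k-means step mentioned above can be sketched as follows. This is a minimal illustration of Lloyd's algorithm, not the analysis performed by Zucchi et al. The patient values, the two-feature layout, and the choice of two clusters (rather than the four reported in that study) are assumptions made to keep the example short.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until labels stop changing."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical, pre-scaled features per patient (illustrative values only):
# [FEV1 % predicted / 100, exacerbations per year / 5]
patients = [
    [0.80, 0.0], [0.75, 0.2], [0.85, 0.0],
    [0.35, 0.8], [0.30, 1.0], [0.40, 0.6],
]
initial_centroids = [patients[0], patients[3]]  # fixed start for reproducibility

labels, centroids = kmeans(patients, initial_centroids)
print(labels)
```

Because k-means depends on its starting centroids and on feature scaling, real analyses typically run several random initializations and standardize the variables first.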
Fig.1: Use of machine learning algorithms in COPD
Network-based methods have also been used to study biomarkers of COPD. Sex-specific gene co-expression patterns have been discovered using correlation-based network approaches. An analysis using PANDA (Passing Attributes between Networks for Data Assimilation) reported sex-specific differential targeting of several genes, with mitochondrial pathways being enriched in women (DeMeo 2021).
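A correlation-based co-expression network of the kind mentioned above can be sketched by linking gene pairs whose expression profiles are strongly correlated and then comparing the resulting edges between groups. The sketch below uses plain Pearson correlation with a fixed threshold. It is not an implementation of PANDA, and the gene names, expression values, and the 0.8 threshold are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expression, threshold=0.8):
    """Return gene pairs whose absolute Pearson correlation meets the threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Hypothetical expression values for placeholder genes across five samples per group
female = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [2, 4, 6, 8, 10],
    "GENE_C": [5, 1, 4, 2, 3],
}
male = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [3, 1, 4, 1, 5],
    "GENE_C": [5, 1, 4, 2, 3],
}

female_edges = coexpression_edges(female)
male_edges = coexpression_edges(male)
# Edges present in one group but not the other suggest group-specific co-expression
female_only = set(female_edges) - set(male_edges)
print(female_only)
```

With only a handful of samples, correlations like these are unstable. Published analyses rely on far larger cohorts and on multiple-testing control.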
Big data – Role in COPD analysis:
The application of big data to the study of heterogeneous conditions is of great importance. Analyzing large amounts of data at once with computational techniques can help in better understanding complex diseases like COPD. Genetics, other omics (e.g., transcriptomics, proteomics, metabolomics, and epigenetics), and imaging are all vital sources of big data in COPD research. COPD genetic research has already produced a large amount of big data. Imaging, usually chest CT scanning, is another important source. Network science offers methods for analyzing big data (Silverman 2017). Projects like COPDGene (19,000 lung CT scans of 10,000 people) provide unprecedented opportunities to learn from massive medical image sets (Toews 2015).
A study undertaken in England highlighted the importance of big data and machine learning in COPD. The researchers applied cluster analysis methods to large-scale electronic health record (EHR) data and sub-classified COPD patients into five clusters based on demography, risk of death, comorbidity and exacerbations (Pikoula 2019).
Future scope:
The appropriate application of large medical datasets (big data) and machine learning analysis can play a vital role in improving the management of COPD. The adoption of these techniques can further facilitate the classification of individuals with different responses to therapy, which in turn can lead to personalized therapy for patients with COPD. To conclude, ML algorithms and big data hold the potential to change the prognosis and management of COPD. However, more elaborate research projects are needed to establish the application of these tools.
References:
- Bodduluri, S., Nakhmani, A., Reinhardt, J. M., Wilson, C. G., McDonald, M. L., Rudraraju, R., Jaeger, B. C., Bhakta, N. R., Castaldi, P. J., Sciurba, F. C., Zhang, C., Bangalore, P. V., & Bhatt, S. P. (2020). Deep neural network analyses of spirometry for structural phenotyping of chronic obstructive pulmonary disease. JCI insight, 5(13), e132781.
- Corlateanu, A., Mendez, Y., Wang, Y., Garnica, R. D. J. A., Botnaru, V., & Siafakas, N. (2020). Chronic obstructive pulmonary disease and phenotypes: a state-of-the-art. Pulmonology, 26(2), 95-100.
- DeMeo, D. L. (2021). Sex and Gender Omic biomarkers in men and women with COPD: Considerations for precision medicine. Chest.
- Kim, S., Lim, M. N., Hong, Y., Han, S. S., Lee, S. J., & Kim, W. J. (2017). A cluster analysis of chronic obstructive pulmonary disease in dusty areas cohort identified three subgroups. BMC pulmonary medicine, 17(1), 209.
- Kothalawala, D. M., Murray, C., Simpson, A., Custovic, A., Tapper, W. J., Arshad, S. H., … & STELAR/UNICORN Consortium. (2021). Development of Childhood Asthma Prediction Models using Machine Learning Approaches. medRxiv.
- Levy, J., Álvarez, D., Del Campo, F., & Behar, J. A. (2021). Machine learning for nocturnal diagnosis of chronic obstructive pulmonary disease using digital oximetry biomarkers. Physiological Measurement.
- Ma, X., Wu, Y., Zhang, L., et al. (2020). Comparison and development of machine learning tools for the prediction of chronic obstructive pulmonary disease in the Chinese population. Journal of Translational Medicine, 18, 146.
- Nikolaou, V., Massaro, S., Fakhimi, M., Stergioulas, L., & Price, D. (2020). COPD phenotypes and machine learning cluster analysis: A systematic review and future research agenda. Respiratory Medicine, 106093.
- O’Brien, E., Sciurba, F. C., & Bon, J. (2020). COPD Phenotyping. In Precision in Pulmonary, Critical Care, and Sleep Medicine (pp. 225-239). Humana, Cham.
- Pikoula, M., Quint, J. K., Nissen, F., Hemingway, H., Smeeth, L., & Denaxas, S. (2019). Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records. BMC medical informatics and decision making, 19(1), 1-14.
- Silverman, E. K. (2017). Big Data and Network Medicine in COPD. In COPD (pp. 321-332). Springer, Berlin, Heidelberg.
- Tang, H. H., Sly, P. D., Holt, P. G., Holt, K. E., & Inouye, M. (2020). Systems biology and big data in asthma and allergy: recent discoveries and emerging challenges. European Respiratory Journal, 55(1).
- Toews, M., Wachinger, C., Estepar, R. S. J., & Wells, W. M. (2015, June). A feature-based approach to big data analysis of medical images. In International Conference on Information Processing in Medical Imaging (pp. 339-350). Springer, Cham.
- Zucchi, J. W., Franco, E. A. T., Schreck, T., e Silva, M. H. C., dos Santos Migliorini, S. R., Garcia, T., … & Tanni, S. E. (2020). Different Clusters in Patients with Chronic Obstructive Pulmonary Disease (COPD): A Two-Center Study in Brazil. International Journal of Chronic Obstructive Pulmonary Disease, 15, 2847.