A Multi-Omics and Machine Learning-Based Predictor of Drug Sensitivity in Cancer

2022-10
Ozcan, Umut Onur
Mohammadvand, Navid
Izmirli, Burakcan
Akar, Etkin
Kahraman, Deniz Cansen
Doğan, Tunca
One of the most critical requirements of personalized medicine is prior knowledge of a tumor’s response to different treatments. To this end, large-scale experimental projects have been carried out with the specific aim of measuring the sensitivity of cancer cells to known drugs. However, due to the high costs and long durations associated with such projects, only a small part of the pharmacogenomic space has been screened. More recently, computational methods have been developed and used to predict the drug responses/sensitivities of target biomolecules or cells in order to guide experimental work. Owing to its advantages (e.g., rapid operation and low cost), this approach has gained popularity and captured the interest of the pharmaceutical sector. However, the computational predictive methods proposed to date have not become part of real-life biomedical applications, for two reasons: first, the attributes of cells used as input to the predictive system were not adequately addressed, and second, state-of-the-art machine learning techniques, which have the potential to properly handle the complexity of cells, could not be used effectively. In this project, we propose DeepResponse, a machine learning-based computational system that predicts specific inhibitor small-molecule drugs (and drug candidate compounds) for cancer cell lines. The proposed method follows a pharmacogenomic approach, in which the features of both cells and drugs/compounds are given to the system at the input level. The model learns the inherent relationship between these two entities and outputs the real-valued dose response (i.e., the IC50 value) of the given cell to the drug/compound of interest. In terms of input features, multi-omics data are used to represent cells in the system, including gene expression, mutation, copy number variation (CNV), and DNA methylation profiles.
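As a rough illustration of the cell-side input described above, the four omics profiles of a cell line can be arranged over a shared gene list into a genes x layers matrix. This is a minimal sketch under stated assumptions, not the DeepResponse implementation: the layer names, the zero fill for missing genes, the function names, and the toy values are all hypothetical.

```python
from typing import Dict, List

# Hypothetical layer names, following the four omics types named in the abstract.
OMICS_LAYERS = ("expression", "mutation", "cnv", "methylation")


def build_cell_matrix(
    profiles: Dict[str, Dict[str, float]],
    genes: List[str],
    fill_value: float = 0.0,
) -> List[List[float]]:
    """Return a genes x layers matrix for one cell line.

    `profiles` maps a layer name to {gene symbol: value}. Genes missing from a
    layer receive `fill_value` (an assumption; a real pipeline may impute).
    """
    return [
        [profiles.get(layer, {}).get(gene, fill_value) for layer in OMICS_LAYERS]
        for gene in genes
    ]


def flatten(matrix: List[List[float]]) -> List[float]:
    """Flatten the matrix row by row into a single feature vector."""
    return [value for row in matrix for value in row]


# Toy values for illustration only; not taken from GDSC, CCLE, or NCI-60.
toy_profiles = {
    "expression": {"TP53": 5.2, "EGFR": 7.9},
    "mutation": {"TP53": 1.0},
    "cnv": {"EGFR": 2.0},
    "methylation": {"TP53": 0.35, "EGFR": 0.10},
}
cell_matrix = build_cell_matrix(toy_profiles, ["TP53", "EGFR"])
cell_vector = flatten(cell_matrix)
```

The matrix form would suit a convolutional model, while the flattened vector would suit a random forest or MLP; which layout each DeepResponse model actually uses is not stated here.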
On the drug/compound side, structural features are represented by molecular fingerprints (ECFP4) and graph representations (fundamentally expressing the atoms and bonds in the molecule). In terms of the algorithms employed, (i) random forest, (ii) a pairwise-input multilayer perceptron (MLP), and (iii) a hybrid deep convolutional and graph neural network are used in three completely independent prediction models. The results of these three models are consolidated to produce the final drug sensitivity/response predictions. DeepResponse models were trained and tested on cancer cell line-based experimental profiling and drug screening datasets produced by large-scale projects/databases such as Genomics of Drug Sensitivity in Cancer (GDSC), Cancer Cell Line Encyclopedia (CCLE), and NCI-60. Both regression-based (e.g., mean squared error, R^2, Spearman and Pearson correlation) and classification-based (e.g., precision, recall, accuracy, F1-score, MCC) metrics were used to measure model performance. The hyper-parameters of the models were optimized via cross-validation, and the final performances were evaluated on independent tests and compared with the state-of-the-art. To investigate the system, three modes of analysis were employed: (i) a within-domain analysis (the main/standard test), in which the system is trained and tested on the data of a single project (e.g., GDSC) using different train-test data splitting strategies; (ii) a cross-domain analysis, in which the system is trained on GDSC data and tested on CCLE and NCI-60 data to measure DeepResponse’s ability to generalize; and (iii) an ablation study, in which only one type of input feature is used in a model (i.e., either expression, mutation, CNV, or methylation) to assess the contribution of each component.
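The evaluation metrics listed above can be sketched in plain Python as follows. This is an illustrative sketch only: the function names are hypothetical, and the binarization rule (a response value below a threshold counts as "sensitive") is an assumption, since the abstract does not state how continuous responses were converted to classes.

```python
import math
from typing import Dict, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def classification_metrics(
    y_true: Sequence[float], y_pred: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Binarize responses (value < threshold means sensitive; an assumption)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        actual, predicted = t < threshold, p < threshold
        if actual and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "accuracy": accuracy,
            "f1": f1, "mcc": mcc}


# Toy values for illustration only; not results from the study.
toy_true = [1.0, 2.0, 3.0, 4.0]
toy_pred = [1.5, 2.5, 2.5, 4.5]
toy_scores = classification_metrics(toy_true, toy_pred, threshold=2.75)
```

Reporting both metric families is useful because a model can rank compounds well (high Spearman correlation) while still misplacing them relative to a fixed sensitivity cutoff, and vice versa.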
The finalized DeepResponse models were then run on all possible cell line-drug pairs (i.e., approximately 1,000 cell lines and 10,000 drugs) to produce IC50-based quantitative drug response estimates for pairs without experimental information. At the final stage of the project, we conducted a use-case analysis in which we validated selected predictions. For this, we investigated the repurposing of drugs against hepatocellular carcinoma (HCC), which is reported to be the second deadliest cancer in the world. Among the multiple inhibitor predictions for HCC cell lines such as Huh7, Hep3B, and SNU 387/423/475, Eprinomectin (an approved avermectin currently used as a veterinary topical endectocide) was selected for experimentation due to its high predicted activity and the lack of any previous studies on its repurposing against HCC. SRB, RT-CES, cell cycle, and Western blot analyses indicated the inhibitory potential of Eprinomectin, which was comparable to (or in some cases better than) that of the approved HCC drug Sorafenib. Further analysis is required to better assess the effects of this drug on both cancerous and healthy human cells. Both the in silico evaluation and the in vitro validation results indicated that DeepResponse successfully predicts the drug sensitivity of cancer cells, and that the multi-omics aspect in particular benefited the learning process and yielded better performance than the single-omics-based state-of-the-art. DeepResponse can be used for the early-stage discovery of new drug candidates, and for repurposing existing ones, against resistant tumors. We are currently constructing an open-access programmatic tool (with a graphical interface) for DeepResponse, so that life science researchers will be able to use the system to obtain drug sensitivity predictions for their samples of interest.
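The large-scale prediction run and the consolidation of the three models might be sketched as below. This is a hedged illustration rather than the authors' procedure: averaging the three model outputs is an assumption (the abstract does not say how results are consolidated), and the function names, stand-in models, and drug identifiers are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]               # (cell line, drug)
Model = Callable[[str, str], float]  # returns a predicted IC50-like value


def consolidate(models: Sequence[Model], cell: str, drug: str) -> float:
    """Combine model outputs by simple averaging (an assumption)."""
    return mean(model(cell, drug) for model in models)


def rank_unmeasured(
    cells: Sequence[str],
    drugs: Sequence[str],
    measured: Set[Pair],
    models: Sequence[Model],
    top_k: int = 3,
) -> Dict[str, List[Tuple[float, str]]]:
    """For each cell line, score drugs lacking experimental data and keep the
    top_k with the lowest predicted value (lower IC50 = higher sensitivity)."""
    ranked: Dict[str, List[Tuple[float, str]]] = {}
    for cell in cells:
        scored = [
            (consolidate(models, cell, drug), drug)
            for drug in drugs
            if (cell, drug) not in measured
        ]
        ranked[cell] = sorted(scored)[:top_k]
    return ranked


# Stand-in models and placeholder drugs; values are illustrative only.
base = {"drug_a": 2.0, "drug_b": 0.5, "drug_c": 1.0}
toy_models: List[Model] = [
    lambda cell, drug: base[drug],
    lambda cell, drug: base[drug] + 0.5,
    lambda cell, drug: base[drug] - 0.5,
]
measured_pairs = {("Huh7", "drug_b")}
ranking = rank_unmeasured(
    ["Huh7", "Hep3B"], list(base), measured_pairs, toy_models, top_k=2
)
```

At the scale quoted above (roughly 1,000 x 10,000 pairs), scoring would in practice be batched rather than evaluated one pair at a time; the loop here is kept simple for readability.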

Citation Formats
U. O. Ozcan, N. Mohammadvand, B. Izmirli, E. Akar, D. C. Kahraman, and T. Doğan, “A Multi-Omics and Machine Learning-Based Predictor of Drug Sensitivity in Cancer,” Erdemli, Mersin, TÜRKİYE, 2022, p. 3051, Accessed: 00, 2023. [Online]. Available: https://hibit2022.ims.metu.edu.tr.