An integrative framework for clinical diagnosis and knowledge discovery from exome sequencing data

Shojaei, Mona
Mohammadvand, Navid
Alkan, Can
Çetin Atalay, Rengül
Acar, Aybar Can
Non-silent single nucleotide genetic variants that substantially affect protein function and length, such as nonsense changes and insertion-deletion (indel) variants, are prevalent and frequently misclassified. The low sensitivity and specificity of existing variant effect predictors for nonsense and indel variants restrict their use in clinical applications. We propose the Pathogenic Mutation Prediction (PMPred) method to predict the pathogenicity of single nucleotide variations that impair protein function by prematurely terminating a protein's elongation during its synthesis. The prediction starts by monitoring the functional effects (Gene Ontology annotation changes) of the change in sequence, using an existing ensemble machine learning model (UniGOPred). This, in turn, reveals the mutations that deviate significantly in function from the wild-type sequence. We have identified novel harmful mutations in patient data and present them as motivating case studies. We also show that our method has increased sensitivity and specificity compared to state-of-the-art methods, especially for single nucleotide variations that produce large functional changes in the final protein. As further validation, we performed a comparative docking study on one such variation that is misclassified by existing methods and, using the altered binding affinities, show how PMPred can correctly predict the pathogenicity when other tools miss it. PMPred is freely accessible as a web service at, and the related code is available at
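The core idea of the abstract, comparing the predicted function of a wild-type protein against its prematurely terminated mutant, can be illustrated with a small Python sketch. This is not the PMPred implementation and does not call UniGOPred: the function names, the example coding sequence, the placeholder term labels, and the mean absolute difference used as a deviation score are all assumptions chosen for illustration. The per-term scores are assumed to come from some external Gene Ontology function predictor.

```python
from itertools import product

# Standard genetic code, built in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def apply_snv(cds: str, pos: int, alt: str) -> str:
    """Return the coding sequence with the base at 0-based `pos` replaced by `alt`."""
    return cds[:pos] + alt + cds[pos + 1:]


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def functional_deviation(wt_scores: dict, mut_scores: dict) -> float:
    """Mean absolute difference of per-term scores (an assumed, illustrative metric).

    Each dict maps a function term label to a score in [0, 1], as might be
    produced by a function predictor for the wild-type and mutant proteins.
    Terms missing from one dict are treated as having score 0.
    """
    terms = set(wt_scores) | set(mut_scores)
    if not terms:
        return 0.0
    total = sum(abs(wt_scores.get(t, 0.0) - mut_scores.get(t, 0.0)) for t in terms)
    return total / len(terms)


# Hypothetical example: a C>T change turns the CAA codon into the stop codon TAA.
wild_type_cds = "ATGAAATGGCAAGGTTAA"
mutant_cds = apply_snv(wild_type_cds, 9, "T")
wild_type_protein = translate(wild_type_cds)
mutant_protein = translate(mutant_cds)

# Placeholder scores standing in for predictor output on each protein.
wt_scores = {"term_a": 0.9, "term_b": 0.8}
mut_scores = {"term_a": 0.9, "term_b": 0.2}
deviation = functional_deviation(wt_scores, mut_scores)
```

In this sketch, a larger deviation would suggest that the truncation removes more of the predicted function; where to place a pathogenicity threshold on such a score is a separate question that the abstract does not specify.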
Computers in Biology and Medicine
Citation Formats
M. Shojaei, N. Mohammadvand, T. Doğan, C. Alkan, R. Çetin Atalay, and A. C. Acar, “An integrative framework for clinical diagnosis and knowledge discovery from exome sequencing data,” Computers in Biology and Medicine, vol. 169, 2024.