OpenMETU
On The Use Of Large Language Model For Virtual Screening
Date
2023-09-11
Author
Sığırcı, Ali İlker
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Due to the abundance of drug candidates, conducting in-lab experiments to find an effective compound for a given target is a costly and time-consuming task in drug discovery. This thesis aims to reduce the number of drug candidates during early drug discovery by clustering the compounds. ChemBERTa, a Bidirectional Encoder Representations from Transformers (BERT) model, is employed to extract descriptors for each compound. The compounds are then clustered with respect to these learned features using several clustering algorithms, including k-means and the Butina algorithm. Finally, the obtained clusters are evaluated with measures such as the Silhouette Score and the Homogeneity Score. Our empirical findings show that the learned ChemBERTa descriptors produce results comparable with those of traditional and graph-based models in terms of cluster accuracy and computing runtime.
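The cluster-then-evaluate step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the thesis implementation: the 2-D `embeddings` are toy vectors standing in for ChemBERTa descriptors, the `activity` labels are hypothetical ground-truth classes, and the helper names (`kmeans`, `silhouette_score`, `homogeneity_score`) are chosen here for illustration. The k-means routine is plain Lloyd iteration with caller-supplied starting centroids, and the two scores follow their usual textbook definitions.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, init_centroids, n_iter=20):
    """Lloyd's k-means with caller-supplied initial centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points, where a is the mean
    distance to the point's own cluster and b is the smallest mean distance
    to any other cluster. Points with no valid a or b score 0 (a
    simplification made for this sketch)."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(euclidean(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def homogeneity_score(true_classes, labels):
    """1 - H(C|K) / H(C): equals 1.0 when every cluster holds one class."""
    n = len(labels)
    class_counts = Counter(true_classes)
    cluster_counts = Counter(labels)
    joint = Counter(zip(true_classes, labels))
    h_c = -sum(c / n * math.log(c / n) for c in class_counts.values())
    if h_c == 0:
        return 1.0
    h_c_given_k = -sum(
        n_ck / n * math.log(n_ck / cluster_counts[k])
        for (_, k), n_ck in joint.items()
    )
    return 1.0 - h_c_given_k / h_c


# Toy stand-ins for learned compound descriptors (assumption, not real data).
embeddings = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
# Hypothetical ground-truth classes used only for the homogeneity score.
activity = ["active", "active", "active", "inactive", "inactive", "inactive"]

cluster_labels = kmeans(embeddings, init_centroids=[embeddings[0], embeddings[3]])
sil = silhouette_score(embeddings, cluster_labels)
hom = homogeneity_score(activity, cluster_labels)
print(cluster_labels, round(sil, 3), round(hom, 3))
```

The Silhouette Score needs only the descriptors and the cluster assignments, whereas the Homogeneity Score additionally needs reference classes, which is why the sketch carries a separate `activity` list.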
Subject Keywords
Drug-target interaction, Compound descriptors, Representation learning, Natural language processing, Clustering
URI
https://hdl.handle.net/11511/105319
Collections
Graduate School of Natural and Applied Sciences, Thesis
Citation Formats
IEEE
A. İ. Sığırcı, “On The Use Of Large Language Model For Virtual Screening,” M.S. - Master of Science, Middle East Technical University, 2023.