OpenMETU
NEXT-GENERATION CELL TYPE ANNOTATION: INTEGRATING NLP AND ML TECHNIQUES FOR ENHANCED SCRNA CLASSIFICATION
Download
10671831.pdf
Date
2024-09-04
Author
Tandoğan, Orcun Sami
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Item Usage Stats
186 views, 238 downloads
Implementing machine learning in molecular biology research is essential for efficiently exploring the biomolecular cosmos. This thesis aims to contribute to biotechnology by developing a methodology that optimizes automated cell-type annotation in single-cell RNA sequencing (scRNA-seq) data. We propose a novel approach that combines natural language processing (NLP) and machine learning methods. First, we use tokenizers from advanced language models such as BERT, GPT-2, and GPT-3 to create text embeddings of gene symbols. We then reduce the dimensionality of the resulting data using the encoder parts of autoencoders. Finally, we combine the reduced representation with gene expression data and train machine learning models for prediction. We evaluate the method on the PBMC dataset from the Human Cell Atlas. Our results show that the methodology significantly improves cell-type annotation accuracy compared to standard approaches. This study potentially advances our understanding of cellular diversity and function by providing a new computational tool for biotechnology.
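The pipeline outlined in the abstract (tokenize gene symbols, compress the token vectors with an encoder, fuse them with expression values, then classify) can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation, and every component is a stand-in chosen for brevity: hashed character bigrams replace the BERT/GPT tokenizers, a small linear autoencoder replaces the autoencoders used in the thesis, the expression-weighted mean is only one assumed way to combine gene codes with expression data, and a nearest-centroid rule replaces the machine learning models. The gene list and counts are made-up toy values, not PBMC data.

```python
import random
import zlib


def tokenize(symbol, n=2):
    """Character n-grams of a gene symbol (stand-in for a subword tokenizer)."""
    padded = f"^{symbol.upper()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def hashed_bag(symbol, dim=16):
    """Hash each token into one of `dim` buckets and count occurrences."""
    vec = [0.0] * dim
    for token in tokenize(symbol):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def train_autoencoder(rows, hidden=4, epochs=300, lr=0.01, seed=0):
    """Fit a linear autoencoder with plain SGD; return (encoder, per-epoch loss)."""
    rng = random.Random(seed)
    dim = len(rows[0])
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(hidden)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(dim)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in rows:
            z = matvec(enc, x)  # encode
            err = [r - t for r, t in zip(matvec(dec, z), x)]  # reconstruction error
            total += 0.5 * sum(e * e for e in err)
            # Gradient of the squared error with respect to the code z.
            back = [sum(dec[i][j] * err[i] for i in range(dim)) for j in range(hidden)]
            for i in range(dim):
                for j in range(hidden):
                    dec[i][j] -= lr * err[i] * z[j]
            for j in range(hidden):
                for k in range(dim):
                    enc[j][k] -= lr * back[j] * x[k]
        losses.append(total)
    return enc, losses


def cell_features(expression, gene_codes):
    """Raw expression plus the expression-weighted mean gene code (assumed fusion rule)."""
    total = sum(expression) or 1.0
    hidden = len(gene_codes[0])
    pooled = [sum(e * code[j] for e, code in zip(expression, gene_codes)) / total
              for j in range(hidden)]
    return list(expression) + pooled


def fit_centroids(features, labels):
    """Average the feature vectors of each cell type."""
    groups = {}
    for feat, label in zip(features, labels):
        groups.setdefault(label, []).append(feat)
    return {label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(centroids, feature):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda label: sum(
        (a - b) ** 2 for a, b in zip(centroids[label], feature)))


# Toy data: marker-like gene symbols with made-up counts, not values from the thesis.
genes = ["CD3E", "CD3D", "MS4A1", "CD79A", "LYZ", "CD14"]
cells = [[9, 8, 0, 0, 1, 0], [8, 9, 1, 0, 0, 0],
         [0, 1, 9, 8, 0, 0], [0, 0, 8, 9, 1, 0],
         [0, 0, 0, 1, 9, 8], [1, 0, 0, 0, 8, 9]]
labels = ["T cell", "T cell", "B cell", "B cell", "Monocyte", "Monocyte"]

encoder, losses = train_autoencoder([hashed_bag(g) for g in genes])
gene_codes = [matvec(encoder, hashed_bag(g)) for g in genes]  # encoder half only
features = [cell_features(c, gene_codes) for c in cells]
centroids = fit_centroids(features, labels)
predictions = [predict(centroids, f) for f in features]
```

Only the encoder half is kept after training, mirroring the abstract's use of "the encoder parts of autoencoders". In a realistic setting the hashed bigrams would be replaced by embeddings from a pretrained language model and the centroid rule by a stronger classifier evaluated on held-out cells.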
Subject Keywords
Single-cell RNA sequencing, Autoencoders, Machine Learning, Natural Language Processing, Cell Type Annotation
URI
https://hdl.handle.net/11511/111316
Collections
Graduate School of Natural and Applied Sciences, Thesis
Citation Formats
O. S. Tandoğan, “NEXT-GENERATION CELL TYPE ANNOTATION: INTEGRATING NLP AND ML TECHNIQUES FOR ENHANCED SCRNA CLASSIFICATION,” M.S. - Master of Science, Middle East Technical University, 2024.