Turkish legal NLP: a comprehensive AI framework for legal document understanding, summarization, and retrieval-augmented generation
Date
2025-08-26
Author
Erkan, Mehmet Ali
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Natural Language Processing (NLP) has been widely applied across many domains; however, legal texts, particularly in Turkish, remain relatively underexplored. The complexity and richness of Turkish legal language make it a valuable yet challenging source for artificial intelligence (AI) systems. This thesis presents an end-to-end AI-based framework designed to understand, summarize, and evaluate Turkish legal texts. A large-scale dataset was compiled, consisting of 23,035 judicial decisions and 9,277 legislative and regulatory documents collected from Yargıtay, Danıştay, the Constitutional Court, UYAP, legislative bulletins, and ministry decisions. These texts were cleaned, normalized, and structured to create benchmark datasets for legal NLP in Turkish. A detailed tokenization analysis was conducted, highlighting the trade-offs between efficiency and semantic representation across character-level, word-based, and legal-aware approaches. Embedding methods were systematically compared, and the findings indicated that no single technique universally dominates; instead, effectiveness varies with the legal text category and the downstream task. Additionally, a new summarization algorithm was developed specifically for the structural features of Turkish legal language; it outperformed traditional baselines in relevance, coherence, and legal comprehensiveness. Finally, a Retrieval-Augmented Generation (RAG) system was built by combining dense retrieval with generation models suited to the legal domain. The system demonstrated high factual and citation accuracy, indicating that it can support lawyers and citizens in legal research, decision-making, and access to legal information.
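To make the tokenization trade-off mentioned in the abstract concrete, the following is a minimal sketch, not the thesis's implementation. It contrasts character-level, word-based, and a simple "legal-aware" tokenization of one sentence. The phrase list `LEGAL_PHRASES`, the function names, and the example sentence are all hypothetical; the abstract does not describe how the legal-aware tokenizer is actually defined.

```python
import re

# Hypothetical phrase list for illustration only; the thesis's actual
# legal-aware vocabulary is not given in the abstract.
LEGAL_PHRASES = ["Anayasa Mahkemesi", "Türk Ceza Kanunu"]


def char_tokens(text):
    """Character-level: every non-whitespace character is one token."""
    return [ch for ch in text if not ch.isspace()]


def word_tokens(text):
    """Word-based: runs of word characters (Unicode-aware in Python 3)."""
    return re.findall(r"\w+", text)


def legal_aware_tokens(text, phrases=LEGAL_PHRASES):
    """Word-based, but listed multiword legal terms stay single tokens."""
    # Longest phrases first so a longer term wins over any shorter overlap.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = "|".join(re.escape(p) for p in ordered)
    return re.findall(pattern + r"|\w+", text)


# Hypothetical example sentence.
sentence = "Anayasa Mahkemesi Türk Ceza Kanunu hükmünü iptal etti"

for name, tokenize in [
    ("character-level", char_tokens),
    ("word-based", word_tokens),
    ("legal-aware", legal_aware_tokens),
]:
    tokens = tokenize(sentence)
    print(f"{name}: {len(tokens)} tokens")
```

Under these assumptions, character-level tokenization yields the longest sequences with the least meaning per token, while the legal-aware variant yields the shortest sequence by keeping institution and statute names intact, which is one way to picture the efficiency versus semantic-representation trade-off.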
Subject Keywords
Turkish law, Legal NLP, Text summarization, Tokenization, Embedding methods, Retrieval-augmented generation, Legal AI tools, Turkish language processing
URI
https://hdl.handle.net/11511/115579
Collections
Graduate School of Natural and Applied Sciences, Thesis
Citation (IEEE)
M. A. Erkan, “Turkish legal NLP: a comprehensive AI framework for legal document understanding, summarization, and retrieval-augmented generation,” M.S. - Master of Science, Middle East Technical University, 2025.