Enhancing Address Data Integrity using Transformer-Based Language Models (Dönüştürücü Tabanlı Dil Modelleri Kullanarak Adres Veri Bütünlüğünün Geliştirilmesi)
Date
2024-01-01
Author
Kürklü, Ömer Faruk
Akagündüz, Erdem
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Address data integrity is critical in numerous applications, yet address data is often plagued by inaccuracies and inconsistencies, particularly in non-standardized formats. This study explores a novel application of transformer-based language models, traditionally utilized in language translation tasks, to the standardization and correction of Turkish address data. Leveraging the capabilities of Mixtral-8x7B, a state-of-the-art large language model, this research introduces a unique, handcrafted dataset of Turkish addresses. This dataset, derived from the National Address Dataset and enriched through ChatGPT-4 to simulate human-like input errors, was later used to fine-tune both TowerInstruct and T5 models, transforming them into tools capable of converting faulty, error-laden address lines into standardized, structured, and corrected formats.
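The data preparation step described in the abstract (pairing each clean address with error-laden variants for sequence-to-sequence fine-tuning) can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract states that errors were simulated with ChatGPT-4, whereas this sketch swaps in simple rule-based noise. The abbreviation table, the noise functions, the 0.5 application probability, and the sample address are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical abbreviation table; the record does not list the error types used.
ABBREVIATIONS = {"Mahallesi": "Mah.", "Caddesi": "Cad.", "Sokak": "Sk.", "Numara": "No"}


def abbreviate(tokens, rng):
    """Replace known address words with common short forms (rng is unused)."""
    return [ABBREVIATIONS.get(t, t) for t in tokens]


def drop_char(tokens, rng):
    """Delete one character from a randomly chosen token longer than 3 characters."""
    candidates = [i for i, t in enumerate(tokens) if len(t) > 3]
    if not candidates:
        return tokens
    i = rng.choice(candidates)
    pos = rng.randrange(len(tokens[i]))
    out = list(tokens)
    out[i] = tokens[i][:pos] + tokens[i][pos + 1:]
    return out


def lowercase(tokens, rng):
    """Lowercase every token (rng is unused)."""
    return [t.lower() for t in tokens]


def corrupt(address, rng):
    """Apply each noise function with probability 0.5 (an assumed rate)."""
    tokens = address.split()
    for noise in (abbreviate, drop_char, lowercase):
        if rng.random() < 0.5:
            tokens = noise(tokens, rng)
    return " ".join(tokens)


def make_pairs(clean_addresses, n_variants=3, seed=0):
    """Build (noisy input, clean target) records for seq2seq fine-tuning."""
    rng = random.Random(seed)
    return [
        {"input": corrupt(addr, rng), "target": addr}
        for addr in clean_addresses
        for _ in range(n_variants)
    ]


if __name__ == "__main__":
    # Made-up sample address, not taken from the National Address Dataset.
    sample = ["Atatürk Mahallesi Cumhuriyet Caddesi Numara 12 Çankaya Ankara"]
    for pair in make_pairs(sample):
        print(pair)
```

Each record keeps the clean address as the target, so a text-to-text model such as T5 could be trained to map the noisy input back to it. Fixing the seed keeps the generated pairs reproducible across runs.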
Subject Keywords
Address Standardization, Fine-Tuning, Synthetic Data, Transformers, Turkish Address Dataset
URI
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85200859936&origin=inward
https://hdl.handle.net/11511/110721
DOI
https://doi.org/10.1109/siu61531.2024.10601149
Conference Name
32nd IEEE Conference on Signal Processing and Communications Applications, SIU 2024
Collections
Graduate School of Informatics, Conference / Seminar
Citation Formats
IEEE
Ö. F. Kürklü and E. Akagündüz, “Enhancing Address Data Integrity using Transformer-Based Language Models Dönüştürücü Tabanlı Dil Modelleri Kullanarak Adres Veri Bütünlüğünün Geliştirilmesi,” presented at the 32nd IEEE Conference on Signal Processing and Communications Applications, SIU 2024, Mersin, Türkiye, 2024. [Online]. Available: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85200859936&origin=inward.