Automatic Identification and Classification of Web Tables Using Machine Learning
Date
2024-09-05
Author
F S ALGHARABLI, SAMEH
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Tables are one of the most important and common ways of presenting data, especially dense and complex data. However, the wide variety of web table formats and styles makes it difficult to present them clearly to all users, especially those with cognitive or visual disabilities. This thesis focuses on the automatic identification and classification of web tables using Machine Learning and Deep Learning techniques, and addresses the limitations of existing methods that rely solely on HTML structure features. We classify web tables by the location of their headers, using both rendered features and image-based approaches. Because the features and images are taken after the browser has rendered the table, they capture web tables as users actually experience them, which improves the accuracy and robustness of the classification models. The dataset combines an available dataset with newly collected, manually labeled data, for a total of 5,437 tables. Several Machine Learning and Deep Learning models are developed to automatically identify and classify different types of web tables. In the experiments, the best results are achieved by Random Forest with rendered features (F1-Score = 0.92) and by a custom-built Convolutional Neural Network with table images (F1-Score = 0.93). The primary contribution of this thesis is the automated processing and understanding of web table types, which improves existing automated web table mining approaches and can potentially improve accessibility for visually impaired web users.
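For readers unfamiliar with the metric, the F1-scores reported in the abstract combine precision and recall per class. The following is a minimal sketch of that computation, not the thesis's evaluation code. It assumes macro-averaging across classes (the abstract does not state which averaging was used), and the class names `horizontal`, `vertical`, and `matrix` are hypothetical placeholders for header-location classes.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed from paired true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # model predicted `pred`, but the table was not that class
            fn[truth] += 1  # table belonged to `truth`, but the model missed it
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to 2*P*R / (P + R)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical labels for six tables; not data from the thesis.
    y_true = ["horizontal", "horizontal", "vertical", "vertical", "matrix", "matrix"]
    y_pred = ["horizontal", "vertical", "vertical", "vertical", "matrix", "horizontal"]
    print(per_class_f1(y_true, y_pred))
    print(round(macro_f1(y_true, y_pred), 4))
```

Macro-averaging weights every class equally, so a rare table type counts as much as a common one. A weighted average would instead follow the class distribution of the 5,437 tables.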
Subject Keywords
Table Mining, Relational Tables, Machine Learning, Rendered Features, Random Forest, Deep Learning, CNN, Feature Engineering
URI
https://hdl.handle.net/11511/111601
Collections
Northern Cyprus Campus, Thesis
Citation Formats
S. F S ALGHARABLI, “Automatic Identification and Classification of Web Tables Using Machine Learning,” M.S. thesis, Middle East Technical University, 2024.