Efficient name disambiguation for large-scale databases
Date
2006-01-01
Author
Huang, Jian
Ertekin Bolelli, Şeyda
Giles, C. Lee
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The need for name disambiguation arises both when one seeks the publication list of an author who has used several name variations and when multiple other authors share the same name. We present an efficient integrative framework for solving the name disambiguation problem: a blocking method retrieves candidate classes of authors with similar names, and a clustering method, DBSCAN, clusters papers by author. The distance metric between papers used in DBSCAN is computed by an online active selection support vector machine algorithm (LASVM), which yields a simpler model, lower test error, and shorter prediction time than a standard SVM. We prove that by recasting transitivity as density reachability in DBSCAN, transitivity is guaranteed for core points. For evaluation, we manually annotated 3,355 papers, yielding 490 authors, and achieved 90.6% pairwise-F1. For scalability, authors in the entire CiteSeer dataset, over 700,000 papers, were readily disambiguated.
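The two-stage pipeline described in the abstract (blocking, then density-based clustering within each block) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the blocking key (last name plus first initial, with names given as "Last, First"), the toy paper records, and in particular the `distance` function, which is a simple Jaccard distance over coauthor sets standing in for the LASVM-learned distance. The `eps` and `min_pts` values are arbitrary.

```python
from collections import defaultdict


def block_key(name):
    """Hypothetical blocking key: lowercase last name plus first initial.
    Assumes names are written as 'Last, First'."""
    last, first = [part.strip().lower() for part in name.split(",", 1)]
    return (last, first[:1])


def distance(p, q):
    """Stand-in for the learned pairwise distance (LASVM in the abstract).
    Here: Jaccard distance over coauthor sets, purely illustrative."""
    union = p["coauthors"] | q["coauthors"]
    if not union:
        return 1.0
    return 1.0 - len(p["coauthors"] & q["coauthors"]) / len(union)


def dbscan(points, dist, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    # A point always counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if j == i or dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joined, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels


def disambiguate(papers, eps=0.7, min_pts=2):
    """Block papers by name key, then cluster each block separately."""
    blocks = defaultdict(list)
    for paper in papers:
        blocks[block_key(paper["author"])].append(paper)
    assignment = {}
    for key, block in blocks.items():
        for paper, label in zip(block, dbscan(block, distance, eps, min_pts)):
            assignment[paper["id"]] = (key, label)
    return assignment


# Toy records (invented for this sketch, not data from the paper).
papers = [
    {"id": 1, "author": "Smith, John", "coauthors": {"lee", "kim"}},
    {"id": 2, "author": "Smith, J.", "coauthors": {"lee"}},
    {"id": 3, "author": "Smith, John", "coauthors": {"lee", "kim", "park"}},
    {"id": 4, "author": "Smith, Jane", "coauthors": {"wong", "chen"}},
    {"id": 5, "author": "Smith, J.", "coauthors": {"wong", "chen", "zhao"}},
    {"id": 6, "author": "Jones, Ann", "coauthors": {"smith"}},
]
result = disambiguate(papers)
```

In this toy run, all five "Smith, J" records fall into one block, and clustering is only performed inside that block, which is what keeps the pairwise distance computations manageable at scale. Within the block, papers that are not directly close can still end up in the same cluster when a chain of core points connects them, which is the density-reachability notion the abstract relates to transitivity.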
URI
https://hdl.handle.net/11511/55057
Journal
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2006, PROCEEDINGS
Collections
Department of Computer Engineering, Article
Suggestions
Efficient Name Disambiguation for Large Scale Datasets.
Huang, Jian; Ertekin Bolelli, Şeyda; Giles, C Lee (2006-09-18)
Name disambiguation can occur when one is seeking a list of publications of an author who has used different name variations and when there are multiple other authors with the same name. We present an efficient integrative framework for solving the name disambiguation problem: a blocking method retrieves candidate classes of authors with similar names and a clustering method, DBSCAN, clusters papers by author. The distance metric between papers used in DBSCAN is calculated by an online active selection supp...
Incremental clustering with vector expansion for online event detection in microblogs
Ozdikis, Ozer; Karagöz, Pınar; Oğuztüzün, Mehmet Halit S. (2017-11-04)
Identifying similarities in microblog posts for event detection poses challenges due to short texts with idiosyncratic spellings, irregular writing styles, abbreviations and synonyms. In order to overcome these challenges, we present an enhancement to the incremental clustering techniques by detecting similar terms in microblog posts in a temporal context. We devise an unsupervised method to measure the similarities online using co-occurrence-based techniques and use them in a vector expansion process. The ...
Employing Named Entities for Semantic Retrieval of News Videos in Turkish
Kucuk, Dilek; Yazıcı, Adnan (2009-09-16)
Named entities are known to be important means for semantic annotation of news texts. Considerable work has been carried out for semantic indexing of both textual news and news videos especially in English through the employment of named entities extracted from textual news or transcriptions of the news videos. In this paper, we present our semantic retrieval architecture for news videos in Turkish based on prior semantic annotation of the videos with the corresponding named entities in the news transcripti...
Selective word encoding for effective text representation
Ozkan, Savas; Ozkan, Akin (The Scientific and Technological Research Council of Turkey, 2019-01-01)
Determining the category of a text document from its semantic content is highly motivated in the literature and it has been extensively studied in various applications. Also, the compact representation of the text is a fundamental step in achieving precise results for the applications and the studies are generously concentrated to improve its performance. In particular, the studies which exploit the aggregation of word-level representations are the mainstream techniques used in the problem. In this paper, w...
Person name recognition in turkish financial texts by using local grammar approach
Bayraktar, Özkan; Taşkaya Temizel, Tuğba; Department of Information Systems (2007)
Named entity recognition (NER) is the task of identifying the named entities (NEs) in the texts and classifying them into semantic categories such as person, organization, and place names and time, date, monetary, and percent expressions. NER has two principal aims: identification of NEs and classification of them into semantic categories. The local grammar (LG) approach has recently been shown to be superior to other NER techniques such as the probabilistic approach, the symbolic approach, and the hybrid a...
Citation Formats
IEEE
J. Huang, Ş. Ertekin Bolelli, and C. L. Giles, “Efficient name disambiguation for large-scale databases,” KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2006, PROCEEDINGS, pp. 536–544, 2006, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/55057.