Efficient Name Disambiguation for Large Scale Datasets.
Date
2006-09-18
Author
Huang, Jian
Ertekin Bolelli, Şeyda
Giles, C Lee
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Name disambiguation can occur when one is seeking a list of publications of an author who has used different name variations and when there are multiple other authors with the same name. We present an efficient integrative framework for solving the name disambiguation problem: a blocking method retrieves candidate classes of authors with similar names and a clustering method, DBSCAN, clusters papers by author. The distance metric between papers used in DBSCAN is calculated by an online active selection support vector machine algorithm (LASVM), yielding a simpler model, lower test errors and faster prediction time than a standard SVM. We prove that by recasting transitivity as density reachability in DBSCAN, transitivity is guaranteed for core points. For evaluation, we manually annotated 3,355 papers yielding 490 authors and achieved 90.6% pairwise-F1. For scalability, authors in the entire CiteSeer dataset, over 700,000 papers, were readily disambiguated.
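The pipeline described in the abstract (blocking, then density-based clustering of papers within each block, then pairwise-F1 evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation. The blocking key (last name plus first initial), the toy Jaccard distance over coauthor sets (a stand-in for the LASVM-learned distance), the sample papers, and the `eps` and `min_pts` values are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(name):
    """Hypothetical blocking key: lowercase last name plus first initial."""
    last, first = [part.strip() for part in name.split(",", 1)]
    return (last.lower(), first[:1].lower())


def toy_distance(coauthors_a, coauthors_b):
    """Assumed stand-in for the learned distance: 1 - Jaccard similarity."""
    union = coauthors_a | coauthors_b
    if not union:
        return 1.0
    return 1.0 - len(coauthors_a & coauthors_b) / len(union)


def dbscan(n, dist, eps, min_pts):
    """Plain DBSCAN over items 0..n-1; returns labels, with -1 for noise."""
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if dist(i, j) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def disambiguate(papers, eps=0.5, min_pts=2):
    """Block papers by name key, cluster each block, return author ids."""
    blocks = defaultdict(list)
    for idx, (name, _) in enumerate(papers):
        blocks[blocking_key(name)].append(idx)
    result = [None] * len(papers)
    next_id = 0
    for members in blocks.values():
        def dist(a, b):
            return toy_distance(papers[members[a]][1], papers[members[b]][1])

        labels = dbscan(len(members), dist, eps, min_pts)
        ids = {}
        for pos, label in zip(members, labels):
            if label == -1 or label not in ids:
                if label != -1:
                    ids[label] = next_id
                result[pos] = next_id  # noise becomes a singleton author
                next_id += 1
            else:
                result[pos] = ids[label]
    return result


def pairwise_f1(pred, truth):
    """F1 over paper pairs that are assigned to the same author."""
    pairs = list(combinations(range(len(pred)), 2))
    pred_pairs = {p for p in pairs if pred[p[0]] == pred[p[1]]}
    true_pairs = {p for p in pairs if truth[p[0]] == truth[p[1]]}
    tp = len(pred_pairs & true_pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred_pairs), tp / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


# Invented sample data: (author name as printed, set of coauthors).
PAPERS = [
    ("Smith, John", {"A. Lee", "B. Kim"}),
    ("Smith, J.", {"A. Lee", "B. Kim", "C. Wu"}),
    ("Smith, John", {"A. Lee", "C. Wu"}),
    ("Smith, Jane", {"D. Roy", "E. Sato"}),
    ("Smith, J", {"D. Roy", "E. Sato", "F. Ng"}),
    ("Jones, Mary", {"A. Lee"}),
]
TRUTH = ["john", "john", "john", "jane", "jane", "mary"]
```

In this toy data the first and third papers are farther apart than `eps`, yet they end up in the same cluster because both are density-reachable through the second paper. That is the sense in which density reachability stands in for transitivity among core points.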
Subject Keywords
Support Vector Machine, Digital Library, Support Vector Machine Model, Core Point, Standard Support Vector Machine
URI
https://hdl.handle.net/11511/69527
DOI
https://doi.org/10.1007/11871637_53
Collections
Department of Computer Engineering, Conference / Seminar
Suggestions
Efficient name disambiguation for large-scale databases
Huang, Jian; Ertekin Bolelli, Şeyda; Giles, C. Lee (2006-01-01)
Name disambiguation can occur when one is seeking a list of publications of an author who has used different name variations and when there are multiple other authors with the same name. We present an efficient integrative framework for solving the name disambiguation problem: a blocking method retrieves candidate classes of authors with similar names and a clustering method, DBSCAN, clusters papers by author. The distance metric between papers used in DBSCAN is calculated by an online active selection supp...
Incremental clustering with vector expansion for online event detection in microblogs
Ozdikis, Ozer; Karagöz, Pınar; Oğuztüzün, Mehmet Halit S. (2017-11-04)
Identifying similarities in microblog posts for event detection poses challenges due to short texts with idiosyncratic spellings, irregular writing styles, abbreviations and synonyms. In order to overcome these challenges, we present an enhancement to the incremental clustering techniques by detecting similar terms in microblog posts in a temporal context. We devise an unsupervised method to measure the similarities online using co-occurrence-based techniques and use them in a vector expansion process. The ...
Employing Named Entities for Semantic Retrieval of News Videos in Turkish
Kucuk, Dilek; Yazıcı, Adnan (2009-09-16)
Named entities are known to be important means for semantic annotation of news texts. Considerable work has been carried out for semantic indexing of both textual news and news videos especially in English through the employment of named entities extracted from textual news or transcriptions of the news videos. In this paper, we present our semantic retrieval architecture for news videos in Turkish based on prior semantic annotation of the videos with the corresponding named entities in the news transcripti...
Predicting the effect of hydrophobicity surface on binding affinity of PCP-like compounds using machine learning methods
Yoldaş, Mine; Alpaslan, Ferda Nur; Büyükbingöl, Erdem; Department of Computer Engineering (2011)
This study aims to predict the binding affinity of the PCP-like compounds by means of molecular hydrophobicity. Molecular hydrophobicity is an important property which aff ects the binding affinity of molecules. The values of molecular hydrophobicity of molecules are obtained on three-dimensional coordinate system. Our aim is to reduce the number of points on the hydrophobicity surface of the molecules. This is modeled by using self organizing maps (SOM) and k-means clustering. The feature sets obtained fro...
Selective word encoding for effective text representation
Ozkan, Savas; Ozkan, Akin (The Scientific and Technological Research Council of Turkey, 2019-01-01)
Determining the category of a text document from its semantic content is highly motivated in the literature and it has been extensively studied in various applications. Also, the compact representation of the text is a fundamental step in achieving precise results for the applications and the studies are generously concentrated to improve its performance. In particular, the studies which exploit the aggregation of word-level representations are the mainstream techniques used in the problem. In this paper, w...
Citation Formats
J. Huang, Ş. Ertekin Bolelli, and C. L. Giles, “Efficient Name Disambiguation for Large Scale Datasets,” 2006, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/69527.