Using Zipf frequencies as a representativeness measure in statistical active learning of natural language
Date
2008
Author
Çobanoğlu, Onur
Active learning has proven to be a successful strategy for the rapid development of corpora used in the statistical induction of natural language. The vast majority of studies in this field have concentrated on finding and testing informativeness measures for samples, whereas representativeness measures for samples have not been studied thoroughly. In this thesis, we introduce a novel representativeness measure that is based on Zipf's law, is model-independent, and is validated both theoretically and empirically. Experiments conducted on the WSJ corpus with a wide-coverage parser show that our representativeness measure leads to better performance than previously introduced representativeness measures when used with most of the known informativeness measures.
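The abstract does not state the measure's formula, so the following is only an illustrative sketch of the general idea and not the thesis's actual method. It assumes a tokenized corpus, builds a Zipf-style rank/frequency table, and scores a sentence by the mean relative frequency of its words. The names `zipf_table`, `representativeness`, and `select`, and the choice to multiply the score by an informativeness value, are all hypothetical.

```python
from collections import Counter


def zipf_table(corpus):
    """Map each word to (rank, relative frequency) over a tokenized corpus.

    Rank 1 is the most frequent word, as in a Zipf rank-frequency plot.
    """
    counts = Counter(word for sentence in corpus for word in sentence)
    total = sum(counts.values())
    ranked = counts.most_common()  # descending frequency
    return {
        word: (rank, count / total)
        for rank, (word, count) in enumerate(ranked, start=1)
    }


def representativeness(sentence, table):
    """Hypothetical score: mean relative frequency of the sentence's words.

    This is an assumption made for illustration, not the thesis's formula.
    Words missing from the table contribute 0.
    """
    if not sentence:
        return 0.0
    return sum(table.get(w, (0, 0.0))[1] for w in sentence) / len(sentence)


def select(pool, informativeness, table, k=1):
    """Return the k pool sentences with the highest combined score.

    The combination (a plain product) is also an assumption.
    """
    ranked = sorted(
        pool,
        key=lambda s: informativeness(s) * representativeness(s, table),
        reverse=True,
    )
    return ranked[:k]


corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "aardvark", "yawned"],
]
table = zipf_table(corpus)
# With a constant informativeness, selection is driven by word frequency alone.
chosen = select(corpus, lambda s: 1.0, table, k=1)
```

In this toy corpus, a sentence made of frequent words scores higher than one containing rare words, which is the intuition behind preferring representative samples; any real informativeness function (for example, parser uncertainty) would be supplied by the caller.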
Subject Keywords
Computer engineering, Active learning
URI
http://etd.lib.metu.edu.tr/upload/3/12609684/index.pdf
https://hdl.handle.net/11511/17685
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
A new contribution to nonlinear robust regression and classification with mars and its applications to data mining for quality control in manufacturing
Yerlikaya, Fatma; Weber, Gerhard Wilhelm; Department of Scientific Computing (2008)
Multivariate adaptive regression spline (MARS) denotes a modern methodology from statistical learning which is very important in both classification and regression, with an increasing number of applications in many areas of science, economy and technology. MARS is very useful for high dimensional problems and shows a great promise for fitting nonlinear multivariate functions. MARS technique does not impose any particular class of relationship between the predictor variables and outcome variable of interest....
Natural language query processing in ontology based multimedia databases
Aygül, Filiz Alaca; Çiçekli, Fehime Nihan; Department of Computer Engineering (2010)
In this thesis a natural language query interface is developed for semantic and spatio-temporal querying of MPEG-7 based domain ontologies. The underlying ontology is created by attaching domain ontologies to the core Rhizomik MPEG-7 ontology. The user can pose concept, complex concept (objects connected with an “AND” or “OR” connector), spatial (left, right . . . ), temporal (before, after, at least 10 minutes before, 5 minutes after . . . ), object trajectory and directional trajectory (east, west, southe...
Improvement of corpus-based semantic word similarity using vector space model
Esin, Yunus Emre; Alpaslan, Ferda Nur; Department of Computer Engineering (2009)
This study presents a new approach for finding semantically similar words from corpora using window based context methods. Previous studies mainly concentrate on either finding new combinations of distance-weight measurement methods or proposing new context methods. The main difference of this new approach is that this study reprocesses the outputs of the existing methods to update the representation of related word vectors used for measuring semantic distance between words, to improve the results further. ...
Using semantic web services for data integration in banking domain
Okat, Çağlar; Doğru, Ali Hikmet; Department of Computer Engineering (2010)
A semantic model oriented transformation mechanism is developed for the centralization of intra-enterprise data integration. Such a mechanism is especially crucial in the banking domain which is selected in this study. A new domain ontology is constructed to provide basis for annotations. A bottom-up approach is preferred for semantic annotations to utilize existing web service definitions. Transformations between syntactic web service XML responses and semantic model concepts are defined in transformation ...
A pattern classification approach for boosting with genetic algorithms
Yalabık, Ismet; Yarman Vural, Fatoş Tunay; Üçoluk, Göktürk; Şehitoğlu, Onur Tolga (2007-11-09)
Ensemble learning is a multiple-classifier machine learning approach which builds collections of statistical classifiers and combines them into an ensemble that is more accurate than the individual classifiers. Bagging, boosting and voting methods are the basic examples of ensemble learning. In this study, a novel boosting technique targeting to solve partial problems of AdaBoost, a well-known boosting algorithm, is proposed. The proposed system finds an elegant way of boosting a bunch of classifiers successively ...
Citation Formats
IEEE
O. Çobanoğlu, “Using Zipf frequencies as a representativeness measure in statistical active learning of natural language,” M.S. - Master of Science, Middle East Technical University, 2008.