Using Zipf frequencies as a representativeness measure in statistical active learning of natural language
Date
2008
Author
Çobanoğlu, Onur
Active learning has proven to be a successful strategy for the quick development of corpora used in the statistical induction of natural language. The vast majority of studies in this field have concentrated on finding and testing informativeness measures for samples; representativeness measures, however, have not been thoroughly studied. In this thesis, we introduce a novel representativeness measure that is based on Zipf's law, is model-independent, and is validated both theoretically and empirically. Experiments conducted on the WSJ corpus with a wide-coverage parser show that our representativeness measure leads to better performance than previously introduced representativeness measures when used with most of the known informativeness measures.
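As a rough illustration of the idea behind the abstract, the sketch below ranks words by corpus frequency, compares each observed count with the idealized Zipf prediction (frequency proportional to 1/rank), and scores a candidate sentence by the mean relative frequency of its tokens. This is not the measure defined in the thesis, which the abstract does not spell out; the function names `zipf_table` and `sentence_score` and the scoring rule itself are hypothetical, chosen only to show how frequency information could serve as a model-independent signal.

```python
from collections import Counter


def zipf_table(tokens):
    """Return (word, rank, observed_count, zipf_expected_count) rows.

    The expected count uses the idealized Zipf's law with exponent 1:
    expected(rank) = count_of_top_word / rank.
    """
    ranked = Counter(tokens).most_common()
    if not ranked:
        return []
    top_count = ranked[0][1]
    return [
        (word, rank, count, top_count / rank)
        for rank, (word, count) in enumerate(ranked, start=1)
    ]


def sentence_score(sentence, counts, total):
    """Hypothetical representativeness score (an assumption, not the thesis's
    measure): the mean relative corpus frequency of the sentence's tokens.
    Tokens unseen in the corpus contribute zero.
    """
    if not sentence or total == 0:
        return 0.0
    return sum(counts[token] for token in sentence) / (len(sentence) * total)


if __name__ == "__main__":
    corpus = "the cat sat on the mat the end".split()
    counts = Counter(corpus)
    for row in zipf_table(corpus)[:3]:
        print(row)
    print(sentence_score(["the", "cat"], counts, len(corpus)))
```

A score of this kind depends only on corpus counts, not on the parser being trained, which is the sense in which a frequency-based measure can be called model-independent.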
Subject Keywords
Computer engineering; Active learning
URI
http://etd.lib.metu.edu.tr/upload/3/12609684/index.pdf
https://hdl.handle.net/11511/17685
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
A pattern classification approach for boosting with genetic algorithms
Yalabık, Ismet; Yarman Vural, Fatoş Tunay; Üçoluk, Göktürk; Şehitoğlu, Onur Tolga (2007-11-09)
Ensemble learning is a multiple-classifier machine learning approach which produces collections and ensembles of statistical classifiers to build a more accurate classifier than the individual classifiers. Bagging, boosting and voting methods are the basic examples of ensemble learning. In this study, a novel boosting technique that aims to solve some of the problems of AdaBoost, a well-known boosting algorithm, is proposed. The proposed system finds an elegant way of boosting a bunch of classifiers successively ...
A new contribution to nonlinear robust regression and classification with mars and its applications to data mining for quality control in manufacturing
Yerlikaya, Fatma; Weber, Gerhard Wilhelm; Department of Scientific Computing (2008)
Multivariate adaptive regression splines (MARS) denotes a modern methodology from statistical learning which is very important in both classification and regression, with an increasing number of applications in many areas of science, economy and technology. MARS is very useful for high-dimensional problems and shows great promise for fitting nonlinear multivariate functions. The MARS technique does not impose any particular class of relationship between the predictor variables and outcome variable of interest....
Using Corpora For Language Research
Say, Bilge (2010)
Suitable for graduate students interested in doing theoretical or applied (computational, ELT, etc.) linguistic research using corpora. The study of language using corpora. Usage of corpora within linguistics and cognitive science. Definition and varieties of corpora. Building a corpus: sampling, representativeness, encoding and annotation. Characteristics of major available corpora. Necessary statistics to interpret corpus data. Using corpora: corpora in psycholinguistics, corpora and syntax, semantics, and...
Facilitating Contextual Vocabulary Learning in a Mobile-Supported Situated Learning Environment
Bilgin, Cigdem Uz; Tokel, Saniye Tuğba (2019-07-01)
The aim of this study was to investigate how the contextual vocabulary exploration processes of English as a Foreign Language learners can be facilitated in a mobile-supported situated learning environment. A Science and Technology Museum populated with interactive science experiments was chosen as the situated learning context. A mobile application was designed to support learners in completing the authentic tasks and facilitate learners in contextual vocabulary learning. During a 5-week period, 25 univers...
Ontology learning and question answering (qa) systems
Başkurt, Meltem; Alpaslan, Ferda Nur; Department of Computer Engineering (2010)
Ontology Learning requires deep specialization in the Semantic Web, Knowledge Representation, Search Engines, Inductive Learning, Natural Language Processing, and Information Storage, Extraction and Retrieval. A huge amount of domain-specific, unstructured online data needs to be expressed in a machine-understandable and semantically searchable format. Currently, users are often forced to search manually in the results returned by keyword-based search services. They also want to use their native languages to expr...
Citation Formats
O. Çobanoğlu, “Using Zipf frequencies as a representativeness measure in statistical active learning of natural language,” M.S. - Master of Science, Middle East Technical University, 2008.