Using zipf frequencies as a representativeness measure in statistical active learning of natural language

2008
Çobanoğlu, Onur
Active learning has proven to be a successful strategy for the quick development of corpora to be used in the statistical induction of natural language. The vast majority of studies in this field have concentrated on finding and testing various informativeness measures for samples; however, representativeness measures for samples have not been thoroughly studied. In this thesis, we introduce a novel representativeness measure which, being based on Zipf's law, is model-independent and is validated both theoretically and empirically. Experiments conducted on the WSJ corpus with a wide-coverage parser show that our representativeness measure leads to better performance than previously introduced representativeness measures when used with most of the known informativeness measures.
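The abstract does not spell out how the measure is computed, so the following is only an illustrative sketch and not the thesis's actual formula. It assumes a hypothetical score in which a sentence is more representative the more frequent its words are in a reference corpus, using the rank and frequency table that Zipf's law describes. The function names `zipf_table` and `representativeness` and the toy corpus are assumptions made for illustration.

```python
from collections import Counter


def zipf_table(corpus_tokens):
    """Map each word to (rank, relative frequency); rank 1 is the most frequent word."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: (rank, count / total)
            for rank, (word, count) in enumerate(ranked, start=1)}


def representativeness(sentence_tokens, table):
    """Hypothetical score (an assumption, not the thesis's measure):
    mean relative corpus frequency of the sentence's tokens.
    Words unseen in the corpus contribute 0."""
    if not sentence_tokens:
        return 0.0
    freqs = [table.get(word, (None, 0.0))[1] for word in sentence_tokens]
    return sum(freqs) / len(freqs)


# Toy reference corpus, for illustration only.
corpus = "the cat sat on the mat the dog sat".split()
table = zipf_table(corpus)

common = representativeness(["the", "cat", "sat"], table)
rare = representativeness(["the", "zebra", "yawned"], table)
```

Under this assumed score, a sentence built from frequent corpus words (`common`) ranks above one dominated by unseen words (`rare`). Because the score depends only on corpus counts, it does not depend on any particular parser or model, which is the sense of "model-independent" used in the abstract.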

Suggestions

A pattern classification approach for boosting with genetic algorithms
Yalabık, Ismet; Yarman Vural, Fatoş Tunay; Üçoluk, Göktürk; Şehitoğlu, Onur Tolga (2007-11-09)
Ensemble learning is a multiple-classifier machine learning approach that builds collections of statistical classifiers and combines them into a classifier more accurate than the individual classifiers. Bagging, boosting and voting are basic examples of ensemble learning. In this study, a novel boosting technique that aims to solve some of the problems of AdaBoost, a well-known boosting algorithm, is proposed. The proposed system finds an elegant way of boosting a bunch of classifiers successively ...
A new contribution to nonlinear robust regression and classification with mars and its applications to data mining for quality control in manufacturing
Yerlikaya, Fatma; Weber, Gerhard Wilhelm; Department of Scientific Computing (2008)
Multivariate adaptive regression splines (MARS) is a modern methodology from statistical learning which is very important in both classification and regression, with an increasing number of applications in many areas of science, economy and technology. MARS is very useful for high-dimensional problems and shows great promise for fitting nonlinear multivariate functions. The MARS technique does not impose any particular class of relationship between the predictor variables and the outcome variable of interest....
Using Corpora For Language Research
Say, Bilge(2010)
Suitable for graduate students interested in doing theoretical or applied (computational, ELT, etc.) linguistic research using corpora. The study of language using corpora. Usage of corpora within linguistics and cognitive science. Definition and varieties of corpora. Building a corpus: sampling, representativeness, encoding and annotation. Characteristics of major available corpora. Necessary statistics to interpret corpus data. Using corpora: corpora in psycholinguistics, corpora and syntax, semantics, and...
Facilitating Contextual Vocabulary Learning in a Mobile-Supported Situated Learning Environment
Bilgin, Cigdem Uz; Tokel, Saniye Tuğba (2019-07-01)
The aim of this study was to investigate how the contextual vocabulary exploration processes of English as a Foreign Language learners can be facilitated in a mobile-supported situated learning environment. A Science and Technology Museum populated with interactive science experiments was chosen as the situated learning context. A mobile application was designed to support learners in completing the authentic tasks and facilitate learners in contextual vocabulary learning. During a 5-week period, 25 univers...
Ontology learning and question answering (qa) systems
Başkurt, Meltem; Alpaslan, Ferda Nur; Department of Computer Engineering (2010)
Ontology learning requires deep specialization in the Semantic Web, Knowledge Representation, Search Engines, Inductive Learning, Natural Language Processing, and Information Storage, Extraction and Retrieval. A huge amount of domain-specific, unstructured online data needs to be expressed in a machine-understandable and semantically searchable format. Currently, users are often forced to search manually in the results returned by keyword-based search services. They also want to use their native languages to expr...
Citation Formats
O. Çobanoğlu, “Using zipf frequencies as a representativeness measure in statistical active learning of natural language,” M.S. - Master of Science, Middle East Technical University, 2008.