Using zipf frequencies as a representativeness measure in statistical active learning of natural language

Çobanoğlu, Onur
Active learning has proven to be a successful strategy for the rapid development of corpora for the statistical induction of natural language. The vast majority of studies in this field have concentrated on finding and testing informativeness measures for samples, while representativeness measures have not been thoroughly studied. In this thesis, we introduce a novel representativeness measure that, being based on Zipf's law, is model-independent, and we validate it both theoretically and empirically. Experiments conducted on the WSJ corpus with a wide-coverage parser show that, when combined with most of the known informativeness measures, our representativeness measure leads to better performance than previously introduced representativeness measures.
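Zipf's law states that the frequency of a word type is roughly inversely proportional to its frequency rank, f(r) ∝ 1/r^s with s close to 1. The abstract does not give the thesis's actual formula, so the sketch below is NOT that measure. It is a hypothetical illustration of how Zipf-predicted frequencies could act as a model-independent representativeness score in pool-based selection. The function names, the exponent s = 1.0, the mean-log scoring, and the product with an informativeness score are all assumptions made for illustration.

```python
from collections import Counter
import math


def zipf_frequencies(corpus_tokens, s=1.0):
    """Map each word type to a Zipf-predicted relative frequency (hypothetical helper).

    Types are ranked by observed count (rank 1 = most frequent); rank r is
    assigned r**-s, normalised so that all predicted frequencies sum to 1.
    """
    counts = Counter(corpus_tokens)
    ranked = [w for w, _ in counts.most_common()]
    norm = sum(r ** -s for r in range(1, len(ranked) + 1))
    return {w: (r ** -s) / norm for r, w in enumerate(ranked, start=1)}


def representativeness(sentence, zipf_freq, floor=1e-9):
    """Hypothetical score: mean log Zipf frequency of the sentence's tokens.

    Unseen tokens get a small floor value so the logarithm stays finite.
    """
    if not sentence:
        return math.log(floor)
    return sum(math.log(zipf_freq.get(w, floor)) for w in sentence) / len(sentence)


def select(pool, informativeness, zipf_freq, k):
    """Pick k sentences by informativeness times exp(representativeness).

    exp(mean log f) is the geometric mean of the tokens' Zipf frequencies.
    """
    def score(sent):
        return informativeness(sent) * math.exp(representativeness(sent, zipf_freq))

    return sorted(pool, key=score, reverse=True)[:k]


# Toy unlabeled pool; sentence length stands in for an informativeness measure.
pool = [["the", "cat", "sat"], ["the", "dog"], ["aardvark", "zebu"]]
zf = zipf_frequencies([w for sent in pool for w in sent])
print(select(pool, len, zf, 1))  # → [['the', 'cat', 'sat']]
```

In this sketch the score depends only on word ranks in the unlabeled pool, so it needs no parser or trained model; that is the sense in which a Zipf-based score can be independent of the learner.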


Natural language query processing in ontology based multimedia databases
Aygül, Filiz Alaca; Çiçekli, Fehime Nihan; Department of Computer Engineering (2010)
In this thesis a natural language query interface is developed for semantic and spatio-temporal querying of MPEG-7 based domain ontologies. The underlying ontology is created by attaching domain ontologies to the core Rhizomik MPEG-7 ontology. The user can pose concept, complex concept (objects connected with an “AND” or “OR” connector), spatial (left, right . . . ), temporal (before, after, at least 10 minutes before, 5 minutes after . . . ), object trajectory and directional trajectory (east, west, southe...
Improvement of corpus-based semantic word similarity using vector space model
Esin, Yunus Emre; Alpaslan, Ferda Nur; Department of Computer Engineering (2009)
This study presents a new approach for finding semantically similar words in corpora using window-based context methods. Previous studies mainly concentrate on either finding new combinations of distance-weight measurement methods or proposing new context methods. The main difference in this new approach is that it reprocesses the outputs of the existing methods to update the representations of related word vectors used for measuring the semantic distance between words, further improving the results. ...
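A common window-based baseline represents each word by counts of the words that occur within a fixed window around it and compares words by the cosine of their context vectors. The sketch shows only that baseline. The thesis's reprocessing step is not described in the truncated abstract and is not reproduced here. The function names, the window size of 1 in the usage lines, and the use of raw counts instead of any particular distance-weight scheme are illustrative assumptions.

```python
from collections import Counter, defaultdict
import math


def context_vectors(tokens, window=2):
    """Count co-occurring words within +/- `window` positions of each token."""
    vectors = defaultdict(Counter)
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not part of its own context
                vectors[w][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


tokens = "the cat drinks milk the dog drinks milk the cat eats fish".split()
vecs = context_vectors(tokens, window=1)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))   # → 0.866
print(round(cosine(vecs["cat"], vecs["fish"]), 3))  # → 0.408
```

"cat" and "dog" share the contexts "the" and "drinks", so they score higher than "cat" and "fish", which share only "eats".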
Using semantic web services for data integration in banking domain
Okat, Çağlar; Doğru, Ali Hikmet; Department of Computer Engineering (2010)
A semantic-model-oriented transformation mechanism is developed for centralizing intra-enterprise data integration. Such a mechanism is especially crucial in the banking domain, which is the domain selected for this study. A new domain ontology is constructed to provide a basis for annotations. A bottom-up approach is preferred for semantic annotations in order to utilize existing web service definitions. Transformations between syntactic web service XML responses and semantic model concepts are defined in transformation ...
A pattern classification approach for boosting with genetic algorithms
Yalabık, Ismet; Yarman Vural, Fatoş Tunay; Üçoluk, Göktürk; Şehitoğlu, Onur Tolga (2007-11-09)
Ensemble learning is a multiple-classifier machine learning approach that combines collections of statistical classifiers into an ensemble that is more accurate than the individual classifiers. Bagging, boosting and voting are basic examples of ensemble learning. In this study, a novel boosting technique aimed at solving some of the problems of AdaBoost, a well-known boosting algorithm, is proposed. The proposed system finds an elegant way of boosting a bunch of classifiers successively ...
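For reference, the standard discrete AdaBoost loop that this thesis takes as its starting point can be sketched with one-dimensional decision stumps. Each round fits the stump with the lowest weighted error ε, gives it the vote α = ½ ln((1 − ε)/ε), and increases the weights of misclassified samples. This is textbook AdaBoost, not the proposed technique. The stump learner, the labels in {−1, +1}, the function names, and the clipping of ε away from 0 and 1 are simplifying assumptions.

```python
import math


def train_stump(xs, ys, w):
    """Return (weighted_error, threshold, sign) of the best 1-D stump.

    A stump predicts `sign` when x >= threshold and `-sign` otherwise.
    """
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x >= thr else -sign for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best


def adaboost(xs, ys, rounds=5):
    """Discrete AdaBoost with labels in {-1, +1}; returns [(alpha, thr, sign), ...]."""
    n = len(xs)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, thr, sign = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero / log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, sign))
        # Misclassified samples (y * h(x) = -1) gain weight; correct ones lose it.
        w = [wi * math.exp(-alpha * y * (sign if x >= thr else -sign))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]  # renormalise to a distribution
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps (ties go to +1)."""
    score = sum(alpha * (sign if x >= thr else -sign) for alpha, thr, sign in ensemble)
    return 1 if score >= 0 else -1


model = adaboost([1, 2, 3, 4, 5, 6], [-1, -1, -1, 1, 1, 1], rounds=3)
print([predict(model, x) for x in [1, 2, 3, 4, 5, 6]])  # → [-1, -1, -1, 1, 1, 1]
```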
Using Corpora For Language Research
Say, Bilge (2010)
Suitable for graduate students interested in doing theoretical or applied (computational, ELT, etc.) linguistic research using corpora. The study of language using corpora. Usage of corpora within linguistics and cognitive science. Definition and varieties of corpora. Building a corpus: sampling, representativeness, encoding and annotation. Characteristics of major available corpora. Necessary statistics to interpret corpus data. Using corpora: corpora in psycholinguistics, corpora and syntax, semantics, and...
Citation Formats
O. Çobanoğlu, “Using zipf frequencies as a representativeness measure in statistical active learning of natural language,” M.S. - Master of Science, Middle East Technical University, 2008.