Supertagging with combinatory categorial grammar for dependency parsing

2014
Akkuş, Burak Kerim
Combinatory Categorial Grammar (CCG) categories contain syntactic and semantic information. CCG derivation trees can be used to extract partial dependency structures, providing the missing information needed to build complete dependency structures. For this reason, CCG categories are sometimes referred to as supertags. The amount of information encoded in supertags makes it possible to create very accurate and fast parsers, as supertagging is considered "almost parsing". In this thesis, a maximum entropy based part-of-speech (POS) tagger is presented to improve the performance of CCG supertagging, and a second maximum entropy classifier with additional features is implemented for supertagging. Morphological features of words in an agglutinative language, Turkish, are used to improve the accuracy of both POS tagging and supertagging. This indicates direct relationships between morphemes and lexical categories. The effect of the improved supertagger on dependency parsers is tested by using supertags as rich part-of-speech tags. Additionally, POS taggers that assign multiple part-of-speech tags to ambiguous words are suggested as another potential improvement for supertaggers.
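The supertagging setup described in the abstract can be illustrated with a minimal sketch of a maximum entropy (multinomial logistic regression) classifier over token features. Everything below is an assumption made for illustration and is not taken from the thesis: the feature template in extract_features, the use of character suffixes as a crude stand-in for real morphological analysis, the toy sentences, their POS tags and their CCG categories, and the plain stochastic gradient training loop.

```python
import math
from collections import defaultdict

def extract_features(words, pos_tags, i):
    """Hypothetical feature template for token i (not the thesis's feature set)."""
    word = words[i]
    feats = {
        "word=" + word.lower(),
        "pos=" + pos_tags[i],
        "prev_pos=" + (pos_tags[i - 1] if i > 0 else "<s>"),
        "next_pos=" + (pos_tags[i + 1] if i + 1 < len(words) else "</s>"),
    }
    # Character suffixes approximate morphological features (an assumption).
    for k in (1, 2, 3):
        if len(word) > k:
            feats.add("suffix%d=%s" % (k, word[-k:].lower()))
    return feats

def label_probabilities(weights, labels, feats):
    """Softmax over the summed feature weights of each candidate label."""
    scores = [sum(weights[(f, y)] for f in feats) for y in labels]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(examples, labels, epochs=200, rate=0.1):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in examples:
            probs = label_probabilities(weights, labels, feats)
            for y, p in zip(labels, probs):
                # Gradient for binary features: observed count minus expected count.
                grad = (1.0 if y == gold else 0.0) - p
                for f in feats:
                    weights[(f, y)] += rate * grad
    return weights

def supertag(weights, labels, words, pos_tags):
    """Assign the most probable category to every token independently."""
    tags = []
    for i in range(len(words)):
        feats = extract_features(words, pos_tags, i)
        probs = label_probabilities(weights, labels, feats)
        tags.append(labels[probs.index(max(probs))])
    return tags

# Invented toy data: (words, POS tags, CCG categories) for two Turkish sentences.
TOY_CORPUS = [
    (["Ali", "kitabı", "okudu"], ["Noun", "Noun", "Verb"], ["NP", "NP", "(S\\NP)\\NP"]),
    (["Ayşe", "uyudu"], ["Noun", "Verb"], ["NP", "S\\NP"]),
]
LABELS = sorted({cat for _, _, cats in TOY_CORPUS for cat in cats})
EXAMPLES = [
    (extract_features(words, pos, i), cats[i])
    for words, pos, cats in TOY_CORPUS
    for i in range(len(words))
]
WEIGHTS = train(EXAMPLES, LABELS)
```

Calling supertag(WEIGHTS, LABELS, ["Ayşe", "kitabı", "okudu"], ["Noun", "Noun", "Verb"]) returns one category per token. Feeding the POS tags in as features mirrors the pipeline the abstract outlines, where a POS tagger runs first and its output supports the supertagger; a realistic system would replace the suffix features with the output of a morphological analyzer.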

Suggestions

Implementing the type-raising algorithm by grammar compiling
Demir, Oğuzhan; Bozşahin, Hüseyin Cem; Department of Cognitive Sciences (2019)
Type-raising is part of the theory of Combinatory Categorial Grammar, by which all arguments, including complements, are type-raised. Generating type-raising rules automatically at compile time with a simple tool would make experimenting with Combinatory Categorial Grammar faster while allowing control over each run. In this study, the tool is tested with various grammars, including the large-scale Eve database, giving results in O(N), where N is the number of verbs in the grammar.
A type system for combinatory categorial grammar
Erkan, Güneş; Bozşahin, Hüseyin Cem; Department of Computer Engineering (2003)
This thesis investigates the internal structure and the computational representation of the lexical entries in Combinatory Categorial Grammar (CCG). A restricted form of typed feature structures is proposed for representing CCG categories. This proposal is combined with a constraint-based modality system for basic categories of CCG. We present some linguistic evidence to explain why both a unification-based feature system and a constraint-based modality system are needed for a lexicalist framework. An implem...
Searching documents with semantically related keyphrases
Aygül, İbrahim; Çiçekli, Fehime Nihan; Çiçekli, İlyas; Department of Computer Engineering (2010)
In this thesis, we developed SemKPSearch, a tool for searching documents by keyphrases that are semantically related to the given query phrase. By relating the keyphrases semantically, we aim to provide users with an extended search and browsing capability over a document collection and to increase the number of related results returned for a keyphrase query. Keyphrases provide a brief summary of the content of documents. They can be either author assigned or automatically extracted from the docume...
Subtree selection in kernels for graph classification
Tan, Mehmet; Polat, Faruk; Alhajj, Reda (Inderscience Publishers, 2013-01-01)
Classification of structured data is essential for a wide range of problems in bioinformatics and cheminformatics. One such problem is in silico prediction of small molecule properties such as toxicity, mutagenicity and activity. In this paper, we propose a new feature selection method for graph kernels that uses the subtrees of graphs as their feature sets. A masking procedure which boils down to feature selection is proposed for this purpose. Experiments conducted on several data sets as well as a compari...
Gradient characteristics of the unaccusative/unergative distinction in Turkish: an experimental investigation
Acartürk, Cengiz; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2005)
This thesis investigates the gradient behaviour of monadic intransitive verb classes in Turkish, under an aspectual classification of the unaccusative/unergative verb types, namely the Split Intransitivity Hierarchy. This Hierarchy claims that intransitive verb types are subject to gradient acceptability in certain syntactic constructions. The methods used in judgment elicitation studies in psychophysics, such as the magnitude estimation technique, have recently been adapted to be used in capturing gradient ...
Citation Formats
B. K. Akkuş, “Supertagging with combinatory categorial grammar for dependency parsing,” M.S. - Master of Science, Middle East Technical University, 2014.