Automated coherence detection with term-distance path extraction of the co-occurrence matrix of a document

2015
Ağın, Halil
This thesis adopts distributional semantics (frequency-based semantics) as the theoretical framework for quantifying textual coherence. Distributional semantics represents discourse sections as vectors whose dimensions are the frequency counts of co-occurring words in the text within its semantic space. Textual coherence is then quantified by the cosine between the vectors of successive sentences (cf. Latent Semantic Analysis, LSA). The common assumption underlying LSA-based studies is that word co-occurrence frequency can serve as a cohesive cue for quantifying textual coherence, which leads to analyses based on a term-document matrix. This thesis treats the spatial distance between co-occurring words as a new frequency event of cohesive cues and introduces a document-distance matrix derived from the term-document matrix. It proposes that the document-distance matrix of co-occurring words in adjacent sentences of a text can be used to quantify textual coherence. Two mathematical functions are proposed for deriving the document-distance matrix, together with two algorithms that operate on it. The mathematical functions operate on the document-document matrix (itself derived from the term-document matrix) to derive the document-distance matrix, and the algorithms measure the coherence of a text by operating on the resulting document-distance matrices.
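For orientation, the frequency-based baseline the abstract starts from (sentences as word-count vectors, coherence as the cosine between successive sentences) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's method: it uses raw term counts per sentence with naive regex sentence splitting and tokenization, omits the SVD dimensionality reduction that full LSA applies, and does not implement the document-distance matrix or the thesis's two functions and algorithms. The function names `sentence_vectors`, `cosine`, and `coherence` are hypothetical.

```python
import math
import re
from collections import Counter


def sentence_vectors(text):
    """Split text into sentences (naive split on . ! ?) and return one
    term-frequency Counter per sentence, i.e. the columns of a
    term-sentence count matrix stored sparsely."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]


def cosine(u, v):
    """Cosine between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def coherence(text):
    """Mean cosine between each pair of adjacent sentence vectors.
    Assumption: a single-sentence text is scored 0.0."""
    vecs = sentence_vectors(text)
    if len(vecs) < 2:
        return 0.0
    scores = [cosine(a, b) for a, b in zip(vecs, vecs[1:])]
    return sum(scores) / len(scores)
```

For example, in `"The cat sat. The cat slept. Stocks fell."` the first two sentences share two of three words (cosine 2/3) while the last pair shares none (cosine 0), so the mean score is 1/3. The thesis's contribution is to add the spatial distance between co-occurring words as a further cue beyond these raw co-occurrence counts.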
H. Ağın, “Automated coherence detection with term-distance path extraction of the co-occurrence matrix of a document,” M.S. - Master of Science, Middle East Technical University, 2015.