Rich statistical parsing and literary language

PhD defense of Andreas van Cranenburgh: 13.00, 2 November 2016, Aula of the University of Amsterdam

PhD research conducted within the Riddle of Literary Quality project, funded by the KNAW Computational Humanities Programme

Summary

This thesis studies parsing and literature with the Data-Oriented Parsing framework, which assumes that chunks of previous experience can be exploited to analyze new sentences. As chunks we consider syntactic tree fragments. After presenting a method to efficiently extract such fragments from treebanks based on heuristics of re-occurrence, we employ them to develop a multilingual statistical parser. We show how a mildly context-sensitive grammar can be employed to produce discontinuous constituents, and compare this to an approximation that stays within the efficiently parsable context-free framework. We show that tree fragments allow the grammar to adequately capture the statistical regularities of non-local relations, without the need for the increased generative capacity of a mildly context-sensitive grammar.
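To give a rough feel for what "fragments that re-occur in a treebank" means, the following is a minimal sketch and not the extraction method of the thesis. It assumes a deliberately simplified setting: trees are nested tuples, only the smallest possible fragments are considered (a node together with its immediate children), and a fragment counts as recurring when it is seen at least twice. The two example trees and the names `productions` and `recurring_fragments` are made up for illustration.

```python
from collections import Counter

# Toy treebank: each tree is (label, child, ...) and a leaf is a plain string.
# These two trees are invented examples, not data from the thesis.
TREEBANK = [
    ("S", ("NP", "she"),
          ("VP", ("V", "reads"), ("NP", ("D", "the"), ("N", "book")))),
    ("S", ("NP", "he"),
          ("VP", ("V", "wrote"), ("NP", ("D", "the"), ("N", "letter")))),
]


def productions(tree):
    """Yield every depth-1 fragment (parent label, child labels) of a tree."""
    label, *children = tree
    # A subtree child is represented by its label, a leaf by the word itself.
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def recurring_fragments(treebank, min_count=2):
    """Return the depth-1 fragments seen at least min_count times."""
    counts = Counter(frag for tree in treebank for frag in productions(tree))
    return {frag: n for frag, n in counts.items() if n >= min_count}


fragments = recurring_fragments(TREEBANK)
for (parent, children), n in sorted(fragments.items()):
    print(f"{parent} -> {' '.join(children)}  (seen {n} times)")
```

In this toy setting the word-specific fragments such as `N -> book` drop out, while the shared structure (`S -> NP VP`, `VP -> V NP`, `NP -> D N`, `D -> the`) is kept. Fragments in the Data-Oriented Parsing sense can be larger than one level and may include words, which this sketch leaves out.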

Supervisors

Professor Rens Bod & Dr I Titov
