Alastair Butler – Research interests (expanded version)

I am deeply fascinated by why natural languages should obey the grammaticality constraints they do, and my research has involved building mathematical and computational models to mimic observed natural language behaviour. This work has left me with a keen sensitivity to a wide range of grammatical constructions and with ideas about why grammatical restrictions obtain. These ideas I am now exploring through collaborative efforts that are leading to the creation of corpus resources containing human-checked analyses of sentences for a number of languages (currently: English, Contemporary Japanese, and Old Japanese).

     My past research has focused on properties of the types of dependencies that can be formally captured with operator bindings, such as argument-predicate relations.

     My PhD work, which led to the publication of a book, The Syntax and Semantics of Split Constructions (Palgrave Macmillan, 2004), co-authored with Eric Mathieu (Linguistics Department, University of Ottawa), introduced analyses based on limits of semantic evaluation for a wide range of constructions, drawn in particular from French, that display options for discontinuity, with discontinuity always leading to extra restrictions on grammaticality.

     A subsequent book, The Semantics of Grammatical Dependencies (Emerald, 2010), argues in favour of a formalised outlook on the role of grammaticality as a governing mechanism by which languages are kept to generally unambiguous forms that guarantee the required operator-binding dependencies.

     Having realised a formal mechanism, the next step for my research was to build a computational implementation, which was subsequently described in Linguistic Expressions and Semantic Processing: A Practical Approach (Springer-Verlag, 2015).

     With the resulting theoretical apparatus and grounded computational implementation, my research has shifted to the empirical exercise of describing data with annotated corpora. As noted already, this involves collaborative work on a number of languages.

     The biggest of these efforts is for Japanese. It has involved a large team of collaborators, and the resulting resource has been released for public download as the Keyaki Treebank. Continued work on this resource is now a major project at the National Institute for Japanese Language and Linguistics, building the NINJAL Parsed Corpus of Modern Japanese (NPCMJ).

     The goal for these corpus resources is to realise fully searchable representations of texts that are described according to interpretations of the meanings of sentences. This involves analysing texts into units, forming tree structures, and accounting for how the units are composed into meaningful expressions. Units are related by grammatical functions, by structural relations, or, as a last resort, by indices. A full description of a sentence tells the story of how information flows through structure to compose complex meanings. To achieve this, the corpus resources use familiar grammatical categories to name units in an economical way, according to some basic organising principles.

     One such principle for undertaking the corpus annotation, which relies on my prior theoretical and implementation work, defines how clauses in a complex sentence are allowed to “share” units, depending on the way that the clauses are linked together. For example, the subject position for a predicate in a subordinate clause can be empty if it “inherits” an argument from a superordinate clause. This relation approximates what theoretical syntacticians call “control”. Arguments and adjuncts can be “inherited” by more than one clause if the clauses are coordinated. This relation approximates what theoretical syntacticians call “Across the Board” extraction. These two relations allow many non-local relations to be described without the use of indices, simply by reference to the dominance relation between clauses and a stipulation of clause linkage type.
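
     The two inheritance relations described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the annotation scheme or implementation used for the corpus resources: the `Clause` class, the role names, the linkage labels ("subordinate", "coordinate") and the `dependencies` function are all hypothetical names introduced here. The assumed rules are that a subordinate clause may inherit only a missing subject from the clause that dominates it, while a coordinated clause may inherit any unit it lacks from the coordination level.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Clause:
    """Hypothetical clause node: a predicate, its overt units, and linked clauses."""
    predicate: Optional[str]                      # None for a bare coordination level
    units: Dict[str, str] = field(default_factory=dict)   # role -> overt filler
    linkage: str = "root"                         # "root", "subordinate" or "coordinate"
    children: List["Clause"] = field(default_factory=list)


def dependencies(clause: Clause,
                 inherited: Optional[Dict[str, str]] = None) -> List[Tuple[str, str, str]]:
    """Return (predicate, role, filler) triples, filling gaps by inheritance."""
    inherited = inherited or {}
    if clause.linkage == "subordinate":
        # Assumed control-like rule: only the subject is passed down.
        passed = {r: f for r, f in inherited.items() if r == "subject"}
    elif clause.linkage == "coordinate":
        # Assumed Across-the-Board-like rule: every shared unit is passed down.
        passed = dict(inherited)
    else:
        passed = {}
    # Overt units in the clause take priority over inherited ones.
    resolved = {**passed, **clause.units}
    triples: List[Tuple[str, str, str]] = []
    if clause.predicate is not None:
        triples = [(clause.predicate, role, filler)
                   for role, filler in sorted(resolved.items())]
    for child in clause.children:
        triples.extend(dependencies(child, resolved))
    return triples


# "Mary tried to leave": the subordinate clause has no overt subject.
control = Clause("tried", {"subject": "Mary"},
                 children=[Clause("leave", linkage="subordinate")])

# "What did John buy and Mary sell?": the object is shared by both conjuncts.
across_the_board = Clause(None, {"object": "what"}, children=[
    Clause("buy", {"subject": "John"}, "coordinate"),
    Clause("sell", {"subject": "Mary"}, "coordinate"),
])

print(dependencies(control))
print(dependencies(across_the_board))
```

     In this sketch no index is needed to recover that Mary is the subject of "leave" or that "what" is the object of both "buy" and "sell": the dominance relation between clauses and the linkage label are enough.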

     My current research efforts are concerned with systematising the practices for assigning structural position and node descriptions to make the corpus resources more easily searchable. Under the overarching principle of searchability are two competing principles. Parsimony of categories requires that the inventory of node descriptions be as small as possible. Descriptive adequacy requires that any basic intuition about meanings in a sentence be represented by some kind of dependency that can be calculated from the structure.

     A notable area for future research is the development of a metric to evaluate an analysis on the basis of the completeness and coherence of the semantic dependencies it encodes. This would involve encoding semantic dependencies in such a way as to offer measurable alignments between parsings. This promises to contribute both to quality control in corpus development and to practical applications such as the evaluation of parsing systems for Natural Language Processing.
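
     As an illustration of what a measurable alignment between two parsings could look like, the following Python sketch compares two analyses of the same sentence as sets of dependency triples and reports precision, recall and F1. This is a standard set-overlap measure used here as a stand-in, not the metric envisaged above, which remains to be developed. The triple format and the function name `dependency_agreement` are assumptions made for the example.

```python
from typing import Set, Tuple

# Hypothetical encoding of a dependency: (predicate, role, filler).
Dependency = Tuple[str, str, str]


def dependency_agreement(reference: Set[Dependency],
                         candidate: Set[Dependency]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of candidate dependencies against a reference."""
    shared = len(reference & candidate)
    precision = shared / len(candidate) if candidate else 0.0
    recall = shared / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Reference analysis of "Mary tried to leave the room".
reference = {
    ("tried", "subject", "Mary"),
    ("leave", "subject", "Mary"),
    ("leave", "object", "room"),
    ("tried", "complement", "leave"),
}

# A candidate parse that misses the inherited subject and misattaches the object.
candidate = {
    ("tried", "subject", "Mary"),
    ("tried", "complement", "leave"),
    ("tried", "object", "room"),
}

precision, recall, f1 = dependency_agreement(reference, candidate)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

     Recall falls when a parsing fails to encode dependencies that the reference contains, and precision falls when it encodes dependencies that the reference does not, so the two figures separate omissions from misanalyses.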

     Another active area for my future research involves projecting onto the nodes of a tree indices that reflect calculated dependencies. This technique can be used to seed a lexical semantic resource that assigns semantic roles (frame elements) and a specific semantic frame to each attestation of a predicate. The same process also enables systematic sense disambiguation. The actual annotation takes the form of adding semantic role assignments to a pre-generated description in the node label of the predicate in the tree. Lexical entries in a dictionary can thereby be linked through their senses to specific attestations, and the expression of a role in a frame can be traced to realisations with different predicates.

     My career up to this point has followed a natural progression from concerns about the general properties of language to an exploration of how generalisations at a formal level find expression in specific languages. The approach is now finding palpable success through a largely data-driven process.



Last updated: Nov 11, 2017