A message received yesterday from David Bamman at Perseus:
The Perseus Project has recently received a planning grant from the NSF to investigate the costs and labor involved in constructing a multimillion-word Latin treebank, along with its potential value for the linguistics and Classics community. While our initial efforts under this grant will focus on syntactically annotating excerpts from Golden Age authors (Caesar, Cicero, Vergil) and the Vulgate, a future multimillion-word corpus would comprise writings from the pre-Classical period up through the Early Modern era. To date we’ve annotated a total of 12,000 words in a style that’s predominantly informed by two sources: the dependency grammar used by the Prague Dependency Treebank (itself based on Mel’čuk 1988), and the Latin grammar of Pinkster 1990.
While treebanks provide valuable training data for computational tasks such as grammar induction and automatic syntactic parsing, they also have the potential to be used in traditional research areas that Classicists in particular are poised to exploit. Large collections of syntactically parsed sentences have the potential to revolutionize lexicography and philology, as they provide the immediate context for a word’s use along with its typical syntactic arguments (this lets us chart, for example, how the meaning of a verb changes as its predominant arguments change). Treebanks enable large-scale research into structurally based rhetorical devices of particular interest to Classicists (such as hyperbaton), and they provide the raw data for research in historical linguistics (such as the move in Latin from Classical SOV word order to Romance SVO).
The eventual Latin treebank will be openly available to the public; we should, therefore, come to a consensus on how it should be built. To that end we encourage input from the linguistics and Classics community on the treebank design (including the syntactic representation of Latin) and welcome contributions by annotators (for which limited funding is available). Interested collaborators should contact David Bamman (David.Bamman@tufts.edu) at the Perseus Project.
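
To give a rough sense of the kind of question a dependency treebank makes answerable, here is a minimal sketch of counting subject/object/verb orders, as in the SOV-to-SVO example mentioned in the message. Everything in it is an assumption for illustration: the `Token` record, the `constituent_order` function, the two toy sentences, and the labels `Pred`, `Sb` and `Obj` (modeled loosely on Prague-style dependency labels) are not taken from the Perseus annotation scheme, which the message does not describe in detail.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word of a dependency-annotated sentence (hypothetical layout)."""
    idx: int   # 1-based linear position in the sentence
    form: str  # the word as it appears in the text
    head: int  # idx of the governing word, 0 for the root
    rel: str   # dependency label; "Pred", "Sb", "Obj" are assumed names


def constituent_order(sentence):
    """Return one string such as 'SOV' per predicate that has both
    a subject and an object among its direct dependents."""
    orders = []
    for verb in sentence:
        if verb.rel != "Pred":
            continue
        # Collect the direct subject and object dependents of this verb.
        deps = {t.rel: t for t in sentence
                if t.head == verb.idx and t.rel in ("Sb", "Obj")}
        if len(deps) < 2:
            continue  # skip clauses lacking an overt subject or object
        labelled = [(deps["Sb"].idx, "S"),
                    (deps["Obj"].idx, "O"),
                    (verb.idx, "V")]
        # Sorting by linear position yields the surface order.
        orders.append("".join(label for _, label in sorted(labelled)))
    return orders


# Two invented toy sentences, annotated by hand for this sketch.
sov = [Token(1, "puer", 3, "Sb"),
       Token(2, "puellam", 3, "Obj"),
       Token(3, "amat", 0, "Pred")]
svo = [Token(1, "puer", 2, "Sb"),
       Token(2, "amat", 0, "Pred"),
       Token(3, "puellam", 2, "Obj")]

counts = Counter(order for s in (sov, svo) for order in constituent_order(s))
print(counts)
```

Run over texts grouped by period, a tally like `counts` is the sort of raw data the message has in mind for historical word-order studies. A real treebank would also need to handle coordination, subordinate clauses and omitted subjects, which this sketch ignores.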