Here's a brief list of some projects I am currently involved with.

Past projects (note that a number of these links are broken due to a missing server).

  • EARL: Efficient Annotation of Resources by Learning. (NSF, 2007-2009)
  • TEXTIME: Temporal Expressions and Time Processing (in Texas). (New York Community Trust, 2006-2008)
  • DISCOR: Discourse Structure and Coreference Resolution. (NSF, 2006-2008)
  • OpenCCG Front End: DotCCG specification language and VisCCG GUI editor for OpenCCG grammars. (LAITS, 2006-2007)
  • UT Austin NLP Suite: suite of open source NLP packages with tutorials. (FastTex, 2006-2007)

Computational linguistics usually requires writing a fair amount of code, but there is a lot of existing software that can be used directly or built on for natural language processing tasks. Open source software is particularly appealing because you can modify the source code if you need to. Here are some of the open source software packages that I am involved with.

  • Scalabha: a Scala API for teaching and research for natural language processing.
  • TextGrounder: a Java system that identifies the places and times mentioned in texts and resolves them to points on Earth or on the timeline.
  • The OpenNLP toolkit: a suite of Java tools for various NLP tasks, including sentence splitting, part-of-speech tagging, and parsing.
  • OpenCCG: a Java parsing/realization system for Combinatory Categorial Grammar.
  • The Junto Label Propagation Toolkit
  • OpenNLP Maxent: a Java implementation of Generalized Iterative Scaling for training and using maximum entropy models.
  • MSTParser: a Java dependency parser.
  • TADM (Toolkit for Advanced Discriminative Modeling): a C++ package for training maximum entropy and perceptron models.

See the UT Compling GitHub page and the UT Compling Bitbucket page for more code from our lab.