The following programs were written while I was at Columbia University. I continue to maintain them, though they are still distributed by Columbia. Under restrictions imposed by Columbia, these programs can only be used for research and educational purposes. To get access to the first two programs in the list, a potential user needs to print and fax one license agreement for each program (see below). Once the request is approved, download instructions will follow.

  • LCseg [ license agreement ]
    A domain-independent discourse segmenter based on lexical cohesion. It divides unrestricted texts into topically cohesive units. This work is described in my ACL-03 paper.
  • LexChainer [ license agreement ]
    A tool that uses WordNet to find lexical chains (chains of semantically related words) in unrestricted texts. It also performs word sense disambiguation to ensure that words appearing in the same chain have related meanings. This work is described in my IJCAI-03 paper.
  • NXT transcription extraction tool
    dump_meeting is a small program that creates plain-text meeting transcriptions and annotations from NXT-encoded meeting data. It currently supports extraction of transcriptions, extractive summaries, dialog acts, adjacency pairs, and topic segmentation. Various printing options are available (e.g., punctuation, case-sensitive, ASR-like), and turn segmentation may be arbitrarily defined (by dialog act units, by silence, or as specified by the user). Detailed usage is available here.
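
As a rough illustration of what silence-based turn segmentation means, the sketch below groups time-stamped words into turns whenever the pause between consecutive words exceeds a threshold. It is not dump_meeting's code and does not read NXT data: the `Word` type, the `segment_by_silence` function, and the 0.5-second default threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """A hypothetical time-stamped word (times in seconds)."""
    text: str
    start: float
    end: float


def segment_by_silence(words: List[Word], max_pause: float = 0.5) -> List[List[Word]]:
    """Split a time-ordered word list into turns at pauses longer than max_pause.

    The 0.5 s default is an arbitrary illustrative value, not a setting
    taken from dump_meeting.
    """
    turns: List[List[Word]] = []
    for word in words:
        # Start a new turn at the first word, or when the silence since the
        # previous word's end exceeds the threshold.
        if not turns or word.start - turns[-1][-1].end > max_pause:
            turns.append([word])
        else:
            turns[-1].append(word)
    return turns


if __name__ == "__main__":
    sample = [
        Word("okay", 0.0, 0.3),
        Word("so", 0.4, 0.5),
        Word("right", 1.6, 1.9),
    ]
    # Print one turn per line.
    for turn in segment_by_silence(sample):
        print(" ".join(w.text for w in turn))
```

Raising `max_pause` merges more words into each turn, while lowering it produces shorter turns. Segmenting by dialog act units or by user-specified boundaries would replace the pause test with a different boundary condition.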

