Tregex and Tsurgeon


About

Tregex is a Tgrep2-style utility for matching patterns in trees. It offers essentially the same functionality as Tgrep2, plus several relations that are especially useful for natural language trees, for example "A is the lexical head of B" and "A and B share a (hand-specified) variable substring" (useful for finding nodes coindexed with each other). Because it does not create preprocessed, indexed corpus files, it is somewhat slower than Tgrep2 when searching large treebanks. As a Java application, it is platform independent and can be used programmatically in Java software. It provides both a graphical interface (also platform independent) and a command-line interface through the TregexPattern main method. To launch the graphical interface, double-click the stanford-tregex.jar file.
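
To make the idea of a tree relation concrete, here is a minimal, self-contained Java sketch of one relation, "A immediately dominates B" (written "A < B" in Tregex pattern syntax). It does not use the Tregex API: the TreeMatchSketch class, its Node type, and all method names are hypothetical and exist only for this illustration, and the parser handles only simple, well-formed Penn Treebank style bracketings.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: NOT the Tregex API. All names here are hypothetical.
class TreeMatchSketch {

    /** A minimal labeled tree node (not the Stanford Tree class). */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    /** Parses a bracketing such as "(NP (DT the) (NN dog))". Assumes well-formed input. */
    static Node parse(String s) {
        // Pad brackets with spaces so that a whitespace split yields clean tokens.
        String[] tokens = s.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        int[] pos = {0}; // one-element array used as a mutable cursor
        return parseNode(tokens, pos);
    }

    private static Node parseNode(String[] t, int[] pos) {
        if (t[pos[0]].equals("(")) {
            pos[0]++;                          // consume "("
            Node n = new Node(t[pos[0]++]);    // the label follows the bracket
            while (!t[pos[0]].equals(")")) {
                n.children.add(parseNode(t, pos));
            }
            pos[0]++;                          // consume ")"
            return n;
        }
        return new Node(t[pos[0]++]);          // a bare token is a leaf (a word)
    }

    /** Returns every node labeled parentLabel that has a direct child labeled childLabel. */
    static List<Node> findImmediatelyDominating(Node root, String parentLabel, String childLabel) {
        List<Node> out = new ArrayList<>();
        collect(root, parentLabel, childLabel, out);
        return out;
    }

    private static void collect(Node n, String parentLabel, String childLabel, List<Node> out) {
        if (n.label.equals(parentLabel)) {
            for (Node child : n.children) {
                if (child.label.equals(childLabel)) {
                    out.add(n);   // record the parent once, even if several children match
                    break;
                }
            }
        }
        for (Node child : n.children) {
            collect(child, parentLabel, childLabel, out);
        }
    }

    /** Convenience wrapper: parse the tree and count the matching parent nodes. */
    static int countMatches(String tree, String parentLabel, String childLabel) {
        return findImmediatelyDominating(parse(tree), parentLabel, childLabel).size();
    }

    public static void main(String[] args) {
        String tree = "(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat))))";
        System.out.println(countMatches(tree, "NP", "NN")); // prints 2
        System.out.println(countMatches(tree, "VP", "NN")); // prints 0: NN is not a direct child of VP
    }
}
```

Tregex itself supports many more relations and lets them be combined in a single pattern; the sketch only shows what checking one relation over a tree amounts to.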

As of version 1.2, Tregex bundles the Tsurgeon tree transformation utility. Tsurgeon is also incorporated into the graphical interface and can be run from the command line.

Tregex was written by Galen Andrew and Roger Levy, Tsurgeon was written by Roger Levy, and the graphical interface for both was written by Anna Rafferty. These programs also rely on classes developed by others as part of the Stanford JavaNLP project.

There is a paper describing Tregex and Tsurgeon:

Roger Levy and Galen Andrew. 2006. Tregex and Tsurgeon: tools for querying and manipulating tree data structures. 5th International Conference on Language Resources and Evaluation (LREC 2006).

Questions

There is a Tregex FAQ list (with answers!). Please send any other questions, feedback, extensions, or bug fixes to parser-user@lists.stanford.edu.

Tregex is licensed under the GNU GPL. (Note that this is the full GPL, which allows its use for research purposes or in other free software projects, but does not allow its incorporation into any type of distributed proprietary software, even in part or in translation.) Source is included. The package includes components for command-line invocation and a Java API.


Mailing Lists

We have three mailing lists for Tregex/Tsurgeon, all of which are shared with the Stanford Parser. Each address is at @lists.stanford.edu:

  1. parser-user -- This is the best list to post to in order to ask questions, make announcements, or hold discussions among Tregex/Tsurgeon users. Join the list by emailing parser-user-join@lists.stanford.edu. (Leave the subject and message body empty.) You can also look at the list archives.
  2. parser-announce -- This list is used only to announce new parser and Tregex/Tsurgeon versions, so it is very low volume (expect 1-3 messages a year). Join the list by emailing parser-announce-join@lists.stanford.edu. (Leave the subject and message body empty.)
  3. parser-support -- This list goes only to the Tregex/Tsurgeon maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off joining and using parser-user. You cannot join parser-support, but you can mail questions to parser-support@lists.stanford.edu.

Contents

The download is a 4.7 MB gzipped tar file. It contains:

  1. README-tregex.txt -- Basic information about the distribution, including a "quickstart" guide.
  2. README-tsurgeon.txt -- information about Tsurgeon.
  3. README-gui.txt -- information about using the graphical interface
  4. LICENSE -- Tregex is licensed under the GNU General Public License.
  5. stanford-tregex.jar -- This is a JAR file containing all the Stanford classes necessary to run tregex.
  6. src directory -- a directory with the source files for Tregex and Tsurgeon
  7. lib directory -- library files required for recompiling the distribution
  8. build.xml, Makefile -- files for recompiling (with ant or make) the distribution
  9. javadoc -- Javadocs for the distribution.
  10. tregex.sh, tsurgeon.sh -- sample scripts for running Tregex and Tsurgeon from the command line
  11. run-tregex-gui.command, run-tregex-gui.bat -- scripts for running the Tregex graphical interface with more memory, for searching larger treebanks; they can be double-clicked to open on a Mac or PC, respectively
  12. examples directory -- example files for Tregex and Tsurgeon

Download

Download Tregex version 1.3.2

Release history

Version 1.0 (2005-02-17): Initial release
Version 1.1 (2005-07-19): Several new relations added; variable substring capability added too.
Version 1.1.1 (2005-09-15): Fixed bugs: 1) in variable groups; 2) in number of reported matches for "<" relation
Version 1.2 (2005-11-23): Bundled in Tsurgeon.
Version 1.3 (2007-09-20): Various bug fixes and improvements; additional Tsurgeon operations; and added a graphical interface
Version 1.3.1 (2007-11-20): Additional features added to the graphical interface: better copy/paste and drag and drop support, capability to save matched sentences as well as matched trees, and can save files in different encodings
Version 1.3.2 (2008-05-06): Additional features added to the graphical interface, which is now version 1.1: browse trees, better memory handling