Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions"). Tregex comes with Tsurgeon, a tree transformation language. Also included from version 2.0 on is a similar package which operates on dependency graphs (class SemanticGraph), called semgrex.
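To make the idea of "tree relationships plus regular expressions on nodes" concrete, here is a toy, self-contained sketch in plain Java. It is not Tregex and does not use the Tregex API: the class ToyTreeMatch, its Node type, and the immediatelyDominates method are hypothetical names invented for this illustration. It supports only one relation (a node whose label matches one regex has a child whose label matches another), whereas the real pattern language is much richer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Toy illustration only: hypothetical classes, not part of Tregex. */
class ToyTreeMatch {

    /** A minimal labeled tree node. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    /** Parses a Penn-Treebank-style bracketing such as "(S (NP (NN cat)))". */
    static Node parse(String s) {
        String[] tokens = s.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return parseNode(tokens, new int[] {0});
    }

    private static Node parseNode(String[] t, int[] pos) {
        if (t[pos[0]].equals("(")) {
            pos[0]++;                            // consume "("
            Node node = new Node(t[pos[0]++]);   // the label follows the bracket
            while (!t[pos[0]].equals(")")) {
                node.children.add(parseNode(t, pos));
            }
            pos[0]++;                            // consume ")"
            return node;
        }
        return new Node(t[pos[0]++]);            // a bare token is a leaf (a word)
    }

    /** Returns, in preorder, every node matching `parent` that has a child matching `child`. */
    static List<Node> immediatelyDominates(Node root, Pattern parent, Pattern child) {
        List<Node> out = new ArrayList<>();
        collect(root, parent, child, out);
        return out;
    }

    private static void collect(Node n, Pattern parent, Pattern child, List<Node> out) {
        if (parent.matcher(n.label).find()) {
            for (Node c : n.children) {
                if (child.matcher(c.label).find()) {
                    out.add(n);                  // record the parent once
                    break;
                }
            }
        }
        for (Node c : n.children) {
            collect(c, parent, child, out);
        }
    }

    public static void main(String[] args) {
        Node tree = parse(
            "(S (NP-SBJ (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))");
        List<Node> hits =
            immediatelyDominates(tree, Pattern.compile("^NP"), Pattern.compile("^NN"));
        for (Node n : hits) {
            System.out.println(n.label);         // prints NP-SBJ, then NP
        }
    }
}
```

The sketch walks the whole tree for each query, which mirrors the trade-off of searching trees directly rather than building an index first.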
Tregex: The best introduction to Tregex is the brief powerpoint tutorial for Tregex by Galen Andrew. The best way to learn to use Tregex is by working with the GUI (TregexGUI). It has help screens which summarize the syntax of Tregex. You can find brief documentation of Tregex's pattern language on the TregexPattern javadoc page, and, of course, you should also be very familiar with Java regular expression syntax. Tregex contains essentially the same functionality as TGrep2 (which had a superset of the functionality of the original tgrep), plus several extremely useful relations for natural language trees, for example "A is the lexical head of B", and "A and B share a (hand-specified) variable substring" (useful for finding nodes coindexed with each other). Because it does not create preprocessed indexed corpus files, it is somewhat slower than TGrep2 when searching over large treebanks, but in exchange it can be run on any trees without requiring index construction.
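Since node labels are matched with Java regular expressions, a short plain-Java sketch may help show what that implies for treebank labels. It uses only java.util.regex, not the Tregex API. The class LabelRegexDemo and its two helper methods are hypothetical names made up for this example. The use of find() (substring search) rather than matches() is an assumption about how regex node descriptions are tested; check the TregexPattern javadoc before relying on it. The coindex helper only illustrates the general idea of two labels sharing a variable substring and is not Tregex's own syntax for it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain java.util.regex illustration; hypothetical helper names, not the Tregex API. */
class LabelRegexDemo {

    /** True if the regex is found anywhere in the label (find() semantics, assumed here). */
    static boolean labelMatches(String regex, String label) {
        return Pattern.compile(regex).matcher(label).find();
    }

    /** Returns the trailing numeric index of a label such as "WHNP-1" or "*T*-1", or null. */
    static String coindex(String label) {
        Matcher m = Pattern.compile("-([0-9]+)$").matcher(label);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Anchors matter: "^NP" accepts functional tags, "^NP$" does not.
        System.out.println(labelMatches("^NP", "NP-SBJ"));   // true
        System.out.println(labelMatches("^NP$", "NP-SBJ"));  // false
        System.out.println(labelMatches("NP", "WHNP-1"));    // true (unanchored substring)

        // Two nodes are coindexed when the captured substring is the same.
        String a = coindex("WHNP-1");
        String b = coindex("*T*-1");
        System.out.println(a != null && a.equals(b));        // true
    }
}
```

Under the find() assumption, an unanchored pattern matches more labels than intended, so anchoring with ^ and $ is usually the safer habit.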
As a Java application, it is platform independent and can be used programmatically in Java software. It also provides both a graphical interface (also platform independent) and a command line interface through the TregexPattern main method. To launch the graphical interface, double-click the stanford-tregex.jar file.
Tsurgeon: A good introduction is the powerpoint slides for Tsurgeon by Marie-Catherine de Marneffe. Tsurgeon can be run from the command line and is also incorporated into the TregexGUI graphical interface. Its syntax is presented on the Tsurgeon javadoc page.
Tregex was written by Galen Andrew and Roger Levy. Tsurgeon was written by Roger Levy. The graphical interface for both was written by Anna Rafferty. A lot of bug fixing and various extensions to both were done by John Bauer. Semgrex was written by Chloe Kiddon and John Bauer. These programs also rely on classes developed by others as part of the Stanford JavaNLP project.
There is a paper describing Tregex and Tsurgeon. You're encouraged to cite it if you use Tregex or Tsurgeon.
Roger Levy and Galen Andrew. 2006. Tregex and Tsurgeon: tools for querying and manipulating tree data structures. 5th International Conference on Language Resources and Evaluation (LREC 2006).
Semgrex is very briefly described in this paper:
Nathanael Chambers, Daniel Cer, Trond Grenager, David Hall, Chloe Kiddon, Bill MacCartney, Marie-Catherine de Marneffe, Daniel Ramage, Eric Yeh, and Christopher D. Manning. 2007. Learning Alignments and Leveraging Natural Logic. Proceedings of the Workshop on Textual Entailment and Paraphrasing, pages 165–170.
Tregex, Tsurgeon, and Semgrex are licensed under the GNU General Public License (v2 or later). Note that this is the full GPL, which allows many free uses. For distributors of proprietary software, commercial licensing is available. Source is included. The package includes components for command-line invocation and a Java API.
We have three mailing lists for Tregex/Tsurgeon, all of which are shared with the Stanford Parser. Each is at
parser-user: This is the best list to post to in order to ask questions, make announcements, or hold discussions among Tregex/Tsurgeon users. Join the list via this webpage or by emailing email@example.com. (Leave the subject and message body empty.) You can also look at the list archives.
parser-announce: This list will be used only to announce new Tregex/Tsurgeon versions, so it will be very low volume (expect 1-3 messages a year). Join the list via this webpage or by emailing firstname.lastname@example.org. (Leave the subject and message body empty.)
parser-support: This list goes only to the Tregex/Tsurgeon maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off joining and using parser-user. You cannot join parser-support, but you can mail questions to
The download is a 9 MB zip file. It contains:
lib/ABOUT-AppleJavaExtensions.txt (for removing this dependency)
| Version | Changes |
|---|---|
| Version 3.7.0 | Update for compatibility |
| Version 3.6.0 | Updated for compatibility |
| Version 3.5.2 | Update for compatibility |
| Version 3.5.1 | Update for compatibility |
| Version 3.5.0 | Upgrade to Java 8 |
| Version 3.4.1 | Fix a thread safety issue in tsurgeon. Last version to support Java 6 and Java 7. |
| Version 3.4 | Added a new tregex pattern, exact subtree, and improved efficiency for certain operations |
| Version 3.3.1 | Added a new tsurgeon operation, createSubtree |
| Version 3.3.0 | Add an option to get a TregexMatcher from a TregexPattern with a different HeadFinder |
| Version 3.2.0 | Fix minor bug in tsurgeon indexing |
| Version 2.0.6 | Updated for compatibility with other software releases |
| Version 2.0.5 | Minor efficiency improvements |
| Version 2.0.4 | Minor bug fixes |
| Version 2.0.3 | Updated to maintain compatibility with other Stanford software. |
| Version 2.0.2 | Regex matching efficiency improvement |
| Version 2.0.1 | Fix matchesAt, fix category heads. Last version to support Java 5. |
| Version 2.0 | Introduces semgrex, which operates on SemanticGraphs. |
| Version 1.4.4 | Updated to maintain compatibility with other Stanford software. |
| Version 1.4.3 | Updated to maintain compatibility with other Stanford software. |
| Version 1.4.2 | Addition of tree difference display. Several bugfixes. |
| Version 1.4.1 | Small fixes and improvements (multipattern Tsurgeon scripts, file and line numbers in sentence window, fixed GUI lock-up and tregex immediate domination path matching) |
| Version 1.4 | GUI slider for tree size, allow @ and __ in path constraints, incompatibly generalize Tsurgeon relabel command, bug fix for links and backreferences being used as named node, more memory/space efficient treebank reading |
| Version 1.3.2 | Additional features added to the graphical interface, which is now version 1.1: browse trees, better memory handling |
| Version 1.3.1 | Additional features added to the graphical interface: better copy/paste and drag and drop support, capability to save matched sentences as well as matched trees, and can save files in different encodings |
| Version 1.3 | Various bug fixes and improvements; additional Tsurgeon operations; and added a graphical interface |
| Version 1.2 | Bundled in Tsurgeon. |
| Version 1.1.1 | Fixed bugs: 1) in variable groups; 2) in number of reported matches for "<" relation |
| Version 1.1 | Several new relations added; variable substring capability added too. |
| Version 1.0 | Initial release |