Tregex is a Tgrep2-style utility for matching patterns in trees. It
contains essentially the same functionality as Tgrep-2, plus several
extremely useful relations for natural language trees, for example "A
is the lexical head of B", and "A and B share a (hand-specified)
variable substring" (useful for finding nodes coindexed with each
other). Because it does not create preprocessed indexed corpus files,
it is somewhat slower than Tgrep-2 when searching over large
treebanks. As a Java application, it is platform independent and can
be used programmatically in Java software. It also provides a
graphical interface (likewise platform independent) and a command-line
interface through the TregexPattern main method. To launch
the graphical interface, double-click the stanford-tregex.jar file.
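To make the idea of a tree relation concrete, the following is a minimal, self-contained Java sketch of one Tgrep2-style relation, immediate dominance (written "A < B"). It does not use the Tregex library: the class DominanceSketch, its Node type, and its methods are hypothetical names introduced only for illustration, and the choice to report each matching parent once is an assumption of this sketch rather than a statement about how Tregex counts matches.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a tiny tree type and an "A immediately dominates B" search.
 *  None of these names come from the Tregex API. */
final class DominanceSketch {

    /** A labeled tree node with ordered children. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    /** Parses a Penn-Treebank-style bracketing such as "(NP (DT the) (NN dog))". */
    static Node parse(String text) {
        // Pad parentheses with spaces so a whitespace split yields clean tokens.
        String[] tokens = text.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        int[] pos = {0};
        return parseNode(tokens, pos);
    }

    private static Node parseNode(String[] tokens, int[] pos) {
        String token = tokens[pos[0]++];
        if (!token.equals("(")) {
            return new Node(token);             // a bare token is a leaf (a word)
        }
        Node node = new Node(tokens[pos[0]++]); // first token after "(" is the label
        while (!tokens[pos[0]].equals(")")) {
            node.children.add(parseNode(tokens, pos));
        }
        pos[0]++;                               // consume ")"
        return node;
    }

    /** Collects every node labeled parentLabel that has a child labeled childLabel. */
    static List<Node> findImmediatelyDominating(Node root, String parentLabel, String childLabel) {
        List<Node> matches = new ArrayList<>();
        collect(root, parentLabel, childLabel, matches);
        return matches;
    }

    private static void collect(Node node, String parentLabel, String childLabel, List<Node> out) {
        if (node.label.equals(parentLabel)) {
            for (Node child : node.children) {
                if (child.label.equals(childLabel)) {
                    out.add(node);
                    break;                      // this sketch reports each parent once
                }
            }
        }
        for (Node child : node.children) {
            collect(child, parentLabel, childLabel, out);
        }
    }

    public static void main(String[] args) {
        Node tree = parse("(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat))))");
        List<Node> matches = findImmediatelyDominating(tree, "NP", "NN");
        // Prints: 2 NP node(s) immediately dominate an NN
        System.out.println(matches.size() + " NP node(s) immediately dominate an NN");
    }
}
```

Tregex itself expresses such a query as a pattern string and supports many more relations than this single hard-coded one; the sketch only shows what kind of structural test a relation performs on a parsed tree.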
As of version 1.2, Tregex bundles the Tsurgeon tree transformation utility. Tsurgeon is also incorporated into the graphical interface and can be run from the command line.
Tregex was written by Galen Andrew and Roger Levy, Tsurgeon was written by Roger Levy, and the graphical interface for both was written by Anna Rafferty. These programs also rely on classes developed by others as part of the Stanford JavaNLP project.
There is a paper describing Tregex and Tsurgeon:
Roger Levy and Galen Andrew. 2006. Tregex and Tsurgeon: tools for querying and manipulating tree data structures. 5th International Conference on Language Resources and Evaluation (LREC 2006).
There is a Tregex FAQ list (with answers!).
Please send any other questions, feedback, extensions, or bug fixes to
parser-user@lists.stanford.edu.
Tregex is licensed under the GNU GPL. (Note that this is the full GPL, which allows its use for research purposes or other free software projects but does not allow its incorporation into any type of distributed proprietary software, even in part or in translation.) Source is included. The package includes components for command-line invocation and a Java API.
We have three mailing lists for Tregex/Tsurgeon, all of which are shared
with the Stanford Parser. Each address ends in @lists.stanford.edu:
parser-user: This is the best list to post to in order
to ask questions, make announcements, or hold discussions among Tregex/Tsurgeon
users. Join the list by emailing
parser-user-join@lists.stanford.edu. (Leave the
subject and message body empty.) You can also
look at the list archives.
parser-announce: This list is used only to announce
new parser and Tregex/Tsurgeon versions, so it is very low volume (expect 1-3
messages a year). Join the list by emailing
parser-announce-join@lists.stanford.edu. (Leave the
subject and message body empty.)
parser-support: This list goes only to the Tregex/Tsurgeon
maintainers. It's a good address for licensing questions, etc. For
general use and support questions, you're better off joining and using
parser-user.
You cannot join parser-support, but you can mail questions to
parser-support@lists.stanford.edu.
The download is a 4.7 MB gzipped tar file. It contains:
| Version 1.0 | Initial release |
| Version 1.1 | Several new relations added; variable substring capability added |
| Version 1.1.1 | Bug fixes: (1) in variable groups; (2) in the number of reported matches for the "<" relation |
| Version 1.2 | Tsurgeon bundled |
| Version 1.3 | Various bug fixes and improvements; additional Tsurgeon operations; graphical interface added |
| Version 1.3.1 | Additional graphical interface features: better copy/paste and drag-and-drop support, the ability to save matched sentences as well as matched trees, and the ability to save files in different encodings |
| Version 1.3.2 | Additional graphical interface features (the interface is now version 1.1): tree browsing and better memory handling |