Stanford Dependencies

About

The Stanford dependencies provide a representation of grammatical relations between words in a sentence. They have been designed to be easily understood and effectively used by people who want to extract textual relations. Stanford dependencies (SD) are triplets: name of the relation, governor, and dependent. The standard dependencies for the sentence "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas" are given below, along with two graphical representations: the standard dependencies (collapsed and propagated), and the basic dependency representation, in which each word in the sentence (except the head of the sentence) is the dependent of one other word (no collapsing, no propagation).

[Figure 1. Standard Stanford dependencies (collapsed and propagated)]
[Figure 2. Basic dependencies]

Dependencies for "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas":

nsubjpass(submitted, Bills)
auxpass(submitted, were)
agent(submitted, Brownback)
nn(Brownback, Senator)
appos(Brownback, Republican)
prep_of(Republican, Kansas)
prep_on(Bills, ports)
conj_and(ports, immigration)
prep_on(Bills, immigration)
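
Because every dependency is written as relation(governor, dependent), the listing above can be read into triplets mechanically. The following is a minimal Python sketch that assumes only the plain-text notation shown above; the Dependency type and the parse_dependency helper are hypothetical names introduced for illustration and are not part of the Stanford parser.

```python
import re
from typing import List, NamedTuple


class Dependency(NamedTuple):
    """One Stanford dependency triplet (hypothetical helper type)."""
    relation: str
    governor: str
    dependent: str


# Matches lines such as "prep_on(Bills, ports)".
_TRIPLET = re.compile(r"^\s*([A-Za-z_]+)\(\s*(.+?)\s*,\s*(.+?)\s*\)\s*$")


def parse_dependency(line: str) -> Dependency:
    """Parse one 'relation(governor, dependent)' line into a triplet."""
    match = _TRIPLET.match(line)
    if match is None:
        raise ValueError(f"not a dependency triplet: {line!r}")
    return Dependency(*match.groups())


# The standard dependencies of the example sentence, copied from the listing.
LISTING = """\
nsubjpass(submitted, Bills)
auxpass(submitted, were)
agent(submitted, Brownback)
nn(Brownback, Senator)
appos(Brownback, Republican)
prep_of(Republican, Kansas)
prep_on(Bills, ports)
conj_and(ports, immigration)
prep_on(Bills, immigration)
"""

dependencies: List[Dependency] = [
    parse_dependency(line) for line in LISTING.splitlines() if line.strip()
]

# Everything governed by "Bills": the two prep_on relations.
governed_by_bills = [d for d in dependencies if d.governor == "Bills"]
```

The sketch identifies words by their surface form, which is enough for this sentence; a sentence that repeats a word would need token positions as well.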

The English version of the Stanford dependencies has been developed by Marie-Catherine de Marneffe, Bill MacCartney, and Christopher Manning. All details about the English dependencies can be found in the manual:

Marie-Catherine de Marneffe and Christopher D. Manning. 2008. Stanford Dependencies manual.

The manual contains a description of all the existing English grammatical relations in the representation. It explains the differences between the five types of representation available, and how such types of representation can be obtained. It also gives references to further discussion and use of the Stanford dependencies.

The dependencies are produced using hand-written tregex patterns over phrase-structure trees as described in:

Marie-Catherine de Marneffe, Bill MacCartney and Christopher D. Manning. 2006. Generating Typed Dependency Parses from Phrase Structure Parses. In LREC 2006.

The main ideas motivating the Stanford dependency representation appear in this paper:

Marie-Catherine de Marneffe and Christopher D. Manning. 2008. The Stanford typed dependencies representation. In COLING 2008 Workshop on Cross-framework and Cross-domain Parser Evaluation.

The definition of the set of dependencies has evolved a little over the years, and the particular patterns used to convert phrase structure trees to dependencies have been improved quite a bit. Hence, if you are publishing a paper using Stanford Dependencies, we would really appreciate it if you could indicate precisely which version you are using. This is easily done by citing the version of the Stanford Parser code used.

In practice, the dependencies can be obtained using our software in two ways: either by running the Stanford parser with the -outputFormat typedDependencies option on raw text, or directly from phrase-structure trees using the EnglishGrammaticalStructure class available in the parser package. For English, five different variants of the dependencies are available, and different options can be used to get these. The default representation is the "CCprocessed" one, which collapses and propagates dependencies (as shown in Figure 1, in contrast to Figure 2, where the dependencies are neither collapsed nor propagated). For more details, refer to section 5 of the Stanford Dependencies manual.
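
To make collapsing and propagation more concrete, here is a rough Python sketch that turns basic dependencies for the example sentence into the collapsed and propagated listing given earlier. It is not the parser's implementation, which works with tregex patterns and the EnglishGrammaticalStructure class. Several things are assumptions made for illustration: the BASIC list is a reconstruction of the basic dependencies of the example sentence, words are identified by surface form rather than token position, and only prepositions, a single coordination pattern, and the passive "by" agent are handled.

```python
from typing import List, Tuple

Dep = Tuple[str, str, str]  # (relation, governor, dependent)

# Assumed basic dependencies for the example sentence (illustrative).
BASIC: List[Dep] = [
    ("nsubjpass", "submitted", "Bills"),
    ("auxpass", "submitted", "were"),
    ("prep", "Bills", "on"),
    ("pobj", "on", "ports"),
    ("cc", "ports", "and"),
    ("conj", "ports", "immigration"),
    ("prep", "submitted", "by"),
    ("pobj", "by", "Brownback"),
    ("nn", "Brownback", "Senator"),
    ("appos", "Brownback", "Republican"),
    ("prep", "Republican", "of"),
    ("pobj", "of", "Kansas"),
]


def collapse_and_propagate(basic: List[Dep]) -> List[Dep]:
    """Simplified collapsing and propagation (a sketch, not the real algorithm)."""
    pobj = {gov: dep for rel, gov, dep in basic if rel == "pobj"}
    cc = {gov: dep for rel, gov, dep in basic if rel == "cc"}
    passive = {gov for rel, gov, _ in basic if rel == "auxpass"}

    collapsed: List[Dep] = []
    for rel, gov, dep in basic:
        if rel in ("pobj", "cc"):
            # Folded into a relation name below; assumes every pobj
            # hangs off a preposition and every cc accompanies a conj.
            continue
        if rel == "prep" and dep in pobj:
            # prep(g, p) + pobj(p, o) becomes prep_p(g, o); the "by"
            # phrase of a passive verb is named agent instead.
            is_agent = dep.lower() == "by" and gov in passive
            name = "agent" if is_agent else f"prep_{dep.lower()}"
            collapsed.append((name, gov, pobj[dep]))
        elif rel == "conj" and gov in cc:
            # cc(h, c) + conj(h, d) becomes conj_c(h, d).
            collapsed.append((f"conj_{cc[gov].lower()}", gov, dep))
        else:
            collapsed.append((rel, gov, dep))

    # Propagation: a relation that points at the first conjunct is
    # copied so that it also points at the other conjunct.
    propagated: List[Dep] = []
    for rel, head, conjunct in collapsed:
        if not rel.startswith("conj_"):
            continue
        for other_rel, gov, dep in collapsed:
            if dep == head and not other_rel.startswith("conj_"):
                propagated.append((other_rel, gov, conjunct))
    return collapsed + propagated


standard = collapse_and_propagate(BASIC)
```

Under these assumptions, the result contains the same nine dependencies as the listing, including the propagated prep_on(Bills, immigration).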

Here are some examples of Stanford Dependencies representations of sentences, originating from the COLING 2008 Workshop on Cross-Framework and Cross-Domain Parser Evaluation: required-wsj02.Stanford, optional-wsj02.Stanford, genia.stanford. Only the required WSJ set was hand-verified; the representations in the other two sets were automatically generated.

The Stanford dependencies are also available for Chinese. The Chinese dependencies have been developed by Huihsin Tseng and Pi-Chuan Chang. A brief description of the Chinese grammatical relations can be found in this paper.


Download

The dependency code is part of the Stanford parser. Go here to download a version.


Other parsers

While the original and canonical approach to generating the Stanford Dependencies is to use the Stanford parser, many other parsers can now produce them, and some may offer better speed or precision. Any phrase structure parser that constructs PTB-style trees can be used, as can any trainable dependency parser. When using an alternative phrase structure parser, the Stanford Parser class EnglishGrammaticalStructure is used to extract dependencies from the resulting constituent parse trees. Trainable dependency parsers can produce the basic Stanford Dependency representation. This is a projective variant of the Stanford Dependencies that can be transformed into the default representation, CCprocessed, using EnglishGrammaticalStructure.
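
As an illustration of what "projective" means for the basic representation, the Python sketch below checks that no two dependency arcs cross when they are drawn above the sentence. The is_projective function is a hypothetical helper, and the HEADS mapping is an assumed encoding of the basic dependencies of the example sentence (token positions start at 1, 0 stands for an artificial root, and the comma is left out because punctuation is not shown in the dependency listing).

```python
from itertools import combinations
from typing import Dict


def is_projective(heads: Dict[int, int]) -> bool:
    """Return True if no two arcs cross.

    heads maps each 1-based token position to the position of its
    governor, with 0 standing for an artificial root.
    """
    arcs = [tuple(sorted((dep, gov))) for dep, gov in heads.items()]
    for (a, b), (c, d) in combinations(arcs, 2):
        # Two arcs cross when exactly one endpoint of one arc lies
        # strictly inside the span of the other.
        if a < c < b < d or c < a < d < b:
            return False
    return True


# Assumed basic dependencies of the example sentence:
# 1 Bills  2 on  3 ports  4 and  5 immigration  6 were  7 submitted
# 8 by  9 Senator  10 Brownback  (11 is the comma)  12 Republican
# 13 of  14 Kansas
HEADS = {
    1: 7, 2: 1, 3: 2, 4: 3, 5: 3, 6: 7, 7: 0,
    8: 7, 9: 10, 10: 8, 12: 10, 13: 12, 14: 13,
}

# A made-up four-token structure whose arcs 1-3 and 2-4 cross.
CROSSING = {1: 3, 2: 4, 3: 0, 4: 3}

example_is_projective = is_projective(HEADS)
crossing_is_projective = is_projective(CROSSING)
```

With these inputs the example sentence passes the check and the made-up structure fails it.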

The table below summarizes some methods for generating the Stanford Dependencies along with the speed and accuracy of each approach on section 22 of the Penn TreeBank. Links are provided to the corresponding software packages and trained parsing models (some of the dependency models were trained by us). All the accuracies and timings here are for SDs corresponding to Stanford Parser version 1.6.2. The tree or basic dependency output is in each case converted to CCprocessed dependencies using our EnglishGrammaticalStructure class, and then evaluation is on CCprocessed dependencies. (I.e., you cannot directly compare these numbers with results on recovering Stanford basic dependencies, which is an easier task.)

Approach                                   Labeled Attachment (F1)   Time (mm:ss)   Links
Constituent
  Charniak-Johnson, default (T210)         89.1                      11:18          [Software]
  Charniak-Johnson, T50                    86.7                       3:32
  Charniak-Johnson, T10                    75.7                       2:17
  Berkeley Parser                          87.9                      10:14          [Software][Model]
  Bikel                                    85.3                      29:57          [Software][Model Data]
  Stanford (englishPCFG)                   84.2                      11:05          [Software]
Dependency
  Ensemble Malt                            82.4                       1:56          [Software][Model][Paper]
  MaltParser, Nivre Eager, SVM poly deg:2  81.1                       3:23          [Software][Model we built/used][English MaltParser model][English MaltParser]
  MaltParser, Nivre Eager, LibLinear       76.2                       0:16          [Software][Model we built/used][English MaltParser model][English MaltParser]
  MSTParser (Eisner)                       78.8                       6:01          [Software][Model]
  RelEx                                    48.1                      31:38          [Software]
  Easy-First Parser                                                                 [Software]

The Charniak-Johnson parser includes a model for parsing English. The Bikel parser requires users to train their own model, which can be done using the included train-from-observed utility and the model data linked above. The RelEx package is rule-based and provides a Stanford Dependency compatibility mode.

For the dependency parsers, part-of-speech (POS) tags were generated using the Stanford POS tagger and the included left3words-wsj-0-18 model. Times represent the total time required to produce the dependencies, including POS tagging (if applicable), parsing, and extraction of the CCprocessed Stanford Dependency representation. Benchmarking was done on a dual-CPU Intel Xeon E5520. Multithreading was disabled for the Charniak-Johnson parser in order to obtain a per-CPU-core estimate of parsing speed.
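
For readers unfamiliar with the labeled attachment F1 column in the table, the Python sketch below shows one common way to compute such a score: as the harmonic mean of precision and recall over sets of (relation, governor, dependent) triplets. This is an assumption about the general shape of the metric, not a copy of the evaluation code behind the table. The labeled_f1 function and the PREDICTED output are hypothetical; GOLD is the listing for the example sentence with words replaced by token positions.

```python
from typing import Iterable, Tuple

Dep = Tuple[str, int, int]  # (relation, governor position, dependent position)


def labeled_f1(gold: Iterable[Dep], predicted: Iterable[Dep]) -> float:
    """F1 over dependency triplets; a triplet counts only if the
    relation name, governor, and dependent all match."""
    gold_set, predicted_set = set(gold), set(predicted)
    correct = len(gold_set & predicted_set)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted_set)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# CCprocessed dependencies of the example sentence, by token position.
GOLD = [
    ("nsubjpass", 7, 1), ("auxpass", 7, 6), ("agent", 7, 10),
    ("nn", 10, 9), ("appos", 10, 12), ("prep_of", 12, 14),
    ("prep_on", 1, 3), ("conj_and", 3, 5), ("prep_on", 1, 5),
]

# Made-up parser output: appos is mislabeled as dep, and the
# propagated prep_on(Bills, immigration) is missing.
PREDICTED = [
    ("nsubjpass", 7, 1), ("auxpass", 7, 6), ("agent", 7, 10),
    ("nn", 10, 9), ("dep", 10, 12), ("prep_of", 12, 14),
    ("prep_on", 1, 3), ("conj_and", 3, 5),
]

score = labeled_f1(GOLD, PREDICTED)  # 7 correct of 8 predicted and 9 gold: 14/17
```

Scoring sets of triplets rather than one head per token matters here because collapsed and propagated output can contain a different number of dependencies than the gold standard.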

In general, all parsers were run in their default out-of-the-box configurations. In addition, for the Charniak-Johnson parser, the table above shows the speed and accuracy trade-offs from varying the amount of search by setting different T values (by default T = 210). The Charniak-Johnson parser allows users to trade off parsing accuracy for speed by adjusting how liberal the system is about expanding edges after the best-first search has found one complete parse of the sentence: the parser restricts itself to examining only Tval/10 times more edges in search of a better parse.
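
The arithmetic behind this trade-off can be worked through with the three Charniak-Johnson rows of the table. The Python sketch below uses only the F1 scores and times reported there; the helper names (to_seconds, edge_multiplier, trade_off) are hypothetical and introduced for illustration.

```python
from typing import Dict, Tuple

# T value -> (labeled attachment F1, time as mm:ss), copied from the table.
CHARNIAK_JOHNSON: Dict[int, Tuple[float, str]] = {
    210: (89.1, "11:18"),
    50: (86.7, "3:32"),
    10: (75.7, "2:17"),
}


def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def edge_multiplier(t_value: int) -> float:
    """Tval/10: how many times more edges may be examined after the
    first complete parse has been found."""
    return t_value / 10


def trade_off(t_value: int, baseline: int = 210) -> Tuple[float, float]:
    """Return (speedup over the baseline, F1 points lost) for a T value."""
    base_f1, base_time = CHARNIAK_JOHNSON[baseline]
    f1, time = CHARNIAK_JOHNSON[t_value]
    speedup = to_seconds(base_time) / to_seconds(time)
    return speedup, round(base_f1 - f1, 1)


for t in sorted(CHARNIAK_JOHNSON, reverse=True):
    speedup, f1_loss = trade_off(t)
    print(
        f"T{t}: {edge_multiplier(t):g}x more edges, "
        f"{speedup:.1f}x the default speed, {f1_loss:.1f} F1 points lower"
    )
```

On the table's numbers, T50 costs 2.4 F1 points for roughly a threefold speedup, while T10 costs 13.4 points for roughly a fivefold speedup.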

For more information about these parsing accuracy vs. speed trade-offs when generating Stanford Dependencies, see:

Daniel Cer, Marie-Catherine de Marneffe, Daniel Jurafsky, and Christopher D. Manning. 2010. Parsing to Stanford Dependencies: Trade-offs between speed and accuracy. In 7th International Conference on Language Resources and Evaluation (LREC 2010). [pdf, bib]

Mailing lists

To ask questions about the dependencies, you can use the same lists as for the parser, each @lists.stanford.edu:

  1. parser-user: This is the best list to post to in order to ask questions, make announcements, or for discussion among parser users. Join the list via this webpage or by emailing parser-user-join@lists.stanford.edu. (Leave the subject and message body empty.) You can also look at the list archives.
  2. parser-announce: This list will be used only to announce new parser versions, so it will be very low volume (expect 1-3 messages a year). Join the list via this webpage or by emailing parser-announce-join@lists.stanford.edu. (Leave the subject and message body empty.)
  3. parser-support: This list goes only to the parser maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off joining and using parser-user. You cannot join parser-support, but you can mail questions to parser-support@lists.stanford.edu.

GUI

Bernard Bou developed a GUI focusing on the typed dependencies, including an editor:

GrammarScope: Stanford parser grammatical relation browser

We now have a nice visualization of Stanford Dependencies in our online Stanford CoreNLP demo, provided by brat.