Since version 3.5.2 the Stanford Parser and Stanford CoreNLP output grammatical relations in the new Universal Dependencies representation. Take a look at the Universal Dependencies documentation for a detailed description of the new representation and its set of relations.
We also still support the original Stanford Dependencies representation as described on this page and in the original papers. To output relations in the original Stanford Dependencies representation, use the -originalDependencies option when running the parser or the -parse.originalDependencies option when running a CoreNLP pipeline with the PCFG parser. If you are using the Neural Network dependency parser and want to get the original Stanford Dependencies, you have to use the model trained on a corpus annotated with the Stanford Dependencies representation using the following option:
The dependency code is part of the Stanford parser. Go here to download a version.
Our gold standard for the EWT (see below) is not officially released yet, but a working version can be downloaded from here.
The Stanford dependencies provide a representation of grammatical relations between words in a sentence. They have been designed to be easily understood and effectively used by people who want to extract textual relations. Stanford dependencies (SD) are triplets: name of the relation, governor and dependent. The standard dependencies for the sentence "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas" are given below, together with two graphical representations: the standard dependencies (collapsed and propagated), and the basic dependency representation, in which each word in the sentence (except the head of the sentence) is the dependent of exactly one other word (no collapsing, no propagation).
|Dependencies for Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas||Figure 1. Standard Stanford dependencies (collapsed and propagated)||Figure 2. Basic dependencies|
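The triplet structure described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Stanford software: the names `Dependency`, `parse_dependency` and `governors_of` are hypothetical, and the four triples are illustrative values in the style of the collapsed and propagated representation rather than output copied from the parser.

```python
import re
from collections import namedtuple

# One Stanford dependency: relation name, governor word, dependent word.
Dependency = namedtuple("Dependency", ["relation", "governor", "dependent"])

# Matches text such as "nsubjpass(submitted-7, Bills-1)". The "-N" token
# index after each word is treated as optional in this sketch.
_DEP_RE = re.compile(r"^\s*([^\s()]+)\((.+?)(?:-\d+)?, (.+?)(?:-\d+)?\)\s*$")


def parse_dependency(text):
    """Turn 'relation(governor, dependent)' into a Dependency triple."""
    match = _DEP_RE.match(text)
    if match is None:
        raise ValueError("not a dependency triple: %r" % text)
    return Dependency(*match.groups())


def governors_of(word, dependencies):
    """Return the governors of `word`, in order of appearance."""
    return [d.governor for d in dependencies if d.dependent == word]


# Illustrative triples (assumed values) for part of the example sentence.
collapsed = [
    parse_dependency(line)
    for line in [
        "nsubjpass(submitted, Bills)",
        "prep_on(Bills, ports)",
        "conj_and(ports, immigration)",
        "prep_on(Bills, immigration)",
    ]
]
```

In this sketch `governors_of("immigration", collapsed)` returns two governors, which is what propagation across the conjunction produces. In the basic representation every word other than the head of the sentence would have exactly one.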
We created a gold standard dependency corpus on top of the English Web Treebank (LDC2012T13). We manually annotated 254,830 words with SD for English. The effort is meant to address the scarcity of both gold standard dependency corpora for English and annotated resources for parsing web text. This resource is described here:
Natalia Silveira, Timothy Dozat, Marie-Catherine de Marneffe, Samuel R. Bowman, Miriam Connor, John Bauer and Christopher D. Manning. 2014. A Gold Standard Dependency Corpus for English. In LREC 2014.
This annotation effort has led to refinements of Stanford Dependencies. We describe changes to the standard and propose analyses for a few syntactic constructions of interest, to be found in the following paper:
Marie-Catherine de Marneffe, Miriam Connor, Natalia Silveira, Samuel R. Bowman, Timothy Dozat and Christopher D. Manning. 2013. More constructions, more genres: Extending Stanford Dependencies. In DepLing 2013.
We are also working on an improved taxonomy to capture grammatical relations across languages, including morphologically rich ones. A first version of this new standard, called Universal Dependencies, is described here:
Marie-Catherine de Marneffe, Natalia Silveira, Timothy Dozat, Katri Haverinen, Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. Universal Stanford Dependencies: A cross-linguistic typology. In LREC 2014.
The most up-to-date version of Universal Dependencies is documented online at http://universaldependencies.github.com/docs/.
The English version of the Stanford dependencies has been developed by Marie-Catherine de Marneffe, Bill MacCartney, and Christopher Manning. All details about the English dependencies can be found in the manual:
Marie-Catherine de Marneffe and Christopher D. Manning. 2008. Stanford Dependencies manual.
The manual contains a description of all the existing English grammatical relations in the representation. It explains the differences between the five types of representation available, and how such types of representation can be obtained. It also gives references to further discussion and use of the Stanford dependencies. The dependencies are produced using hand-written tregex patterns over phrase-structure trees as described in:
Marie-Catherine de Marneffe, Bill MacCartney and Christopher D. Manning. 2006. Generating Typed Dependency Parses from Phrase Structure Parses. In LREC 2006.
The main ideas motivating the Stanford dependency representation appear in this paper:
Marie-Catherine de Marneffe and Christopher D. Manning. 2008. The Stanford typed dependencies representation. In COLING 2008 Workshop on Cross-framework and Cross-domain Parser Evaluation.
The definition of the set of dependencies has evolved a little over the years, and the particular patterns used to convert phrase structure trees to dependencies have been improved quite a bit. Hence, if you are publishing a paper using Stanford Dependencies, we really appreciate it if you could indicate precisely which version you are using. This is easily done by citing the version of the Stanford Parser code used.
In practice, the dependencies can be obtained using our software in two ways: either by using the Stanford parser with the -outputFormat typedDependencies option on raw text, or directly on phrase-structure trees using the EnglishGrammaticalStructure class available in the parser package. For English, five different variants of the dependencies are available, and different options can be used to get these. The default representation is the "CCprocessed" one, which collapses and propagates dependencies (as shown in Figure 1, in contrast to Figure 2, where the dependencies are neither collapsed nor propagated). For more details, refer to section 5 of the Stanford Dependencies manual.
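As a rough sketch of the first route, the helper below assembles a command line that runs the parser on raw text with the -outputFormat typedDependencies option. The helper name `build_parser_command` is hypothetical, and the classpath, main class name and model path are assumptions about a typical local installation of the parser distribution; adjust them to match your download.

```python
def build_parser_command(
    input_path,
    original_dependencies=False,
    classpath="*",  # assumption: run from the directory holding the parser jars
    model="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",  # assumed model path
):
    """Build an argument list for running the parser on a raw text file."""
    command = [
        "java",
        "-cp",
        classpath,
        # Assumed main class of the parser distribution.
        "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
        "-outputFormat",
        "typedDependencies",
    ]
    if original_dependencies:
        # Ask for the original Stanford Dependencies representation.
        command.append("-originalDependencies")
    # Options come first, then the model, then the file to parse.
    command += [model, input_path]
    return command
```

The returned list can be passed to `subprocess.run(..., check=True)`. Building it as a list rather than a single string avoids shell quoting problems with the classpath wildcard.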
Here are some examples of Stanford Dependencies representations of sentences, originating from the Coling 2008 Workshop on Cross-Framework and Cross-Domain Parser Evaluation: required-wsj02.Stanford, optional-wsj02.Stanford, genia.stanford. Only the required WSJ set was hand-verified; the representations in the other two sets were automatically generated.
Stanford dependencies are also available for Chinese. The Chinese dependencies have been developed by Huihsin Tseng and Pi-Chuan Chang. A brief description of the Chinese grammatical relations can be found in this paper.
Versions of Stanford Dependencies have also been developed by outside groups for a number of other languages. Two prominent examples are Finnish (the Turku Dependency Treebank) and Persian (the Uppsala Persian Dependency Treebank). There is now a multi-site effort to produce dependency treebanks over a broad range of languages adopting a compatible dependency taxonomy. More details about this Universal Dependency Treebank can be found in the LREC 2014 paper mentioned above, in the current treebank release, and in new documentation.
While the original and canonical approach to generating the Stanford
Dependencies is using the Stanford parser, there are now many other
parsers which produce them, which may offer better speed or precision.
Any phrase structure parser that constructs PTB style trees can be used,
in addition to any trainable dependency parser. When using an
alternative phrase structure parser, the Stanford Parser
EnglishGrammaticalStructure is used to extract
dependencies from the resulting constituent parse trees. Trainable
dependency parsers can produce the basic Stanford Dependency
representation. This is a projective variant of the Stanford
Dependencies that can be transformed into the default representation.
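To make "projective" concrete: a dependency tree is projective when no two arcs cross once the words are laid out in sentence order. The check below is a minimal sketch of that general definition, not code from the Stanford Parser. The function name `is_projective` and the head-list encoding are assumptions chosen for illustration.

```python
def is_projective(heads):
    """Check that no two dependency arcs cross.

    heads[i] is the 1-based position of the head of word i + 1,
    and 0 marks the root (an artificial node placed before the sentence).
    """
    # Store each arc as (left end, right end) in sentence order.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for left1, right1 in arcs:
        for left2, right2 in arcs:
            # Two arcs cross when exactly one end of the second arc
            # falls strictly inside the span of the first.
            if left1 < left2 < right1 < right2:
                return False
    return True
```

For example, `is_projective([2, 0, 2])` is true, while `is_projective([3, 4, 0, 3])` is false because the arc between words 1 and 3 crosses the arc between words 2 and 4.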
The table below summarizes some methods for generating the Stanford
Dependencies along with the speed and accuracy of each approach on
section 22 of the Penn TreeBank. Links are provided to the corresponding
software packages and trained parsing models (some of the dependency
models were trained by us). All the accuracies and timings here are for
SDs corresponding to Stanford Parser version 1.6.2. The tree or basic
dependency output is in each case converted to CCprocessed dependencies
using the EnglishGrammaticalStructure class, and then
evaluation is on CCprocessed dependencies. (I.e., you cannot compare
these numbers with results on recovering Stanford basic dependencies,
which is an easier task.)
|Approach||Labeled Attachment (F1)||Time (mm:ss)||Links|
|Ensemble Malt||82.4||1:56||[Software][Model] [Paper]|
|MaltParser (Nivre Eager, SVM poly deg:2)||81.1||3:23||[Software][Model we built/used] [English MaltParser model] [English MaltParser]|
|MaltParser (Nivre Eager, LibLinear)||76.2||0:16||[Software][Model we built/used] [English MaltParser model] [English MaltParser]|
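Because collapsed and propagated dependencies do not give exactly one arc per word, the predicted and gold sets can differ in size, which is why an F1 score is reported rather than a plain attachment accuracy. The following is a minimal sketch of one way such a score could be computed. It assumes dependencies are compared as exact (relation, governor index, dependent index) triples; the scoring actually used for the table may differ in its details, and the name `labeled_f1` is hypothetical.

```python
def labeled_f1(gold, predicted):
    """F1 over labeled dependencies for one sentence.

    Each dependency is a (relation, governor_index, dependent_index)
    tuple. A prediction counts as correct only if all three match.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative (assumed) triples: the prediction misses the propagated
# prep_on dependency, so recall drops while precision stays perfect.
gold = [
    ("nsubjpass", 7, 1),
    ("prep_on", 1, 3),
    ("conj_and", 3, 5),
    ("prep_on", 1, 5),
]
predicted = [
    ("nsubjpass", 7, 1),
    ("prep_on", 1, 3),
    ("conj_and", 3, 5),
]
score = labeled_f1(gold, predicted)
```

A corpus-level figure would normally pool the counts over all sentences before computing precision and recall, rather than averaging per-sentence scores.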
The Charniak-Johnson parser includes a model for parsing English. The Bikel parser requires users to train their own model, which can be done using the included
train-from-observed utility and the model data linked above. The RelEx package is rule-based and provides a Stanford Dependency compatibility mode.
For the dependency parsers, part-of-speech (POS) tags were generated using the Stanford POS tagger and the included left3words-wsj-0-18 model. Times represent the total time required to produce the dependencies including: POS tagging (if applicable), parsing, and extraction of the CCprocessed Stanford Dependency representation. Benchmarking was done on a dual CPU Intel Xeon E5520. Multithreading was disabled for the Charniak-Johnson parser, in order to obtain a per CPU-core estimate of parsing speed.
In general, all parsers were run in their default out-of-the-box configurations. In addition, for the Charniak-Johnson parser, the table above also shows the speed and accuracy trade-offs from varying the amount of search by setting different T values (by default T = 210). The Charniak-Johnson parser allows users to trade off parsing accuracy for speed by adjusting how liberal the system is about expanding edges after the best-first search has found one complete parse of the sentence: the parser is constrained to examine only Tval/10 times more edges in search of a better parse. With the default T = 210, that is 21 times more edges.
For more information about these parsing accuracy vs. speed trade-offs when generating Stanford Dependencies, see:
Daniel Cer, Marie-Catherine de Marneffe, Daniel Jurafsky, and Christopher D. Manning. 2010. Parsing to Stanford Dependencies: Trade-offs between speed and accuracy. In 7th International Conference on Language Resources and Evaluation (LREC 2010). [pdf, bib]
To ask questions about the dependencies, you can use the same lists as for the parser:
parser-user: This is the best list to post to in order to ask questions, make announcements, or for discussion among parser users. Join the list via this webpage or by emailing firstname.lastname@example.org. (Leave the subject and message body empty.) You can also look at the list archives.
parser-announce: This list will be used only to announce new parser versions, so it will be very low volume (expect 1-3 messages a year). Join the list via this webpage or by emailing email@example.com. (Leave the subject and message body empty.)
parser-support: This list goes only to the parser maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off joining and using parser-user. You cannot join parser-support, but you can mail questions to it.
Bernard Bou developed a GUI focusing on the typed dependencies, including an editor:
GrammarScope: Stanford parser grammatical relation browser
We now have a nice visualization of Stanford Dependencies in our
online Stanford CoreNLP demo,
provided by brat.
Site design by Bill MacCartney