Starting in 2005, we developed a linguistically sound, surface-syntax oriented dependency representation for English, which came to be known as Stanford Dependencies. This representation was met with interest by many people and later in 2013 we began collaborating with a broader consortium to propose Universal Dependencies, a similar dependency representation suitable for all languages.
Since version 3.5.2, the Stanford Parser and Stanford CoreNLP output grammatical relations in the Universal Dependencies v1 representation by default. Take a look at the Universal Dependencies v1 documentation for a detailed description of the v1 representation, its set of relations, and links to dependency treebank downloads. You might also want to look at the current (v2) Universal Dependencies documentation, but we have yet to update our tools to this representation. More information on the Universal Dependencies converter and the enhanced representation can be found in this paper.
If you have an English constituency treebank in Penn Treebank (s-expression) format in the file or directory
treebank, you can use our code to convert it to a file of basic Universal Dependencies in CoNLL-U format with this command:
java -mx1g edu.stanford.nlp.trees.ud.UniversalDependenciesConverter -treeFile treebank > treebank.conllu
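As a rough illustration of how the resulting file might be consumed, here is a minimal Python sketch that reads CoNLL-U text and turns each sentence into (relation, governor, dependent) triples. It relies only on the standard ten-column CoNLL-U layout (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The function names and the three-token sample are hypothetical and hand-written for this example; they are not output copied from the converter, and the sketch assumes every token row has a numeric HEAD.

```python
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (relation, governor, dependent)

# Hand-written sample in CoNLL-U layout (an assumption for illustration only).
SAMPLE = (
    "# text = Bills were submitted\n"
    "1\tBills\tbill\tNOUN\tNNS\t_\t3\tnsubjpass\t_\t_\n"
    "2\twere\tbe\tAUX\tVBD\t_\t3\tauxpass\t_\t_\n"
    "3\tsubmitted\tsubmit\tVERB\tVBN\t_\t0\troot\t_\t_\n"
    "\n"
)


def _to_triples(rows: List[List[str]]) -> List[Triple]:
    # Map token index to word form; index 0 is the artificial root.
    forms = {0: "ROOT"}
    for cols in rows:
        forms[int(cols[0])] = cols[1]
    return [
        (cols[7], f"{forms[int(cols[6])]}-{cols[6]}", f"{cols[1]}-{cols[0]}")
        for cols in rows
    ]


def read_conllu_triples(lines: Iterable[str]) -> Iterator[List[Triple]]:
    """Yield one list of (relation, governor, dependent) triples per sentence."""
    rows: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#"):      # sentence-level comment
            continue
        if not line.strip():          # blank line ends a sentence
            if rows:
                yield _to_triples(rows)
                rows = []
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():     # skip multiword ranges (1-2) and empty nodes (1.1)
            continue
        rows.append(cols)
    if rows:                          # file may not end with a blank line
        yield _to_triples(rows)


if __name__ == "__main__":
    for sentence in read_conllu_triples(SAMPLE.splitlines()):
        for relation, governor, dependent in sentence:
            print(f"{relation}({governor}, {dependent})")
```

To use it on a real file, one could pass an open file handle instead of `SAMPLE.splitlines()`, for example `read_conllu_triples(open("treebank.conllu", encoding="utf-8"))`.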
We also still support the original Stanford Dependencies representation as described on this page and in the original papers. To output relations in the original Stanford Dependencies representation use the
-originalDependencies option when running the parser or the
-parse.originalDependencies option when running a CoreNLP pipeline with the PCFG parser. If you are using the Neural Network dependency parser and want to get the original Stanford Dependencies, you have to use the model trained on a corpus annotated with the Stanford Dependencies representation using the following option:
The dependency code is part of the Stanford parser. Go here to download a version.
Stanford dependencies provide a representation of grammatical relations between words in a sentence. They have been designed to be easily understood and effectively used by people who want to extract textual relations. Stanford dependencies (SD) are triplets: name of the relation, governor, and dependent. The standard dependencies for the sentence Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas are given below, together with two graphical representations: the standard dependencies (collapsed and propagated) and the basic dependency representation, in which each word in the sentence (except the head of the sentence) is the dependent of one other word (no collapsing, no propagation).
|Dependencies for Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas||Figure 1. Standard Stanford dependencies (collapsed and propagated)||Figure 2. Basic dependencies|
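To make the difference between the basic and the collapsed, propagated representation more concrete, the following Python sketch applies a heavily simplified version of both steps to the fragment "Bills on ports and immigration". It is an illustration only: the function names are hypothetical, the input triples are hand-written, and the real converter handles many more cases (for example, propagating shared dependents of conjoined governors), none of which are modeled here.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (relation, governor, dependent), tokens as "word-index"


def word(token: str) -> str:
    """Return the lower-cased word form of a 'word-index' token."""
    return token.rsplit("-", 1)[0].lower()


def collapse_and_propagate(basic: List[Triple]) -> List[Triple]:
    """Simplified sketch: collapse prep+pobj and cc+conj, then propagate over conjuncts."""
    pobj_of = {gov: dep for rel, gov, dep in basic if rel == "pobj"}
    cc_of = {gov: dep for rel, gov, dep in basic if rel == "cc"}
    collapsed_preps = {dep for rel, _, dep in basic if rel == "prep" and dep in pobj_of}
    conj_governors = {gov for rel, gov, _ in basic if rel == "conj"}

    collapsed: List[Triple] = []
    for rel, gov, dep in basic:
        if rel == "prep" and dep in pobj_of:
            # prep(Bills, on) + pobj(on, ports) -> prep_on(Bills, ports)
            collapsed.append((f"prep_{word(dep)}", gov, pobj_of[dep]))
        elif rel == "pobj" and gov in collapsed_preps:
            continue  # absorbed into the collapsed prep_* relation
        elif rel == "cc" and gov in conj_governors:
            continue  # absorbed into the collapsed conj_* relation
        elif rel == "conj" and gov in cc_of:
            # cc(ports, and) + conj(ports, immigration) -> conj_and(ports, immigration)
            collapsed.append((f"conj_{word(cc_of[gov])}", gov, dep))
        else:
            collapsed.append((rel, gov, dep))

    # Propagation: the second conjunct inherits the incoming relation of the first.
    result = list(collapsed)
    for rel, first, second in collapsed:
        if not rel.startswith("conj"):
            continue
        for rel2, gov2, dep2 in collapsed:
            if dep2 == first and not rel2.startswith("conj"):
                triple = (rel2, gov2, second)
                if triple not in result:
                    result.append(triple)
    return result


BASIC = [
    ("prep", "Bills-1", "on-2"),
    ("pobj", "on-2", "ports-3"),
    ("cc", "ports-3", "and-4"),
    ("conj", "ports-3", "immigration-5"),
]

if __name__ == "__main__":
    for relation, governor, dependent in collapse_and_propagate(BASIC):
        print(f"{relation}({governor}, {dependent})")
```

For this fragment the sketch yields prep_on(Bills-1, ports-3), conj_and(ports-3, immigration-5), and the propagated prep_on(Bills-1, immigration-5). Note that in the result the word immigration has two governors, which is why the collapsed and propagated representation is a graph rather than a tree.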
We created a gold standard dependency corpus on top of the English Web Treebank (LDC2012T13). We manually annotated 254,830 words with SD for English. The effort is meant to address the scarcity of both gold standard dependency corpora for English and annotated resources for parsing web text. This resource is described here:
Natalia Silveira, Timothy Dozat, Marie-Catherine de Marneffe, Samuel R. Bowman, Miriam Connor, John Bauer and Christopher D. Manning. 2014. A Gold Standard Dependency Corpus for English. In LREC 2014.
This annotation effort has led to refinements of Stanford Dependencies. We describe changes to the standard and propose analyses for a few syntactic constructions of interest, to be found in the following paper:
Marie-Catherine de Marneffe, Miriam Connor, Natalia Silveira, Samuel R. Bowman, Timothy Dozat and Christopher D. Manning. 2013. More constructions, more genres: Extending Stanford Dependencies. In DepLing 2013.
We are now primarily working on an improved taxonomy to capture grammatical relations across languages, including morphologically rich ones. A first version of this new standard, called Universal Dependencies, is described here:
Marie-Catherine de Marneffe, Natalia Silveira, Timothy Dozat, Katri Haverinen, Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. Universal Stanford Dependencies: A cross-linguistic typology. In LREC 2014.
A more recent version of Universal Dependencies is described in the following two papers:
Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A Multilingual Treebank Collection. In LREC 2016.
Sebastian Schuster and Christopher D. Manning. 2016. Enhanced English Universal Dependencies: An Improved Representation for Natural Language Understanding Tasks. In LREC 2016.
The most up-to-date version of Universal Dependencies is documented online at http://www.universaldependencies.org.
The English version of the Stanford dependencies has been developed by Marie-Catherine de Marneffe, Bill MacCartney, and Christopher Manning. All details about the English dependencies can be found in the manual:
Marie-Catherine de Marneffe and Christopher D. Manning. 2008. Stanford Dependencies manual.
The manual contains a description of all the existing English grammatical relations in the representation. It explains the differences between the five types of representation available, and how such types of representation can be obtained. It also gives references to further discussion and use of the Stanford dependencies. The dependencies are produced using hand-written
tregex patterns over phrase-structure trees as described in:
Marie-Catherine de Marneffe, Bill MacCartney and Christopher D. Manning. 2006. Generating Typed Dependency Parses from Phrase Structure Parses. In LREC 2006.
The main ideas motivating the Stanford dependency representation appear in this paper:
Marie-Catherine de Marneffe and Christopher D. Manning. 2008. The Stanford typed dependencies representation. In COLING 2008 Workshop on Cross-framework and Cross-domain Parser Evaluation.
The definition of the set of dependencies has evolved a little over the years, and the particular patterns used to convert phrase structure trees to dependencies have been improved quite a bit. Hence, if you are publishing a paper using Stanford Dependencies, we really appreciate it if you could indicate precisely which version you are using. This is easily done by citing the version of the Stanford Parser code used.
In practice, the dependencies can be obtained using our software in two ways: either by running the Stanford parser with the
-outputFormat typedDependencies option on raw text, or directly from phrase-structure trees using the
EnglishGrammaticalStructure class available in the parser package.
If you have an English constituency treebank in Penn Treebank (s-expression) format in the file or directory
treebank, you can use our code to convert it to a file of basic Stanford Dependencies in CoNLL-X format with this command:
java -mx1g edu.stanford.nlp.trees.EnglishGrammaticalStructure -basic -keepPunct -conllx -treeFile treebank > treebank.sd
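Because the basic representation makes every word except the head of the sentence the dependent of exactly one other word, each sentence in the converted file should form a tree. The Python sketch below shows one way such a sanity check could be written. It assumes the usual ten-column CoNLL-X layout, in which column 7 (HEAD) holds the governor index and 0 marks the root; the function names and the sample rows are hypothetical and hand-written for this example.

```python
from typing import Iterable, List

# Hand-written sample rows in CoNLL-X layout (an assumption for illustration only).
SAMPLE = (
    "1\tBills\t_\tNNS\tNNS\t_\t3\tnsubjpass\t_\t_\n"
    "2\twere\t_\tVBD\tVBD\t_\t3\tauxpass\t_\t_\n"
    "3\tsubmitted\t_\tVBN\tVBN\t_\t0\troot\t_\t_\n"
)


def heads_from_conllx(lines: Iterable[str]) -> List[int]:
    """Read the HEAD column (7th field) of one sentence's token rows."""
    heads = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        heads.append(int(line.split("\t")[6]))
    return heads


def is_single_headed_tree(heads: List[int]) -> bool:
    """heads[i] is the 1-based governor of token i+1; 0 marks the root.

    Returns True when there is exactly one root, every head index is in
    range, and following the heads from any token reaches the root
    without running into a cycle.
    """
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:
        return False
    if any(h < 0 or h > n for h in heads):
        return False
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:          # revisiting a token means a cycle
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


if __name__ == "__main__":
    print(is_single_headed_tree(heads_from_conllx(SAMPLE.splitlines())))
```

A check like this applies only to the basic representation; the collapsed and propagated variants can give a word more than one governor and so are not expected to pass it.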
For English, five different variants of the dependencies are available, and different options can be used to get these. The default representation is the "CCprocessed" one, which collapses and propagates dependencies (as shown in Figure 1, in contrast to Figure 2, which shows the "basic" dependencies, which are neither collapsed nor propagated). For more details, refer to section 5 of the Stanford Dependencies manual.
Here are some examples of Stanford Dependencies representations of sentences, originating from the Coling 2008 Workshop on Cross-Framework and Cross-Domain Parser Evaluation: required-wsj02.Stanford, optional-wsj02.Stanford, genia.stanford. Only the required WSJ set was hand-verified; the representations in the other two sets were automatically generated.
A corpus of English biomedical texts, with hand-corrected annotations in a slight variant of the Stanford typed dependency format, is available from the BioInfer project.
Stanford dependencies are also available for Chinese. The Chinese dependencies have been developed by Huihsin Tseng and Pi-Chuan Chang. A brief description of the Chinese grammatical relations can be found in this paper.
If you have a version of the LDC Chinese Treebank (or some other Chinese constituency treebank in Penn Treebank s-expression format) in the file or directory
treebank, you can use our code to convert it to a file of basic Chinese Stanford Dependencies in CoNLL-X format with this command:
java -mx1g edu.stanford.nlp.trees.international.pennchinese.ChineseGrammaticalStructure -basic -keepPunct -conllx -treeFile treebank > treebank.csd
Versions of Stanford Dependencies have also been developed by outside groups for a number of other languages. Two prominent examples are Finnish (the Turku Dependency Treebank) and Persian (the Uppsala Persian Dependency Treebank). There is now a multi-site effort to produce dependency treebanks over a broad range of languages adopting a compatible dependency taxonomy. More details about this Universal Dependency Treebank can be found in the LREC 2014 and LREC 2016 papers mentioned above, in the current treebank release, and in new documentation.
While the original and canonical approach to generating the Stanford
Dependencies is using the Stanford parser, there are now many other
parsers which produce them, which may offer better speed or precision.
Any phrase structure parser that constructs PTB style trees can be used,
in addition to any trainable dependency parser. When using an
alternative phrase structure parser, the Stanford Parser
EnglishGrammaticalStructure is used to extract
dependencies from the resulting constituent parse trees. Trainable
dependency parsers can produce the basic Stanford Dependency
representation. This is a projective variant of the Stanford
Dependencies that can be transformed into the default representation.
The table below summarizes some methods for generating the Stanford
Dependencies along with the speed and accuracy of each approach on
section 22 of the Penn Treebank. Links are provided to the corresponding
software packages and trained parsing models (some of the dependency
models were trained by us). All the accuracies and timings here are for
SDs corresponding to Stanford Parser version 1.6.2. The tree or basic
dependency output is in each case converted to CCprocessed dependencies
using the EnglishGrammaticalStructure class, and then
evaluation is on CCprocessed dependencies. (I.e., you cannot compare
these numbers with results on recovering Stanford basic dependencies,
which is an easier task.)
|Approach||Labeled Attachment (F1)||Time (mm:ss)||Links|
|Ensemble Malt||82.4||1:56||[Software] [Model] [Paper]|
|MaltParser (Nivre Eager, SVM poly deg:2)||81.1||3:23||[Software] [Model we built/used] [English MaltParser model] [English MaltParser]|
|MaltParser (Nivre Eager, LibLinear)||76.2||0:16||[Software] [Model we built/used] [English MaltParser model] [English MaltParser]|
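For readers unfamiliar with the metric, the sketch below shows the standard definition of labeled F1 over sets of (relation, governor, dependent) triples. F1 is a natural fit for CCprocessed dependencies, since a word may have several governors and the predicted and gold sets can differ in size. This is only an illustration under that definition: the function name and the toy triples are hypothetical, and the exact scoring procedure behind the numbers in the table (for instance, how punctuation is treated) may differ.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (relation, governor, dependent)


def labeled_f1(gold: Iterable[Triple], predicted: Iterable[Triple]) -> float:
    """Harmonic mean of precision and recall over exactly matching triples."""
    gold_set, predicted_set = set(gold), set(predicted)
    correct = len(gold_set & predicted_set)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted_set)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): the parser misses the propagated dependency.
GOLD = [
    ("prep_on", "Bills-1", "ports-3"),
    ("conj_and", "ports-3", "immigration-5"),
    ("prep_on", "Bills-1", "immigration-5"),
]
PREDICTED = [
    ("prep_on", "Bills-1", "ports-3"),
    ("conj_and", "ports-3", "immigration-5"),
]

if __name__ == "__main__":
    print(f"{labeled_f1(GOLD, PREDICTED):.3f}")
```

In the toy data, precision is 2/2 and recall is 2/3, so the labeled F1 is 0.8.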
The Charniak-Johnson parser includes a model for parsing English. The Bikel parser requires users to train their own model, which can be done using the included
train-from-observed utility and the model data linked above. The RelEx package is rule-based and provides a Stanford Dependency compatibility mode.
For the dependency parsers, part-of-speech (POS) tags were generated using the Stanford POS tagger and the included left3words-wsj-0-18 model. Times represent the total time required to produce the dependencies including: POS tagging (if applicable), parsing, and extraction of the CCprocessed Stanford Dependency representation. Benchmarking was done on a dual CPU Intel Xeon E5520. Multithreading was disabled for the Charniak-Johnson parser, in order to obtain a per CPU-core estimate of parsing speed.
In general, all parsers were run in their default out-of-the-box configurations. For the Charniak-Johnson parser, the table above additionally shows the speed and accuracy trade-offs from varying the amount of search by setting different T values (by default T = 210). The Charniak-Johnson parser allows users to trade off parsing accuracy for speed by adjusting how liberal the system is about expanding edges after the best-first search has found one complete parse of the sentence: it constrains itself to examine only T/10 times more edges in search of a better parse (21 times more with the default T = 210).
For more information about these parsing accuracy vs. speed trade-offs when generating Stanford Dependencies, see:
Daniel Cer, Marie-Catherine de Marneffe, Daniel Jurafsky, and Christopher D. Manning. 2010. Parsing to Stanford Dependencies: Trade-offs between speed and accuracy. In 7th International Conference on Language Resources and Evaluation (LREC 2010). [pdf, bib]
To ask questions about the dependencies, you can use the same mailing lists as for the parser:
parser-user: This is the best list to post to in order to ask questions, make announcements, or for discussion among parser users. Join the list via this webpage or by emailing
firstname.lastname@example.org. (Leave the subject and message body empty.) You can also look at the list archives.
parser-announce: This list will be used only to announce new parser versions, so it will be very low volume (expect 1-3 messages a year). Join the list via this webpage or by emailing
email@example.com. (Leave the subject and message body empty.)
parser-support: This list goes only to the parser maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off joining and using
parser-user. You cannot join
parser-support, but you can mail questions to
Bernard Bou developed a GUI focusing on the typed dependencies, including an editor: