Neural Network Dependency Parser


A dependency parser analyzes the grammatical structure of a sentence, establishing relationships between "head" words and words which modify those heads. The figure below shows a dependency parse of a short sentence. The arrow from the word moving to the word faster indicates that faster modifies moving, and the label advmod assigned to the arrow describes the exact nature of the dependency.

We have built a super-fast transition-based parser which produces typed dependency parses of natural language sentences. The parser is powered by a neural network which accepts word embedding inputs, as described in the paper:

Danqi Chen and Christopher Manning. 2014. A Fast and Accurate Dependency Parser Using Neural Networks. In Proceedings of EMNLP 2014.

This parser supports English (with Universal Dependencies, Stanford Dependencies and CoNLL Dependencies) and Chinese (with CoNLL Dependencies). Future versions of the software will support other languages.

How transition-based parsing works

For a quick introduction to the standard approach to transition-based dependency parsing, see Joakim Nivre's EACL 2014 tutorial.

This parser builds a parse by performing a linear-time scan over the words of a sentence. At every step it maintains a partial parse, a stack of words which are currently being processed, and a buffer of words yet to be processed.

The parser continues to apply transitions to its state until its buffer is empty and the dependency graph is completed.

The initial state is to have all of the words in order on the buffer, with a single dummy ROOT node on the stack. The following transitions can be applied:

  • LEFT-ARC: marks the second item on the stack as a dependent of the first item, and removes the second item from the stack (if the stack contains at least two items).
  • RIGHT-ARC: marks the first item on the stack as a dependent of the second item, and removes the first item from the stack (if the stack contains at least two items).
  • SHIFT: removes a word from the buffer and pushes it onto the stack (if the buffer is not empty).

With just these three types of transitions, a parser can generate any projective dependency parse. Note that for a typed dependency parser, each LEFT-ARC or RIGHT-ARC transition must also specify the type of the relationship between the head and dependent being described.
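The three transitions above can be sketched in a few lines of Java. This is a minimal illustration, not the parser's actual implementation: the class name ArcStandardSketch, its fields, and the four-word example sentence (with its nsubj, aux and root labels) are assumptions made for the example. Only the advmod relation and the ROOT convention come from the text.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical parser state: token 0 is the dummy ROOT node, words are 1..n. */
class ArcStandardSketch {
    final Deque<Integer> stack = new ArrayDeque<>();   // head of the deque = "first item"
    final Deque<Integer> buffer = new ArrayDeque<>();  // words not yet processed
    final int[] head;      // head[i] = index of the head of word i, -1 if unattached
    final String[] label;  // label[i] = type of the arc pointing at word i

    ArcStandardSketch(int numWords) {
        head = new int[numWords + 1];
        label = new String[numWords + 1];
        Arrays.fill(head, -1);
        stack.push(0);                                  // initial state: ROOT on the stack
        for (int i = 1; i <= numWords; i++) buffer.addLast(i);
    }

    /** SHIFT: move the next buffer word onto the stack. */
    void shift() {
        if (buffer.isEmpty()) throw new IllegalStateException("buffer is empty");
        stack.push(buffer.removeFirst());
    }

    /** LEFT-ARC: second item becomes a dependent of the first and leaves the stack. */
    void leftArc(String type) {
        if (stack.size() < 2) throw new IllegalStateException("need two stack items");
        int first = stack.pop();
        int second = stack.pop();
        head[second] = first;
        label[second] = type;
        stack.push(first);
    }

    /** RIGHT-ARC: first item becomes a dependent of the second and leaves the stack. */
    void rightArc(String type) {
        if (stack.size() < 2) throw new IllegalStateException("need two stack items");
        int first = stack.pop();
        int second = stack.peek();
        head[first] = second;
        label[first] = type;
    }

    boolean isTerminal() {
        return buffer.isEmpty() && stack.size() == 1;
    }

    public static void main(String[] args) {
        // Hypothetical sentence: 1=He 2=was 3=moving 4=faster
        String[] words = {"ROOT", "He", "was", "moving", "faster"};
        ArcStandardSketch s = new ArcStandardSketch(4);
        s.shift(); s.shift(); s.shift();
        s.leftArc("aux");       // was <- moving
        s.leftArc("nsubj");     // He <- moving
        s.shift();
        s.rightArc("advmod");   // moving -> faster
        s.rightArc("root");     // ROOT -> moving
        for (int i = 1; i < words.length; i++) {
            System.out.println(s.label[i] + "(" + words[s.head[i]] + ", " + words[i] + ")");
        }
        System.out.println("terminal: " + s.isTerminal());
    }
}
```

A sentence of n words always takes exactly 2n transitions in this scheme (n SHIFTs and n arcs), which is where the linear-time behaviour comes from.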

The parser decides among transitions at each state using a neural network classifier. Distributed representations (dense, continuous vector representations) of the parser's current state are provided as inputs to this classifier, which then chooses among the possible transitions to make next. These representations describe various features of the current stack and buffer contents in the parser state.
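As a rough illustration of how such a classifier scores transitions, the sketch below looks up an embedding for each feature of the state, concatenates them, passes them through one hidden layer, and produces one score per transition. All names (ClassifierSketch, scores, argmax) and the weight layout are hypothetical. The cube activation follows the Chen and Manning (2014) paper cited above rather than anything stated on this page, and feature extraction from the stack and buffer is left out entirely.

```java
/** Hypothetical one-hidden-layer scorer over embedded state features. */
class ClassifierSketch {

    /**
     * @param featureIds ids of the words/tags/labels read off the stack and buffer
     * @param embeddings embeddings[id] is the dense vector for that feature
     * @param w1 hidden-layer weights, shape [hiddenSize][featureIds.length * embeddingSize]
     * @param b1 hidden-layer bias, shape [hiddenSize]
     * @param w2 output weights, shape [numTransitions][hiddenSize]
     * @return one unnormalized score per transition
     */
    static double[] scores(int[] featureIds, double[][] embeddings,
                           double[][] w1, double[] b1, double[][] w2) {
        int embSize = embeddings[0].length;
        // Concatenate the embeddings of all features into one input vector.
        double[] input = new double[featureIds.length * embSize];
        for (int k = 0; k < featureIds.length; k++) {
            System.arraycopy(embeddings[featureIds[k]], 0, input, k * embSize, embSize);
        }
        // Hidden layer with a cube activation (assumption taken from the cited paper).
        double[] hidden = new double[w1.length];
        for (int h = 0; h < w1.length; h++) {
            double sum = b1[h];
            for (int j = 0; j < input.length; j++) sum += w1[h][j] * input[j];
            hidden[h] = sum * sum * sum;
        }
        // Linear output layer: one score per candidate transition.
        double[] out = new double[w2.length];
        for (int t = 0; t < w2.length; t++) {
            for (int h = 0; h < hidden.length; h++) out[t] += w2[t][h] * hidden[h];
        }
        return out;
    }

    /** Index of the highest score; a real parser would first mask out illegal transitions. */
    static int argmax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) if (scores[i] > scores[best]) best = i;
        return best;
    }

    public static void main(String[] args) {
        double[][] embeddings = {{1, 0}, {0, 1}};
        double[][] w1 = {{1, 1, 1, 1}};
        double[] b1 = {0};
        double[][] w2 = {{1}, {-1}, {0.5}};
        double[] s = scores(new int[] {0, 1}, embeddings, w1, b1, w2);
        System.out.println("best transition index: " + argmax(s));
    }
}
```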

The classifier which powers the parser is trained using an oracle. This oracle takes each sentence in the training data and produces many training examples indicating which transition should be taken at each state to reach the correct final parse. The neural network is trained on these examples using adaptive gradient descent (AdaGrad) with hidden unit dropout.
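One common way to build such an oracle for the three transitions described earlier is sketched below. It is an assumption that the parser's own oracle follows exactly these rules; the class and method names are hypothetical. Given the gold head of every word, it replays the parse and records the transition taken at each step. A full implementation would also record the parser state alongside each transition to form a training example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical static oracle for a projective gold tree. */
class OracleSketch {

    /**
     * @param goldHead  goldHead[i] is the head of word i (words are 1..n, ROOT is 0);
     *                  index 0 is unused
     * @param goldLabel goldLabel[i] is the type of the arc pointing at word i
     */
    static List<String> oracle(int[] goldHead, String[] goldLabel) {
        int n = goldHead.length - 1;
        int[] pending = new int[n + 1];              // dependents still to be attached
        for (int i = 1; i <= n; i++) pending[goldHead[i]]++;

        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0);                               // dummy ROOT
        int next = 1;                                // front of the buffer
        List<String> transitions = new ArrayList<>();

        while (next <= n || stack.size() > 1) {
            if (stack.size() >= 2) {
                int first = stack.pop();
                int second = stack.peek();
                if (second != 0 && goldHead[second] == first) {
                    stack.pop();                     // second leaves the stack
                    stack.push(first);
                    pending[first]--;
                    transitions.add("LEFT-ARC(" + goldLabel[second] + ")");
                    continue;
                }
                // Attach first only once it has collected all of its own dependents.
                if (goldHead[first] == second && pending[first] == 0) {
                    pending[second]--;
                    transitions.add("RIGHT-ARC(" + goldLabel[first] + ")");
                    continue;
                }
                stack.push(first);                   // no arc applies: restore the stack
            }
            if (next > n) throw new IllegalArgumentException("tree is not projective");
            stack.push(next++);
            transitions.add("SHIFT");
        }
        return transitions;
    }

    public static void main(String[] args) {
        // Hypothetical sentence: 1=He 2=was 3=moving 4=faster
        int[] heads = {-1, 3, 3, 0, 3};
        String[] labels = {null, "nsubj", "aux", "root", "advmod"};
        System.out.println(oracle(heads, labels));
    }
}
```

Because the three transitions can only produce projective parses, the sketch rejects non-projective gold trees instead of emitting an incomplete sequence.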

Obtaining the software

You may download either of the following packages:


Trained models for use with this parser are included in either of the packages. The list of models currently distributed is:
edu/stanford/nlp/models/parser/nndep/english_UD.gz (default, English, Universal Dependencies)
edu/stanford/nlp/models/parser/nndep/PTB_Stanford_params.txt.gz (English, Stanford Dependencies)
edu/stanford/nlp/models/parser/nndep/PTB_CoNLL_params.txt.gz (English, CoNLL Dependencies)
edu/stanford/nlp/models/parser/nndep/CTB_CoNLL_params.txt.gz (Chinese, CoNLL Dependencies)
Note that these models were trained with an earlier Matlab version of the code, and your results training with the Java code may be slightly worse.


Command-line interface

The dependency parser can be run as part of the larger CoreNLP pipeline, or run directly (external to the pipeline).

Using the Stanford CoreNLP pipeline

This parser is integrated into Stanford CoreNLP as a new annotator.

If you want to use the transition-based parser from the command line, invoke StanfordCoreNLP with the depparse annotator. This annotator has dependencies on the tokenize, ssplit, and pos annotators. An example invocation follows (assuming CoreNLP is on your classpath):

java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,depparse -file <INPUT_FILE>

Direct access (with Stanford Parser or CoreNLP)

It is also possible to access the parser directly in the Stanford Parser or Stanford CoreNLP packages. With direct access to the parser, you can train new models, evaluate models with test treebanks, or parse raw sentences.

The main program to use is the class edu.stanford.nlp.parser.nndep.DependencyParser. The Javadoc for this class's main method describes all possible options in detail. Some usage examples follow:

  • Parse raw text from a file:
    java edu.stanford.nlp.parser.nndep.DependencyParser -model modelOutputFile.txt.gz -textFile rawTextToParse -outFile dependenciesOutputFile.txt
  • Parse raw text from standard input, writing to standard output:
    java edu.stanford.nlp.parser.nndep.DependencyParser -model modelOutputFile.txt.gz -textFile - -outFile -

Programmatic access

Included demo

It's also possible to use this parser directly in your own Java code. There is a DependencyParserDemo example class in the package edu.stanford.nlp.parser.nndep.demo, included in the source of the Stanford Parser and the source of CoreNLP.

Java API

The parser exposes an API for both training and testing. You can find more information in our Javadoc.

Training your own parser

You can train a new dependency parser using your own data in the CoNLL-X data format. (Many dependency treebanks are provided in this format by default; even if not, conversion is often trivial.)
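For orientation, a CoNLL-X file has one token per line with ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and a blank line between sentences. The sketch below reads the columns a dependency trainer typically needs. The class and field names are hypothetical, and the choice of POSTAG over CPOSTAG is an assumption; this is not the parser's own loader.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal CoNLL-X reader. */
class ConllxSketch {

    static final class Token {
        final int id;         // column 1: position in the sentence, starting at 1
        final String form;    // column 2: word form
        final String pos;     // column 5: fine-grained POS tag (assumed choice)
        final int head;       // column 7: id of the head, 0 for the root
        final String deprel;  // column 8: dependency type

        Token(int id, String form, String pos, int head, String deprel) {
            this.id = id;
            this.form = form;
            this.pos = pos;
            this.head = head;
            this.deprel = deprel;
        }
    }

    static List<List<Token>> read(String text) {
        List<List<Token>> sentences = new ArrayList<>();
        List<Token> current = new ArrayList<>();
        for (String rawLine : text.split("\n", -1)) {
            String line = rawLine.endsWith("\r")
                    ? rawLine.substring(0, rawLine.length() - 1) : rawLine;
            if (line.trim().isEmpty()) {             // blank line ends a sentence
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            String[] f = line.split("\t");
            if (f.length < 8) {
                throw new IllegalArgumentException("expected at least 8 columns: " + line);
            }
            current.add(new Token(Integer.parseInt(f[0]), f[1], f[4],
                                  Integer.parseInt(f[6]), f[7]));
        }
        if (!current.isEmpty()) sentences.add(current);
        return sentences;
    }

    public static void main(String[] args) {
        // Hypothetical two-token sentence.
        String sample = "1\tmoving\t_\tVBG\tVBG\t_\t0\troot\t_\t_\n"
                      + "2\tfaster\t_\tRBR\tRBR\t_\t1\tadvmod\t_\t_\n";
        for (Token t : read(sample).get(0)) {
            System.out.println(t.id + " " + t.form + " -> head " + t.head + " (" + t.deprel + ")");
        }
    }
}
```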

Basic guidelines

To train a new English model, you need the following pieces of data:

  • A dependency treebank, split into training, development, and test segments. (Most treebanks come with a predetermined split.)
  • A word embedding file, containing distributed representations of English words. It is not absolutely necessary that all words in the treebank be covered by this embedding file, though the parser's performance will generally improve if you are able to provide better embeddings for more words.

    This word embedding file is only used for training. The parser will build its own improved embeddings and save them as part of the learned model.

To start training with the data described above, run this command with the parser on your classpath:

java edu.stanford.nlp.parser.nndep.DependencyParser -trainFile <train path> -devFile <dev path> -embedFile <word embedding file> -embeddingSize <word embedding dimensionality> -model nndep.model.txt.gz

On the NLP machines, training data is available in /u/nlp/data/depparser/nn/data:

java edu.stanford.nlp.parser.nndep.DependencyParser \
    -trainFile /u/nlp/data/depparser/nn/data/dependency_treebanks/PTB_Stanford/train.conll \
    -devFile /u/nlp/data/depparser/nn/data/dependency_treebanks/PTB_Stanford/dev.conll \
    -embedFile /u/nlp/data/depparser/nn/data/embeddings/en-cw.txt -embeddingSize 50 \
    -model nndep.model.txt.gz

Training models for other languages

To train the parser for languages other than English, you need the data as described in the previous section, along with a TreebankLanguagePack describing the particularities of your treebank and the language it contains. (The Stanford Parser package may already contain a TLP for your language of choice: check the package

Note that at test time, a language-appropriate tagger will also be necessary.

For example, here is a command used to train a Chinese model. The only difference from the English case (apart from the fact that we changed datasets) is that we also provide a different TreebankLanguagePack class with the -tlp option.

java edu.stanford.nlp.parser.nndep.DependencyParser -tlp <TreebankLanguagePack class> -trainFile chinese/train.conll -devFile chinese/dev.conll -embedFile chinese/embeddings.txt -embeddingSize 50 -model nndep.chinese.model.txt.gz

The only complicated part here is the TreebankLanguagePack, which is a Java class you need to provide. It's not hard to write. It only supplies a few things: a default character encoding, a list of punctuation POS tags and sentence-final punctuation words, and a tokenizer (which you might also need to write). Some of these, like the tokenizer, are only needed for running the parser on raw text, and you can train and test on CoNLL files without one. To get started, if your language uses the Latin alphabet, you can probably get away with using the default English TreebankLanguagePack, PennTreebankLanguagePack.

Additional training options

  • -adaAlpha (default: 0.01): Global learning rate for AdaGrad training.
  • -adaEps (default: 1e-6): Epsilon value added to the denominator of the AdaGrad update expression for numerical stability.
  • -batchSize (default: 10000): Size of the mini-batch used for training.
  • -dropProb (default: 0.5): Dropout probability. For each training example we randomly choose some of the units in the neural network classifier to disable. This parameter controls the proportion of units "dropped out."
  • -embeddingSize (default: 50): Dimensionality of the word embeddings provided.
  • -evalPerIter (default: 100): Run a full UAS (unlabeled attachment score) evaluation on the development set every time this number of iterations has been completed.
  • -hiddenSize (default: 200): Dimensionality of the hidden layer in the neural network classifier.
  • -initRange (default: 0.01): Bounds of the range within which weight matrix elements are initialized. Each element is drawn from a uniform distribution over the range [-initRange, initRange].
  • -maxIter (default: 20000): Number of training iterations to complete before stopping and saving the final model.
  • -numPreComputed (default: 100000): The parser pre-computes hidden-layer unit activations for particular input words at both training and testing time in order to speed up feedforward computation in the neural network. This parameter determines the number of words for which hidden-layer activations are pre-computed.
  • -regParameter (default: 1e-8): Regularization parameter for training.
  • -trainingThreads (default: 1): Number of threads to use during training. Note that depending on the training batch size, it may be unwise to simply choose the maximum number of threads for your machine. On our 16-core test machines, a batch size of 10,000 runs fastest with around 6 threads, and a batch size of 100,000 runs best with around 10 threads.
  • -wordCutOff (default: 1): The parser can optionally ignore rare words by choosing an arbitrary "unknown" feature representation for words that appear with frequency less than n in the corpus. This n is controlled by the wordCutOff parameter.
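To make the roles of adaAlpha, adaEps, regParameter and initRange concrete, here is a sketch of uniform weight initialization and one AdaGrad step. The class and method names are hypothetical. Two details are assumptions rather than facts about the parser: the epsilon is placed inside the square root, and regularization is applied as an L2 term added to the gradient.

```java
import java.util.Random;

/** Hypothetical AdaGrad optimizer over a flat weight vector. */
class AdaGradSketch {
    final double adaAlpha;         // global learning rate
    final double adaEps;           // stabilizer in the denominator
    final double regParameter;     // L2 regularization strength (assumed form)
    final double[] gradSquareSum;  // running sum of squared gradients per weight

    AdaGradSketch(int numWeights, double adaAlpha, double adaEps, double regParameter) {
        this.adaAlpha = adaAlpha;
        this.adaEps = adaEps;
        this.regParameter = regParameter;
        this.gradSquareSum = new double[numWeights];
    }

    /** Draws each weight uniformly from the range bounded by -initRange and initRange. */
    static double[] initWeights(int n, double initRange, Random rng) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) w[i] = (rng.nextDouble() * 2 - 1) * initRange;
        return w;
    }

    /** Applies one AdaGrad step in place, given the mini-batch gradient. */
    void update(double[] weights, double[] grad) {
        for (int i = 0; i < weights.length; i++) {
            double g = grad[i] + regParameter * weights[i];
            gradSquareSum[i] += g * g;
            // Each weight gets its own step size, shrinking as its gradients accumulate.
            weights[i] -= adaAlpha * g / Math.sqrt(gradSquareSum[i] + adaEps);
        }
    }

    public static void main(String[] args) {
        double[] w = initWeights(3, 0.01, new Random(42));
        AdaGradSketch opt = new AdaGradSketch(w.length, 0.01, 1e-6, 1e-8);
        opt.update(w, new double[] {0.5, -0.2, 0.0});
        for (double v : w) System.out.println(v);
    }
}
```

Because the accumulated squared gradient only grows, frequently updated weights take progressively smaller steps, while rarely updated ones (such as embeddings of rare words) keep a larger effective learning rate.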


The table below describes this parser's performance on the Penn Treebank, converted to dependencies using Stanford Dependencies. The part-of-speech tags used as input for training and testing were generated by the Stanford POS Tagger (using the bidirectional5words model).

[Table not preserved: results were reported on two evaluation sets, of 1700 and 2416 sentences.]
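The unlabeled attachment score (UAS) mentioned under -evalPerIter is the percentage of words whose predicted head matches the gold head. A minimal sketch follows; the class name and array layout are hypothetical, and it counts every token, whereas real evaluations often exclude punctuation.

```java
/** Hypothetical UAS computation over a small corpus. */
class UasSketch {

    /**
     * @param goldHeads goldHeads[s][i] is the gold head of the (i+1)-th word of sentence s
     * @param predHeads same layout, holding the parser's predictions
     * @return percentage of words whose head was predicted correctly
     */
    static double uas(int[][] goldHeads, int[][] predHeads) {
        long correct = 0;
        long total = 0;
        for (int s = 0; s < goldHeads.length; s++) {
            if (goldHeads[s].length != predHeads[s].length) {
                throw new IllegalArgumentException("length mismatch in sentence " + s);
            }
            for (int i = 0; i < goldHeads[s].length; i++) {
                if (goldHeads[s][i] == predHeads[s][i]) correct++;
                total++;
            }
        }
        return total == 0 ? 0.0 : 100.0 * correct / total;
    }

    public static void main(String[] args) {
        int[][] gold = {{3, 3, 0, 3}, {2, 0}};
        int[][] pred = {{3, 3, 0, 2}, {2, 0}};   // one wrong head out of six words
        System.out.printf("UAS: %.2f%n", uas(gold, pred));
    }
}
```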