public class DependencyParser extends Object
This is an implementation of the method described in:

Danqi Chen and Christopher Manning. A Fast and Accurate Dependency Parser Using Neural Networks. In EMNLP 2014.

New models can be trained from the command line; see main(java.lang.String[]) for details on training options. This parser will also output CoNLL-X format predictions; again see main(java.lang.String[]) for available options.
This parser can also be used programmatically. The easiest way to prepare the parser with a pre-trained model is to call loadFromModelFile(String). Then call predict(edu.stanford.nlp.util.CoreMap) on the returned parser instance in order to get new parses.
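For example, a minimal end-to-end sketch of this workflow (the sample sentence is illustrative; the tagger model path is the default listed under the runtime parsing options below, and the CoreNLP models jar is assumed to be on the classpath):

```java
import java.io.StringReader;
import java.util.List;

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.parser.nndep.DependencyParser;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
import edu.stanford.nlp.trees.GrammaticalStructure;

public class DependencyParserDemo {
  public static void main(String[] args) {
    // The parser requires part-of-speech tags, so tag the tokens first.
    MaxentTagger tagger = new MaxentTagger(
        "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");
    DependencyParser parser = DependencyParser.loadFromModelFile(DependencyParser.DEFAULT_MODEL);

    String text = "I can almost always tell when movies use fake dinosaurs.";  // illustrative input
    DocumentPreprocessor tokenizer = new DocumentPreprocessor(new StringReader(text));
    for (List<HasWord> sentence : tokenizer) {
      List<TaggedWord> tagged = tagger.tagSentence(sentence);
      GrammaticalStructure gs = parser.predict(tagged);
      System.out.println(gs.typedDependencies());
    }
  }
}
```

This uses the predict(List<? extends HasWord>) convenience overload, which requires the tokens to carry tag annotations; that is why the sentence is run through the tagger before parsing.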
Modifier and Type | Field and Description |
---|---|
static String | DEFAULT_MODEL |
Constructor and Description |
---|
DependencyParser(Properties properties) |
Modifier and Type | Method and Description |
---|---|
Dataset | genTrainExamples(List<CoreMap> sents, List<edu.stanford.nlp.parser.nndep.DependencyTree> trees) |
List<Integer> | getFeatures(Configuration c) |
int | getLabelID(String s) |
int | getPosID(String s) |
int | getWordID(String s) - Get an integer ID for the given word. |
static DependencyParser | loadFromModelFile(String modelFile) - Convenience method; see loadFromModelFile(String, java.util.Properties). |
static DependencyParser | loadFromModelFile(String modelFile, Properties extraProperties) - Load a saved parser model. |
void | loadModelFile(String modelFile) - Load a parser model file, printing out some messages about the grammar in the file. |
static void | main(String[] args) - A main program for training, testing and using the parser. |
GrammaticalStructure | predict(CoreMap sentence) - Determine the dependency parse of the given sentence using the loaded model. |
GrammaticalStructure | predict(List<? extends HasWord> sentence) - Convenience method for predict(edu.stanford.nlp.util.CoreMap). |
double | testCoNLL(String testFile, String outFile) - Run the parser in the modelFile on a testFile and perhaps save output. |
void | train(String trainFile, String modelFile) |
void | train(String trainFile, String devFile, String modelFile) |
void | train(String trainFile, String devFile, String modelFile, String embedFile) - Train a new dependency parser model. |
void | writeModelFile(String modelFile) |
public static final String DEFAULT_MODEL
public DependencyParser(Properties properties)
public int getWordID(String s)
Get an integer ID for the given word.
See Also: embeddings

public int getPosID(String s)

public int getLabelID(String s)

public List<Integer> getFeatures(Configuration c)

public Dataset genTrainExamples(List<CoreMap> sents, List<edu.stanford.nlp.parser.nndep.DependencyTree> trees)
public void writeModelFile(String modelFile)
public static DependencyParser loadFromModelFile(String modelFile)
Convenience method; see loadFromModelFile(String, java.util.Properties).

public static DependencyParser loadFromModelFile(String modelFile, Properties extraProperties)
Load a saved parser model.
Parameters:
modelFile - Path to serialized model (may be GZipped)
extraProperties - Extra test-time properties not already associated with model (may be null)
Returns: Loaded and initialized (see initialize(boolean)) model
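As a sketch, here is a load with one extra test-time property. The specific key shown, tagger.model, is taken from the runtime parsing options table below; whether a given option is honored at load time depends on the model and setup, so treat it as illustrative and pass null if there is nothing to override:

```java
import java.util.Properties;

import edu.stanford.nlp.parser.nndep.DependencyParser;

public class LoadWithProperties {
  public static void main(String[] args) {
    Properties extra = new Properties();
    // Illustrative override of the POS tagger used when parsing raw text.
    extra.setProperty("tagger.model",
        "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");
    DependencyParser parser =
        DependencyParser.loadFromModelFile(DependencyParser.DEFAULT_MODEL, extra);
  }
}
```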
public void loadModelFile(String modelFile)
Load a parser model file, printing out some messages about the grammar in the file.
Parameters:
modelFile - The file (classpath resource, etc.) to load the model from.

public void train(String trainFile, String devFile, String modelFile, String embedFile)
Train a new dependency parser model.
Parameters:
trainFile - Training data
devFile - Development data (used for regular UAS evaluation of model)
modelFile - String to which model should be saved
embedFile - File containing word embeddings for words used in training corpus

public void train(String trainFile, String devFile, String modelFile)
See Also: train(String, String, String, String)

public void train(String trainFile, String modelFile)
See Also: train(String, String, String, String)
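A minimal sketch of programmatic training with the four-argument overload above (all file paths are placeholders; training options not set in the Properties fall back to the defaults in the training options table below):

```java
import java.util.Properties;

import edu.stanford.nlp.parser.nndep.DependencyParser;

public class TrainDemo {
  public static void main(String[] args) {
    // Placeholder paths: CoNLL-X train/dev treebanks, the output model path,
    // and a word embedding file in the format described below.
    DependencyParser parser = new DependencyParser(new Properties());
    parser.train("train.conllx", "dev.conllx", "parser-model.txt.gz", "embeddings.txt");
  }
}
```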
public GrammaticalStructure predict(CoreMap sentence)
Determine the dependency parse of the given sentence using the loaded model.
Throws:
IllegalStateException - If parser has not yet been loaded and initialized (see initialize(boolean))
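A sketch of the CoreMap path, using a tokenize/ssplit/pos pipeline to produce sentences that carry the tag annotations the parser needs (the input text is illustrative):

```java
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.parser.nndep.DependencyParser;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.util.CoreMap;

public class PredictCoreMapDemo {
  public static void main(String[] args) {
    // Tokenize, sentence-split, and POS-tag; the parser requires tags.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation("The quick brown fox jumps over the lazy dog.");
    pipeline.annotate(doc);

    DependencyParser parser = DependencyParser.loadFromModelFile(DependencyParser.DEFAULT_MODEL);
    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      GrammaticalStructure gs = parser.predict(sentence);
      System.out.println(gs.typedDependencies());
    }
  }
}
```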
public GrammaticalStructure predict(List<? extends HasWord> sentence)
Convenience method for predict(edu.stanford.nlp.util.CoreMap). The tokens of the provided sentence must also have tag annotations (the parser requires part-of-speech tags).
See Also: predict(edu.stanford.nlp.util.CoreMap)
public double testCoNLL(String testFile, String outFile)
Run the parser in the modelFile on a testFile and perhaps save output.
Parameters:
testFile - File to parse. In CoNLL-X format. Assumed to have gold answers included.
outFile - File to write results to in CoNLL-X format. If null, no output is written.
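A sketch of evaluation against a gold treebank (paths are placeholders; we read the returned double as the evaluation score on the test set):

```java
import edu.stanford.nlp.parser.nndep.DependencyParser;

public class EvalDemo {
  public static void main(String[] args) {
    // Placeholder paths: a trained model, a gold CoNLL-X test file, and an
    // output file for predicted parses (pass null to skip writing output).
    DependencyParser parser = DependencyParser.loadFromModelFile("parser-model.txt.gz");
    double score = parser.testCoNLL("test.conllx", "test-output.conllx");
    System.out.println("Test score: " + score);
  }
}
```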
public static void main(String[] args)
A main program for training, testing and using the parser.
You can use this program to train new parsers from treebank data, evaluate on test treebank data, or parse raw text input.
Sample usages:

Train a parser with CoNLL treebank data:
java edu.stanford.nlp.parser.nndep.DependencyParser -trainFile trainPath -devFile devPath -embedFile wordEmbeddingFile -embeddingSize wordEmbeddingDimensionality -model modelOutputFile.txt.gz

Parse raw text from a file:
java edu.stanford.nlp.parser.nndep.DependencyParser -model modelOutputFile.txt.gz -textFile rawTextToParse -outFile dependenciesOutputFile.txt

Parse raw text from standard input, writing to standard output:
java edu.stanford.nlp.parser.nndep.DependencyParser -model modelOutputFile.txt.gz -textFile - -outFile -
See below for more information on all of these training / test options and more.
Input / output options:
Option | Required for training | Required for testing / parsing | Description |
---|---|---|---|
‑devFile | Optional | No | Path to a development-set treebank in CoNLL-X format. If provided, the dev set performance is monitored during training. |
‑embedFile | Optional (highly recommended!) | No | A word embedding file, containing distributed representations of English words. Each line of the provided file should contain a single word followed by the elements of the corresponding word embedding (space-delimited). It is not absolutely necessary that all words in the treebank be covered by this embedding file, though the parser's performance will generally improve if you are able to provide better embeddings for more words. |
‑model | Yes | Yes | Path to a model file. If the path ends in .gz, the model will be read as a Gzipped model file. During training, we write to this path; at test time we read a pre-trained model from this path. |
‑textFile | No | Yes (or testFile) | Path to a plaintext file containing sentences to be parsed. |
‑testFile | No | Yes (or textFile) | Path to a test-set treebank in CoNLL-X format for final evaluation of the parser. |
‑trainFile | Yes | No | Path to a training treebank in CoNLL-X format. |
Training options:
Option | Default | Description |
---|---|---|
‑adaAlpha | 0.01 | Global learning rate for AdaGrad training |
‑adaEps | 1e-6 | Epsilon value added to the denominator of AdaGrad update expression for numerical stability |
‑batchSize | 10000 | Size of mini-batch used for training |
‑clearGradientsPerIter | 0 | Clear AdaGrad gradient histories every n iterations. If zero, no gradient clearing is performed. |
‑dropProb | 0.5 | Dropout probability. For each training example we randomly choose some number of units to disable in the neural network classifier. This parameter controls the proportion of units "dropped out." |
‑embeddingSize | 50 | Dimensionality of word embeddings provided |
‑evalPerIter | 100 | Run full UAS (unlabeled attachment score) evaluation every time we finish this number of iterations. (Only valid if a development treebank is provided with ‑devFile.) |
‑hiddenSize | 200 | Dimensionality of hidden layer in neural network classifier |
‑initRange | 0.01 | Bounds of range within which weight matrix elements should be initialized. Each element is drawn from a uniform distribution over the range [-initRange, initRange]. |
‑maxIter | 20000 | Number of training iterations to complete before stopping and saving the final model. |
‑numPreComputed | 100000 | The parser pre-computes hidden-layer unit activations for particular input words at both training and testing time in order to speed up feedforward computation in the neural network. This parameter determines the number of words for which hidden-layer activations should be pre-computed. |
‑regParameter | 1e-8 | Regularization parameter for training |
‑saveIntermediate | true | If true, continually save the model version which gets the highest UAS value on the dev set. (Only valid if a development treebank is provided with ‑devFile.) |
‑trainingThreads | 1 | Number of threads to use during training. Note that depending on training batch size, it may be unwise to simply choose the maximum number of threads for your machine. On our 16-core test machines: a batch size of 10,000 runs fastest with around 6 threads; a batch size of 100,000 runs best with around 10 threads. |
‑wordCutOff | 1 | The parser can optionally ignore rare words by simply choosing an arbitrary "unknown" feature representation for words that appear with frequency less than n in the corpus. This n is controlled by the wordCutOff parameter. |
Runtime parsing options:
Option | Default | Description |
---|---|---|
‑escaper | N/A | If provided, use this word-escaper when parsing raw sentences. (Should be a fully-qualified class name like edu.stanford.nlp.trees.international.arabic.ATBEscaper.) |
‑sentenceDelimiter | N/A | If provided, assume that the given textFile has already been sentence-split, and that sentences are separated by this delimiter. |
‑tagger.model | edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger | Path to a part-of-speech tagger to use to pre-tag the raw sentences before parsing. |