edu.stanford.nlp.parser.lexparser
Class LexicalizedParser

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.LexicalizedParser
All Implemented Interfaces:
Function, Parser, ViterbiParser

public class LexicalizedParser
extends Object
implements ViterbiParser, Function

A reasonably good lexicalized PCFG parser. It combines plain PCFG parsing and lexicalized dependency parsing in a product-of-experts model. Alternatively, it can do unlexicalized PCFG parsing by using just the PCFG component parser. Note that training requires a lot of memory; try -mx1500m. See the package documentation for more details and examples of use, and the main method documentation for details of invoking the parser.
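
As a rough illustration of what "product of experts" means here: each expert assigns a probability to a candidate analysis, and the combined score is their product, which is a sum in log space. The sketch below is a toy, not this class's implementation; the Candidate type, its field names, and the scores are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Toy illustration of product-of-experts scoring; not the library's code. */
final class ProductOfExpertsSketch {

    /** A candidate analysis with one log-probability per expert (hypothetical type). */
    static final class Candidate {
        final String name;
        final double pcfgLogProb;        // score from the plain PCFG expert
        final double dependencyLogProb;  // score from the lexicalized dependency expert

        Candidate(String name, double pcfgLogProb, double dependencyLogProb) {
            this.name = name;
            this.pcfgLogProb = pcfgLogProb;
            this.dependencyLogProb = dependencyLogProb;
        }

        /** Multiplying probabilities corresponds to adding log-probabilities. */
        double factoredScore() {
            return pcfgLogProb + dependencyLogProb;
        }
    }

    /** Returns the candidate with the highest combined score, or null for an empty list. */
    static Candidate best(List<Candidate> candidates) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (best == null || c.factoredScore() > best.factoredScore()) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
            new Candidate("A", -10.0, -9.0),   // combined: -19.0
            new Candidate("B", -11.0, -6.5));  // combined: -17.5
        // A wins on the PCFG score alone, but B wins once both experts are combined.
        System.out.println(best(candidates).name);
    }
}
```

Using only one expert's score in factoredScore would correspond to the single-component mode mentioned above.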

Author:
Dan Klein (original version), Christopher Manning (better features, ParserParams, serialization), Roger Levy (internationalization), Teg Grenager (grammar compaction, etc., tokenization, etc.)

Field Summary
protected  edu.stanford.nlp.parser.lexparser.BiLexPCFGParser bparser
           
protected  TreeTransformer debinarizer
           
protected  edu.stanford.nlp.parser.lexparser.ExhaustiveDependencyParser dparser
           
protected  edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser pparser
           
 
Constructor Summary
LexicalizedParser()
          Construct a new LexicalizedParser object from a previously serialized grammar read from a property edu.stanford.nlp.SerializedLexicalizedParser, or a default file location.
LexicalizedParser(ObjectInputStream in)
          Construct a new LexicalizedParser object from a previously assembled grammar read from an InputStream.
LexicalizedParser(ObjectInputStream in, int maxLeng)
          Construct a new LexicalizedParser object from a previously assembled grammar read from an InputStream.
LexicalizedParser(ParserData pd)
          Construct a new LexicalizedParser object from a previously assembled grammar.
LexicalizedParser(String parserFileOrUrl)
          Construct a new LexicalizedParser.
LexicalizedParser(String parserFileOrUrl, boolean isTextGrammar)
          Construct a new LexicalizedParser.
LexicalizedParser(String treebankPath, FileFilter filt, int maxLeng)
           
LexicalizedParser(String treebankPath, FileFilter filt, int maxLeng, GrammarCompactor compactor)
          Construct a new LexicalizedParser.
LexicalizedParser(String treebankPath, FileFilter filt, int maxLeng, TreebankLangParserParams tlpParams, GrammarCompactor compactor)
          Construct a new LexicalizedParser.
LexicalizedParser(String treebankPath, FileFilter filt, TreebankLangParserParams tlpParams)
           
LexicalizedParser(String treebankPath, FileFilter filt, TreebankLangParserParams tlpParams, GrammarCompactor compactor)
          Construct a new LexicalizedParser by training from treebank files.
LexicalizedParser(String serializedFileOrUrl, int maxLeng)
          Construct a new LexicalizedParser.
LexicalizedParser(String treebankPath, TreebankLangParserParams tlpParams, GrammarCompactor compactor)
          Construct a new LexicalizedParser by training from treebank files.
 
Method Summary
 Object apply(Object in)
          Converts a Sentence/List into a Tree.
 Tree getBestDependencyParse()
           
 Tree getBestParse()
          Return the best parse of the sentence most recently parsed.
 Tree getBestPCFGParse()
           
 Tree getBestPCFGParse(boolean stripSubcategories)
           
protected static ParserData getParserDataFromSerializedFile(String serializedFileOrUrl)
           
protected static ParserData getParserDataFromTextFile(String textFileOrUrl)
           
protected  ParserData getParserDataFromTreebank(String treebankPath, FileFilter filt, GrammarCompactor compactor)
           
 double getPCFGScore(String goalStr)
           
static void main(String[] args)
          A simple main program for using the parser.
protected  void makeParsers(ParserData pd)
           
 boolean parse(List sentence)
          Parse a sentence represented as a List.
 boolean parse(Sentence sentence)
          Parse a Sentence.
 boolean parse(Sentence sentence, String goal)
          Parse a Sentence.
 ParserData parserData()
           
 void setTreebankLangParserParams(TreebankLangParserParams tlpp)
          Allows the caller to specify a TreebankLangParserParams to use.
 void testGrammarCoverage(Treebank testTreebank)
           
 double testOnTreebank(Treebank testTreebank)
          Evaluates the performance of the parser on a test treebank.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

pparser

protected edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser pparser

dparser

protected edu.stanford.nlp.parser.lexparser.ExhaustiveDependencyParser dparser

bparser

protected edu.stanford.nlp.parser.lexparser.BiLexPCFGParser bparser

debinarizer

protected TreeTransformer debinarizer
Constructor Detail

LexicalizedParser

public LexicalizedParser()
Construct a new LexicalizedParser object from a previously serialized grammar read from a property edu.stanford.nlp.SerializedLexicalizedParser, or a default file location.


LexicalizedParser

public LexicalizedParser(String parserFileOrUrl)
Construct a new LexicalizedParser. This loads a grammar that was previously assembled and stored.

Throws:
IllegalArgumentException - If parser data cannot be loaded

LexicalizedParser

public LexicalizedParser(String parserFileOrUrl,
                         boolean isTextGrammar)
Construct a new LexicalizedParser. This loads a grammar that was previously assembled and stored.

Throws:
IllegalArgumentException - If parser data cannot be loaded

LexicalizedParser

public LexicalizedParser(String serializedFileOrUrl,
                         int maxLeng)
Construct a new LexicalizedParser. This loads a grammar that was previously assembled and stored.

Parameters:
maxLeng - Maximum sentence length that you want the parser to be able to parse (this affects memory consumption)
Throws:
IllegalArgumentException - If parser data cannot be loaded
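
To get a feel for why maxLeng affects memory, consider a chart parser that keeps one score per span per grammar state. That layout is an assumption made for this sketch and is not stated in this documentation; the state count below is a made-up figure.

```java
/** Back-of-the-envelope chart size estimate under an assumed span-by-state layout. */
final class ChartSizeSketch {

    /** Number of spans (start < end) over a sentence of n words: n(n+1)/2. */
    static long spanCount(int sentenceLength) {
        long n = sentenceLength;
        return n * (n + 1) / 2;
    }

    /** Rough cell count, assuming one score per span per grammar state. */
    static long chartCells(int sentenceLength, int numStates) {
        return spanCount(sentenceLength) * numStates;
    }

    public static void main(String[] args) {
        int numStates = 10000; // made-up figure, for illustration only
        for (int maxLeng : new int[] {20, 40, 80}) {
            System.out.println(maxLeng + " words -> "
                + chartCells(maxLeng, numStates) + " cells");
        }
    }
}
```

Under this assumption, doubling maxLeng roughly quadruples the number of cells.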

LexicalizedParser

public LexicalizedParser(ParserData pd)
Construct a new LexicalizedParser object from a previously assembled grammar.

Parameters:
pd - A ParserData object (not null)

LexicalizedParser

public LexicalizedParser(ObjectInputStream in)
                  throws Exception
Construct a new LexicalizedParser object from a previously assembled grammar read from an InputStream. Exactly one object (a ParserData) is read from the stream. The stream is not closed.

Parameters:
in - The ObjectInputStream
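
Because the stream is left open, the caller remains responsible for closing it. The sketch below shows that ownership pattern with standard java.io classes only; readOne is a hypothetical stand-in for the constructor, and plain strings stand in for a ParserData object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

/** Caller-owned stream pattern; the reader consumes one object and does not close. */
final class StreamOwnershipSketch {

    /** Hypothetical stand-in for the constructor: reads exactly one object, leaves the stream open. */
    static Object readOne(ObjectInputStream in) throws IOException, ClassNotFoundException {
        return in.readObject();
    }

    /** Serializes two objects back to back into a byte array. */
    static byte[] writeTwo(Object first, Object second) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(first);
            out.writeObject(second);
        }
        return bytes.toByteArray();
    }

    /** Runs the round trip and reports both objects; wraps checked exceptions for brevity. */
    static String demo() {
        try {
            byte[] data = writeTwo("grammar-stand-in", "extra-payload");
            // The caller opened the stream, so the caller closes it.
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
                Object grammar = readOne(in);   // consumes one object only
                Object rest = in.readObject();  // the stream is still usable afterwards
                return grammar + " / " + rest;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```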

LexicalizedParser

public LexicalizedParser(ObjectInputStream in,
                         int maxLeng)
                  throws Exception
Construct a new LexicalizedParser object from a previously assembled grammar read from an InputStream. Exactly one object (a ParserData) is read from the stream. The stream is not closed.

Parameters:
in - The ObjectInputStream
maxLeng - Maximum sentence length that you want the parser to be able to parse (this affects memory consumption)

LexicalizedParser

public LexicalizedParser(String treebankPath,
                         FileFilter filt,
                         TreebankLangParserParams tlpParams,
                         GrammarCompactor compactor)
Construct a new LexicalizedParser by training from treebank files.

Parameters:
treebankPath - a String value
filt - a FileFilter value. This may be null if no filtering of selected files is needed.
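
The filt argument is a standard java.io.FileFilter. As a minimal sketch of one possible filter, the helper below accepts files whose name contains a number in a given range. The helper name, the range, and the file names are assumptions for illustration; any FileFilter, or null, can be passed instead.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** A hypothetical FileFilter that selects files by the first number in their name. */
final class NumberedFileFilterSketch {

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Accepts files whose first run of digits, read as an int, lies in [low, high]. */
    static FileFilter numberRange(final int low, final int high) {
        return new FileFilter() {
            @Override
            public boolean accept(File file) {
                Matcher m = DIGITS.matcher(file.getName());
                if (!m.find()) {
                    return false; // no number in the name
                }
                try {
                    int n = Integer.parseInt(m.group());
                    return n >= low && n <= high;
                } catch (NumberFormatException e) {
                    return false; // digit run too long to fit in an int
                }
            }
        };
    }

    public static void main(String[] args) {
        FileFilter filt = numberRange(200, 2199);
        // Example file names only; the files do not need to exist for accept() to run.
        System.out.println(filt.accept(new File("wsj_0200.mrg")));
        System.out.println(filt.accept(new File("wsj_2300.mrg")));
    }
}
```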

LexicalizedParser

public LexicalizedParser(String treebankPath,
                         FileFilter filt,
                         TreebankLangParserParams tlpParams)

LexicalizedParser

public LexicalizedParser(String treebankPath,
                         TreebankLangParserParams tlpParams,
                         GrammarCompactor compactor)
Construct a new LexicalizedParser by training from treebank files.

Parameters:
treebankPath - a String value

LexicalizedParser

public LexicalizedParser(String treebankPath,
                         FileFilter filt,
                         int maxLeng,
                         TreebankLangParserParams tlpParams,
                         GrammarCompactor compactor)
Construct a new LexicalizedParser.

Parameters:
treebankPath - a String value
filt - a FileFilter value
maxLeng - The maximum sentence length that the parser should be able to parse. A large value for this requires a great deal of memory (and time) for parsing, but allows parsing longer sentences.
tlpParams - The Treebank parameters class for different languages

LexicalizedParser

public LexicalizedParser(String treebankPath,
                         FileFilter filt,
                         int maxLeng,
                         GrammarCompactor compactor)
Construct a new LexicalizedParser.

Parameters:
treebankPath - a String value
filt - a FileFilter value
maxLeng - The maximum sentence length that the parser should be able to parse. A large value for this requires a great deal of memory (and time) for parsing, but allows parsing longer sentences.

LexicalizedParser

public LexicalizedParser(String treebankPath,
                         FileFilter filt,
                         int maxLeng)
Method Detail

setTreebankLangParserParams

public void setTreebankLangParserParams(TreebankLangParserParams tlpp)
Allows the caller to specify a TreebankLangParserParams to use.

Parameters:
tlpp - The one to use

apply

public Object apply(Object in)
Converts a Sentence/List into a Tree. If it can't be parsed, it is made into a trivial tree in which each word is attached to a start nonterminal.

Specified by:
apply in interface Function
Parameters:
in - The input Sentence/List
Returns:
A Tree that is the parse tree for the sentence. If the parser fails, a new Tree is synthesized which attaches all words to the root.
Throws:
IllegalArgumentException - If argument isn't a List
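
The fallback described above can be pictured as a flat tree with every word hanging directly off one root node. The sketch below builds such a tree as a bracketed string. It does not use the library's Tree class, and the root label "ROOT" and the bracket notation are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Builds a flat fallback tree as a bracketed string (illustrative only). */
final class FlatTreeSketch {

    /** Attaches every word directly under a single root label. */
    static String flatTree(String rootLabel, List<String> words) {
        StringBuilder sb = new StringBuilder("(").append(rootLabel);
        for (String word : words) {
            sb.append(' ').append(word);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // "ROOT" is an assumed label; the real start symbol is not named in this documentation.
        System.out.println(flatTree("ROOT", Arrays.asList("Colorless", "green", "ideas")));
    }
}
```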

parse

public boolean parse(Sentence sentence)
Parse a Sentence.

Specified by:
parse in interface Parser
Parameters:
sentence - A Sentence to be parsed
Returns:
true iff it could be parsed

parse

public boolean parse(Sentence sentence,
                     String goal)
Parse a Sentence. Parsing to a specified goal category hasn't yet been implemented; at present the goal is ignored.

Specified by:
parse in interface Parser
Parameters:
sentence - A Sentence to be parsed
goal - The category to parse the sentence as (e.g., NP, S)
Returns:
true iff it could be parsed

parse

public boolean parse(List sentence)
Parse a sentence represented as a List.

Parameters:
sentence - The sentence to parse
Returns:
true iff the sentence was accepted by the grammar
Throws:
UnsupportedOperationException - If the sentence is too long or parsing otherwise fails for resource reasons

getBestParse

public Tree getBestParse()
Return the best parse of the sentence most recently parsed.

Specified by:
getBestParse in interface ViterbiParser
Returns:
The best tree
Throws:
NoSuchElementException - If no sentence has previously been parsed successfully
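
Taken together, parse and getBestParse imply a calling pattern: check the boolean result of parse before asking for the tree. The sketch below mimics only that contract with a hypothetical ToyParser; it is not LexicalizedParser, and its "grammar" simply accepts any non-empty sentence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

/** Illustrates the parse-then-getBestParse calling pattern with a toy stand-in. */
final class ParseProtocolSketch {

    /** Hypothetical stand-in that mimics only the documented contract. */
    static final class ToyParser {
        private final int maxLength;
        private String best; // stays null until some parse succeeds

        ToyParser(int maxLength) {
            this.maxLength = maxLength;
        }

        /** Returns true iff the toy "grammar" accepts the sentence (any non-empty one). */
        boolean parse(List<String> sentence) {
            if (sentence.size() > maxLength) {
                throw new UnsupportedOperationException("Sentence too long: " + sentence.size());
            }
            if (sentence.isEmpty()) {
                return false;
            }
            best = "(ROOT " + String.join(" ", sentence) + ")";
            return true;
        }

        /** Returns the last successful parse, or throws if there has been none. */
        String getBestParse() {
            if (best == null) {
                throw new NoSuchElementException("No successfully parsed sentence");
            }
            return best;
        }
    }

    /** Caller-side pattern: check the boolean before asking for the tree. */
    static String parseOrReport(ToyParser parser, List<String> sentence) {
        try {
            return parser.parse(sentence) ? parser.getBestParse() : "rejected by grammar";
        } catch (UnsupportedOperationException e) {
            return "too long";
        }
    }

    public static void main(String[] args) {
        ToyParser parser = new ToyParser(3);
        System.out.println(parseOrReport(parser, Arrays.asList("a", "b")));
        System.out.println(parseOrReport(parser, new ArrayList<String>()));
        System.out.println(parseOrReport(parser, Arrays.asList("a", "b", "c", "d")));
    }
}
```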

getBestPCFGParse

public Tree getBestPCFGParse()

getBestPCFGParse

public Tree getBestPCFGParse(boolean stripSubcategories)

getPCFGScore

public double getPCFGScore(String goalStr)

getBestDependencyParse

public Tree getBestDependencyParse()

parserData

public ParserData parserData()

getParserDataFromTextFile

protected static ParserData getParserDataFromTextFile(String textFileOrUrl)

getParserDataFromSerializedFile

protected static ParserData getParserDataFromSerializedFile(String serializedFileOrUrl)

getParserDataFromTreebank

protected final ParserData getParserDataFromTreebank(String treebankPath,
                                                     FileFilter filt,
                                                     GrammarCompactor compactor)

makeParsers

protected final void makeParsers(ParserData pd)

testGrammarCoverage

public void testGrammarCoverage(Treebank testTreebank)

testOnTreebank

public double testOnTreebank(Treebank testTreebank)
Evaluates the performance of the parser on a test treebank. Note that this routine prints material to stdout and stderr for each tree that it parses.

Parameters:
testTreebank - The Treebank to test the parser on.
Returns:
The labeled constituent F1 score (as a percentage).
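
For readers unfamiliar with the metric, labeled constituent F1 is the harmonic mean of labeled precision and recall over constituents. The sketch below is a simplified view that treats each constituent as a "LABEL:start-end" string and ignores duplicates; that encoding is an assumption for illustration, and this method's exact scoring conventions are not described here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Simplified labeled-constituent F1 over sets of "LABEL:start-end" strings. */
final class LabeledF1Sketch {

    /** Returns F1 as a percentage; 0.0 when nothing matches. */
    static double f1Percent(Set<String> gold, Set<String> guess) {
        Set<String> matched = new HashSet<String>(guess);
        matched.retainAll(gold); // constituents with the same label and span
        if (matched.isEmpty()) {
            return 0.0;
        }
        double precision = (double) matched.size() / guess.size();
        double recall = (double) matched.size() / gold.size();
        return 100.0 * 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        Set<String> gold = new HashSet<String>(
            Arrays.asList("S:0-3", "NP:0-1", "VP:1-3", "NP:2-3"));
        Set<String> guess = new HashSet<String>(
            Arrays.asList("S:0-3", "NP:0-1", "VP:1-3", "PP:2-3"));
        // 3 of 4 guessed constituents match, and 3 of 4 gold constituents are found.
        System.out.println(f1Percent(gold, guess));
    }
}
```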

main

public static void main(String[] args)
A simple main program for using the parser. This provides three modes of usage: one for building and serializing a parser from treebank data, one for parsing sentences from a file or URL containing a serialized or text grammar parser, and a third (mainly for parser quality testing) for training and testing a parser all in one go.

Usages:
java -mx1500m edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] -train trainFilesPath start stop serializedGrammarFilename

java -mx1500m edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] -validate trainFilesPath start stop -treebank testFilePath start stop

java -mx512m edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] serializedGrammarPath filename+

java -mx512m edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] serializedGrammarPath -treebank testFilePath start stop
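
When launching the training mode from another Java program, the first usage line can be assembled as an argument list, for example to hand to java.lang.ProcessBuilder. This is only a sketch: the path, file range, and output file name are placeholders, and the classpath setting that a real invocation needs is omitted.

```java
import java.util.Arrays;
import java.util.List;

/** Assembles the arguments of the first (-train) usage line; does not launch anything. */
final class TrainCommandSketch {

    /** Builds: java -mx1500m <class> -train trainFilesPath start stop serializedGrammarFilename */
    static List<String> trainCommand(String trainFilesPath, int start, int stop,
                                     String serializedGrammarFilename) {
        return Arrays.asList(
            "java", "-mx1500m",
            "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
            "-train", trainFilesPath,
            String.valueOf(start), String.valueOf(stop),
            serializedGrammarFilename);
    }

    public static void main(String[] args) {
        // Placeholder values, chosen only for illustration.
        List<String> command = trainCommand("/path/to/treebank", 200, 2199, "grammar.ser.gz");
        System.out.println(String.join(" ", command));
    }
}
```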

If the serializedGrammarPath ends in .gz, then the grammar is written and read as a compressed file (GZip). If the serializedGrammarPath is a URL, starting with http://, then the parser is read from the URL. By default the parser will be written as a serialized Java object file; if desired, the file format can be specified with the following alternate usage:

java edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] -train trainFilesPath [start stop] [-saveToSerializedFile grammarPath | -saveToTextFile grammarPath]

If no files are supplied in the third usage, then a hardwired sentence is parsed. All final arguments are passed to FactoredParser.
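
The .gz and http:// conventions described above can be sketched with standard library classes. This is a minimal illustration of choosing a stream from the path's form, not the parser's actual loading code; the class and method names are hypothetical, and a plain string stands in for the serialized grammar.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.URL;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Chooses how to open a serialized grammar from the form of its path (illustrative). */
final class GrammarStreamSketch {

    static boolean isUrl(String path) {
        return path.startsWith("http://");
    }

    static boolean isGzipped(String path) {
        return path.endsWith(".gz");
    }

    /** Opens a URL or a file, adding GZip decompression when the path ends in .gz. */
    static ObjectInputStream open(String path) throws IOException {
        InputStream raw = isUrl(path) ? new URL(path).openStream() : new FileInputStream(path);
        try {
            InputStream in = new BufferedInputStream(raw);
            if (isGzipped(path)) {
                in = new GZIPInputStream(in);
            }
            return new ObjectInputStream(in);
        } catch (IOException e) {
            raw.close(); // do not leak the underlying stream if wrapping fails
            throw e;
        }
    }

    /** Writes a payload to a temporary .ser.gz file and reads it back through open(). */
    static String roundTrip(String payload) {
        try {
            File tmp = File.createTempFile("grammar", ".ser.gz");
            try {
                try (ObjectOutputStream out = new ObjectOutputStream(
                        new GZIPOutputStream(new FileOutputStream(tmp)))) {
                    out.writeObject(payload);
                }
                try (ObjectInputStream in = open(tmp.getPath())) {
                    return (String) in.readObject();
                }
            } finally {
                tmp.delete();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("grammar-stand-in"));
    }
}
```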

In the same position as the verbose flag (-v), many other options can be specified. The most useful to an end user are:

See also the package documentation for more details and examples of use.

Parameters:
args - Command line arguments, as above


Stanford NLP Group