edu.stanford.nlp.parser.lexparser
Class LexicalizedParser

java.lang.Object
  |
  +--edu.stanford.nlp.parser.lexparser.LexicalizedParser
All Implemented Interfaces:
Appliable, Parser, ViterbiParser

public class LexicalizedParser
extends Object
implements ViterbiParser, Appliable

A reasonably good lexicalized PCFG parser. It implements a product-of-experts model that combines plain PCFG parsing with lexicalized dependency parsing. It can also do unlexicalized PCFG parsing by using only that component parser. Note that training requires a lot of memory; try -mx1500m. See the package documentation for more details and examples of use. See the main method documentation for details of invoking the parser.

Author:
Dan Klein, Christopher Manning (expanded the API), Teg Grenager (made small interface changes)

Constructor Summary
LexicalizedParser()
          Construct a new LexicalizedParser object from a previously assembled grammar, read from the location named by the property edu.stanford.nlp.SerializedLexicalizedParser or from a default location.
LexicalizedParser(edu.stanford.nlp.parser.lexparser.LexicalizedParser.ParserData pd)
          Construct a new LexicalizedParser object from a previously assembled grammar.
LexicalizedParser(ObjectInputStream in)
          Construct a new LexicalizedParser object from a previously assembled grammar read from an InputStream.
LexicalizedParser(ObjectInputStream in, int maxLeng)
          Construct a new LexicalizedParser object from a previously assembled grammar read from an InputStream.
LexicalizedParser(String serializedFileOrUrl)
          Construct a new LexicalizedParser.
LexicalizedParser(String treebankPath, FileFilter filt, int maxLeng)
          Construct a new LexicalizedParser.
LexicalizedParser(String treebankPath, FileFilter filt, int maxLeng, TreebankLangParserParams tlpParams)
          Construct a new LexicalizedParser.
LexicalizedParser(String treebankPath, FileFilter filt, TreebankLangParserParams tlpParams)
          Construct a new LexicalizedParser by training from treebank files.
LexicalizedParser(String serializedFileOrUrl, int maxLeng)
          Construct a new LexicalizedParser.
LexicalizedParser(String treebankPath, TreebankLangParserParams tlpParams)
          Construct a new LexicalizedParser by training from treebank files.
 
Method Summary
 Object apply(Object in)
          Converts a Sentence/List into a Tree.
protected static edu.stanford.nlp.parser.lexparser.LexicalizedParser.ParserData deserializeParser(String filenameOrUrl)
           
 Tree getBestDependencyParse()
           
 Tree getBestParse()
          Return the best parse of the sentence most recently parsed.
 Tree getBestPCFGParse()
           
static void main(String[] args)
          A simple main program for using the parser.
 boolean parse(List sentence)
          Parse a sentence represented as a List.
 boolean parse(Sentence sentence)
          Parse a Sentence.
 boolean parse(Sentence sentence, String goal)
          Parse a Sentence.
 void setTreebankLangParserParams(TreebankLangParserParams tlpp)
          Allows the caller to specify a TreebankLangParserParams to use.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LexicalizedParser

public LexicalizedParser()
Construct a new LexicalizedParser object from a previously assembled grammar, read from the location named by the property edu.stanford.nlp.SerializedLexicalizedParser or from a default location.


LexicalizedParser

public LexicalizedParser(String serializedFileOrUrl)
Construct a new LexicalizedParser. This loads a grammar that was previously assembled and stored.

Throws:
IllegalArgumentException - If parser data cannot be loaded

LexicalizedParser

public LexicalizedParser(String serializedFileOrUrl,
                         int maxLeng)
Construct a new LexicalizedParser. This loads a grammar that was previously assembled and stored.

Parameters:
maxLeng - Maximum sentence length that you want the parser to be able to parse (this affects memory consumption)
Throws:
IllegalArgumentException - If parser data cannot be loaded
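
A rough sketch of why maxLeng can matter for memory. It assumes, which this page does not state, that the parser keeps a CKY-style chart with one score per word span and grammar state. The class and method names are hypothetical, and the figures are span counts rather than measured memory use.

```java
class ChartSizeSketch {
    // Number of contiguous word spans in a sentence of maxLeng words:
    // maxLeng * (maxLeng + 1) / 2.
    static long spanCount(int maxLeng) {
        return (long) maxLeng * (maxLeng + 1) / 2;
    }

    // Rough chart size under the stated assumption: one score per (span, state) pair.
    static long chartCells(int maxLeng, int numStates) {
        return spanCount(maxLeng) * numStates;
    }

    public static void main(String[] args) {
        for (int n : new int[] {20, 40, 100}) {
            System.out.println(n + " words -> " + spanCount(n) + " spans");
        }
    }
}
```

Under that assumption the chart grows quadratically with the maximum sentence length, so doubling maxLeng roughly quadruples the number of cells.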

LexicalizedParser

public LexicalizedParser(edu.stanford.nlp.parser.lexparser.LexicalizedParser.ParserData pd)
Construct a new LexicalizedParser object from a previously assembled grammar.

Parameters:
pd - A ParserData object (not null)

LexicalizedParser

public LexicalizedParser(ObjectInputStream in)
                  throws Exception
Construct a new LexicalizedParser object from a previously assembled grammar read from an InputStream. One ParserData object is read from the stream. The stream is not closed afterwards.

Parameters:
in - The ObjectInputStream

LexicalizedParser

public LexicalizedParser(ObjectInputStream in,
                         int maxLeng)
                  throws Exception
Construct a new LexicalizedParser object from a previously assembled grammar read from an InputStream. One ParserData object is read from the stream. The stream is not closed afterwards.

Parameters:
in - The ObjectInputStream
maxLeng - Maximum sentence length that you want the parser to be able to parse (this affects memory consumption)

LexicalizedParser

public LexicalizedParser(String treebankPath,
                         FileFilter filt,
                         TreebankLangParserParams tlpParams)
Construct a new LexicalizedParser by training from treebank files.

Parameters:
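treebankPath - Path to the treebank files to train on
filt - A FileFilter selecting which treebank files to use. This may be null if no filtering of selected files is needed.
tlpParams - The Treebank parameters class for different languages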

LexicalizedParser

public LexicalizedParser(String treebankPath,
                         TreebankLangParserParams tlpParams)
Construct a new LexicalizedParser by training from treebank files.

Parameters:
treebankPath - Path to the treebank files to train on
tlpParams - The Treebank parameters class for different languages

LexicalizedParser

public LexicalizedParser(String treebankPath,
                         FileFilter filt,
                         int maxLeng,
                         TreebankLangParserParams tlpParams)
Construct a new LexicalizedParser by training from treebank files.

Parameters:
treebankPath - Path to the treebank files to train on
filt - a FileFilter value
maxLeng - The maximum sentence length that the parser should be able to parse. A large value requires a great deal of memory (and time) for parsing, but allows parsing longer sentences.
tlpParams - The Treebank parameters class for different languages

LexicalizedParser

public LexicalizedParser(String treebankPath,
                         FileFilter filt,
                         int maxLeng)
Construct a new LexicalizedParser by training from treebank files.

Parameters:
treebankPath - Path to the treebank files to train on
filt - a FileFilter value
maxLeng - The maximum sentence length that the parser should be able to parse. A large value requires a great deal of memory (and time) for parsing, but allows parsing longer sentences.

Method Detail

setTreebankLangParserParams

public void setTreebankLangParserParams(TreebankLangParserParams tlpp)
Allows the caller to specify a TreebankLangParserParams to use.

Parameters:
tlpp - The one to use

apply

public Object apply(Object in)
Converts a Sentence/List into a Tree. If the input can't be parsed, a trivial tree is built instead, in which each word is attached directly to a start nonterminal.

Specified by:
apply in interface Appliable
Parameters:
in - The input Sentence/List
Returns:
A Tree that is the parse tree for the sentence. If the parser fails, a new Tree is synthesized which attaches all words to the root.
Throws:
IllegalArgumentException - If the argument isn't a List
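
The shape of the fallback tree described above can be illustrated with a small sketch. It does not use the library's Tree class: the class name, the method name, and the ROOT label are hypothetical, and the tree is rendered as a bracketed string purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

class FlatFallbackTree {
    // Builds a bracketed string for a flat tree: every word hangs directly
    // off a single root node, with no intermediate structure.
    static String flatTree(String rootLabel, List<String> words) {
        StringBuilder sb = new StringBuilder("(").append(rootLabel);
        for (String word : words) {
            sb.append(' ').append(word);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // prints "(ROOT This is a test)"
        System.out.println(flatTree("ROOT", Arrays.asList("This", "is", "a", "test")));
    }
}
```

Because apply always returns some tree, a caller that needs to distinguish a real parse from this fallback may prefer parse followed by getBestParse.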

parse

public boolean parse(Sentence sentence)
Parse a Sentence.

Specified by:
parse in interface Parser
Parameters:
sentence - A Sentence to be parsed
Returns:
true iff it could be parsed

parse

public boolean parse(Sentence sentence,
                     String goal)
Parse a Sentence. Goal-directed parsing has not yet been implemented; at present the goal argument is ignored.

Specified by:
parse in interface Parser
Parameters:
sentence - A Sentence to be parsed
goal - The category to parse the sentence as (e.g., NP, S)
Returns:
true iff it could be parsed

parse

public boolean parse(List sentence)
Parse a sentence represented as a List.

Parameters:
sentence - The sentence to parse
Returns:
true iff the sentence was accepted by the grammar
Throws:
UnsupportedOperationException - If the sentence is too long or parsing otherwise fails for resource reasons

getBestParse

public Tree getBestParse()
Return the best parse of the sentence most recently parsed.

Specified by:
getBestParse in interface ViterbiParser
Returns:
The best tree
Throws:
NoSuchElementException - If no sentence has previously been parsed successfully
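
The parse and getBestParse contracts suggest checking the boolean result of parse before asking for the best tree. The sketch below shows that calling pattern against a hypothetical stand-in class (StubParser) that mirrors only the documented contract. It is not the Stanford class, its acceptance rule is invented, and whether a failed parse clears an earlier result is left open here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

class ParseThenFetchSketch {
    /** Hypothetical stand-in for a parser; mirrors only the documented contract. */
    static class StubParser {
        private String best; // null until some sentence has parsed successfully

        boolean parse(List<String> sentence) {
            // Invented rule for the sketch: any non-empty sentence "parses".
            if (sentence.isEmpty()) {
                return false;
            }
            best = "(ROOT " + String.join(" ", sentence) + ")";
            return true;
        }

        String getBestParse() {
            if (best == null) {
                throw new NoSuchElementException("no successfully parsed sentence");
            }
            return best;
        }
    }

    // Guarded call: only fetch the best parse when parse() reported success.
    static String parseOrNull(StubParser parser, List<String> sentence) {
        return parser.parse(sentence) ? parser.getBestParse() : null;
    }

    public static void main(String[] args) {
        StubParser parser = new StubParser();
        System.out.println(parseOrNull(parser, Arrays.asList("This", "is", "a", "test")));
    }
}
```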

getBestPCFGParse

public Tree getBestPCFGParse()

getBestDependencyParse

public Tree getBestDependencyParse()

deserializeParser

protected static edu.stanford.nlp.parser.lexparser.LexicalizedParser.ParserData deserializeParser(String filenameOrUrl)

main

public static void main(String[] args)
A simple main program for using the parser. This provides three modes of usage: one for building and serializing a parser from treebank data, one for parsing sentences from a file or URL based on a serialized parser, and a third (mainly for parser quality testing) for training and testing a parser all in one go.

Usages:
java edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] -train trainFilesPath [start stop] serializedParserFilename

java edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] -validate trainFilesPath [start stop -treebank [testFilePath [start stop]]]

java -mx512m edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] serializedParserFilename filename+

java -mx512m edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] serializedParserFilename -treebank testFilePath [start stop]

If the serializedParserFilename ends in .gz, then the serialization data is written and read compressed (GZip). The argument filename may be a URL, starting with http://. If no files are supplied in the third usage, then a hardwired sentence is parsed. All final arguments are passed to FactoredParser.
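
The .gz naming convention could be handled along the following lines. This is only a sketch under assumptions: the class and method names are hypothetical, an ordinary Serializable object stands in for the parser data, and the code is not taken from the parser's source. It covers local files only, not URLs.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipAwareSerialization {
    // Writes one object; the stream is GZip-compressed only when the name ends in ".gz".
    static void write(String filename, Serializable data) {
        try (OutputStream raw = new BufferedOutputStream(new FileOutputStream(filename));
             OutputStream maybeGz = filename.endsWith(".gz") ? new GZIPOutputStream(raw) : raw;
             ObjectOutputStream out = new ObjectOutputStream(maybeGz)) {
            out.writeObject(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one object back, decompressing when the name ends in ".gz".
    static Object read(String filename) {
        try (InputStream raw = new BufferedInputStream(new FileInputStream(filename));
             InputStream maybeGz = filename.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
             ObjectInputStream in = new ObjectInputStream(maybeGz)) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String file = System.getProperty("java.io.tmpdir") + File.separator + "demo-parser-data.ser.gz";
        write(file, new ArrayList<>(Arrays.asList("stand-in", "grammar")));
        System.out.println(read(file));
    }
}
```

Deciding by file name keeps the write and read sides symmetric, so a file saved under a .gz name is always read back through the decompressing stream.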

In the same position as the verbose flag (-v), many other options can be specified. The most useful to an end user are:

Parameters:
args - Command line arguments, as above


Stanford NLP Group