java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.LexicalizedParser
A reasonably good lexicalized PCFG parser. It implements a product-of-experts model that combines plain PCFG parsing with lexicalized dependency parsing. Alternatively, it can do unlexicalized PCFG parsing by using just the PCFG component parser. Note that training requires a lot of memory; try running Java with -mx1500m. See the package documentation for more details and examples of use, and the main method documentation for details of invoking the parser.
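Because the factored model is a product of experts, a candidate analysis is scored by multiplying the PCFG expert's probability by the dependency expert's probability, which means adding their log-probabilities. The minimal sketch below shows only that combination step over a fixed list of candidates; the class, the Candidate record, and the numbers are hypothetical, and the real parser searches over all trees rather than ranking a given list.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration (not the parser's API): a product of two experts
// multiplies their probabilities, which means adding their log-probabilities.
class ProductOfExpertsSketch {

    // A candidate analysis scored by each expert (toy log-probabilities).
    record Candidate(String label, double pcfgLogProb, double depLogProb) {
        double combinedLogScore() {
            // log(P_pcfg * P_dep) = log P_pcfg + log P_dep
            return pcfgLogProb + depLogProb;
        }
    }

    // Pick the candidate with the highest combined score.
    static Candidate best(List<Candidate> candidates) {
        return candidates.stream()
                .max(Comparator.comparingDouble(Candidate::combinedLogScore))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("A", -10.0, -9.0),  // PCFG expert slightly prefers A
                new Candidate("B", -10.5, -7.0)); // dependency expert strongly prefers B
        System.out.println(best(candidates).label()); // prints "B" (-17.5 > -19.0)
    }
}
```

With these toy scores the PCFG expert alone would pick A, but the combined score prefers B, which is the point of letting both experts contribute.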
Field Summary
protected edu.stanford.nlp.parser.lexparser.BiLexPCFGParser bparser
protected TreeTransformer debinarizer
protected edu.stanford.nlp.parser.lexparser.ExhaustiveDependencyParser dparser
protected edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser pparser
Constructor Summary
LexicalizedParser()
Construct a new LexicalizedParser object from a previously serialized grammar read from the location named by the property edu.stanford.nlp.SerializedLexicalizedParser, or from a default file location.
LexicalizedParser(ObjectInputStream in)
Construct a new LexicalizedParser object from a previously assembled grammar read from an InputStream.
LexicalizedParser(ObjectInputStream in, int maxLeng)
Construct a new LexicalizedParser object from a previously assembled grammar read from an InputStream.
LexicalizedParser(ParserData pd)
Construct a new LexicalizedParser object from a previously assembled grammar.
LexicalizedParser(String parserFileOrUrl)
Construct a new LexicalizedParser.
LexicalizedParser(String parserFileOrUrl, boolean isTextGrammar)
Construct a new LexicalizedParser.
LexicalizedParser(String treebankPath, FileFilter filt, int maxLeng)
LexicalizedParser(String treebankPath, FileFilter filt, int maxLeng, GrammarCompactor compactor)
Construct a new LexicalizedParser.
LexicalizedParser(String treebankPath, FileFilter filt, int maxLeng, TreebankLangParserParams tlpParams, GrammarCompactor compactor)
Construct a new LexicalizedParser.
LexicalizedParser(String treebankPath, FileFilter filt, TreebankLangParserParams tlpParams)
LexicalizedParser(String treebankPath, FileFilter filt, TreebankLangParserParams tlpParams, GrammarCompactor compactor)
Construct a new LexicalizedParser by training from treebank files.
LexicalizedParser(String serializedFileOrUrl, int maxLeng)
Construct a new LexicalizedParser.
LexicalizedParser(String treebankPath, TreebankLangParserParams tlpParams, GrammarCompactor compactor)
Construct a new LexicalizedParser by training from treebank files.
Method Summary
Object apply(Object in)
Converts a Sentence/List into a Tree.
Tree getBestDependencyParse()
Tree getBestParse()
Return the best parse of the sentence most recently parsed.
Tree getBestPCFGParse()
Tree getBestPCFGParse(boolean stripSubcategories)
protected static ParserData getParserDataFromSerializedFile(String serializedFileOrUrl)
protected static ParserData getParserDataFromTextFile(String textFileOrUrl)
protected ParserData getParserDataFromTreebank(String treebankPath, FileFilter filt, GrammarCompactor compactor)
double getPCFGScore(String goalStr)
static void main(String[] args)
A simple main program for using the parser.
protected void makeParsers(ParserData pd)
boolean parse(List sentence)
Parse a sentence represented as a List.
boolean parse(Sentence sentence)
Parse a Sentence.
boolean parse(Sentence sentence, String goal)
Parse a Sentence.
ParserData parserData()
void setTreebankLangParserParams(TreebankLangParserParams tlpp)
Allows the caller to specify a TreebankLangParserParams to use.
void testGrammarCoverage(Treebank testTreebank)
double testOnTreebank(Treebank testTreebank)
Evaluates the performance of the parser on a test treebank.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
protected edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser pparser
protected edu.stanford.nlp.parser.lexparser.ExhaustiveDependencyParser dparser
protected edu.stanford.nlp.parser.lexparser.BiLexPCFGParser bparser
protected TreeTransformer debinarizer
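Constructor Detail
public LexicalizedParser()
Construct a new LexicalizedParser object from a previously serialized grammar read from the location named by the property edu.stanford.nlp.SerializedLexicalizedParser, or from a default file location.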
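The no-argument constructor looks up its grammar location in the property edu.stanford.nlp.SerializedLexicalizedParser before falling back to a default file. A minimal sketch of that lookup, assuming the property is an ordinary Java system property; the class and method names are hypothetical, and the fallback is passed in because this page does not name the default location.

```java
// Hypothetical sketch (not the library's code): read the grammar location from a
// system property, falling back to a caller-supplied default when it is unset or blank.
class GrammarLocationSketch {

    // Property name as given in the constructor documentation.
    static final String PROPERTY = "edu.stanford.nlp.SerializedLexicalizedParser";

    static String resolveGrammarLocation(String defaultLocation) {
        String fromProperty = System.getProperty(PROPERTY);
        if (fromProperty == null || fromProperty.isBlank()) {
            return defaultLocation;
        }
        return fromProperty;
    }

    public static void main(String[] args) {
        // "/path/to/default-grammar.ser.gz" is a placeholder, not the parser's real default.
        System.out.println(resolveGrammarLocation("/path/to/default-grammar.ser.gz"));
    }
}
```

Under that assumption, the property would be set with the standard JVM flag, e.g. java -Dedu.stanford.nlp.SerializedLexicalizedParser=/path/to/grammar.ser.gz followed by the main class (the path is a placeholder).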
public LexicalizedParser(String parserFileOrUrl)
Throws:
IllegalArgumentException - If parser data cannot be loaded
public LexicalizedParser(String parserFileOrUrl, boolean isTextGrammar)
Throws:
IllegalArgumentException - If parser data cannot be loaded
public LexicalizedParser(String serializedFileOrUrl, int maxLeng)
Parameters:
maxLeng - Maximum sentence length that you want the parser to be able to parse (this affects memory consumption)
Throws:
IllegalArgumentException - If parser data cannot be loaded
public LexicalizedParser(ParserData pd)
Parameters:
pd - A ParserData object (not null)
public LexicalizedParser(ObjectInputStream in) throws Exception
Parameters:
in - The ObjectInputStream
public LexicalizedParser(ObjectInputStream in, int maxLeng) throws Exception
Parameters:
in - The ObjectInputStream
maxLeng - Maximum sentence length that you want the parser to be able to parse (this affects memory consumption)
public LexicalizedParser(String treebankPath, FileFilter filt, TreebankLangParserParams tlpParams, GrammarCompactor compactor)
Parameters:
treebankPath - a String giving the path to the treebank training files
filt - a FileFilter value. This may be null if no filtering of selected files is needed.
public LexicalizedParser(String treebankPath, FileFilter filt, TreebankLangParserParams tlpParams)
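The filt argument accepts any java.io.FileFilter, or null to read every file. As a hypothetical illustration in the spirit of the start stop arguments of the command-line usage, the filter below keeps files whose name contains a number in a given range; the class name and the last-number-in-the-name rule are assumptions, not the library's behavior.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical FileFilter (not part of the parser): accept files whose name contains
// a number within [low, high], e.g. "wsj_0203.mrg" has number 203.
class NumberRangeFilterSketch implements FileFilter {
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    private final long low;
    private final long high;

    NumberRangeFilterSketch(long low, long high) {
        this.low = low;
        this.high = high;
    }

    @Override
    public boolean accept(File file) {
        // Let directories through so a recursive walk can descend into them.
        if (file.isDirectory()) {
            return true;
        }
        // Use the last run of digits in the file name as the file's number.
        Matcher m = DIGITS.matcher(file.getName());
        String last = null;
        while (m.find()) {
            last = m.group();
        }
        if (last == null) {
            return false;
        }
        try {
            long n = Long.parseLong(last);
            return n >= low && n <= high;
        } catch (NumberFormatException tooLong) {
            return false; // digit run too long to be a plausible file number
        }
    }

    public static void main(String[] args) {
        FileFilter filter = new NumberRangeFilterSketch(200, 270);
        System.out.println(filter.accept(new File("wsj_0203.mrg"))); // true
    }
}
```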
public LexicalizedParser(String treebankPath, TreebankLangParserParams tlpParams, GrammarCompactor compactor)
Parameters:
treebankPath - a String giving the path to the treebank training files
public LexicalizedParser(String treebankPath, FileFilter filt, int maxLeng, TreebankLangParserParams tlpParams, GrammarCompactor compactor)
Parameters:
treebankPath - a String giving the path to the treebank training files
filt - a FileFilter selecting which treebank files to read
maxLeng - The maximum sentence length the parser should be able to parse. A large value requires a great deal of memory (and time) for parsing, but allows longer sentences to be parsed.
tlpParams - The treebank parameters class for the language being used
public LexicalizedParser(String treebankPath, FileFilter filt, int maxLeng, GrammarCompactor compactor)
Parameters:
treebankPath - a String giving the path to the treebank training files
filt - a FileFilter selecting which treebank files to read
maxLeng - The maximum sentence length the parser should be able to parse. A large value requires a great deal of memory (and time) for parsing, but allows longer sentences to be parsed.
public LexicalizedParser(String treebankPath, FileFilter filt, int maxLeng)
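The maxLeng parameters trade memory for coverage. Assuming an exhaustive CKY-style chart (suggested by the ExhaustivePCFGParser field, but not described on this page), the chart has one cell per span, n(n+1)/2 spans for n words, and each cell may hold a score per grammar state. The sketch does that arithmetic with a hypothetical state count; the real parser's data structures differ, so read the results as orders of magnitude only.

```java
// Hypothetical back-of-envelope arithmetic for an exhaustive CKY-style chart:
// one cell per span of the sentence, one double score per grammar state per cell.
class ChartSizeSketch {

    // A sentence of n words has n(n+1)/2 spans [i, j) with i < j.
    static long spans(int n) {
        return (long) n * (n + 1) / 2;
    }

    // Score entries if every span stores a score for every grammar state.
    static long scoreEntries(int maxLength, int numStates) {
        return spans(maxLength) * numStates;
    }

    public static void main(String[] args) {
        int numStates = 10_000; // hypothetical grammar size
        for (int n : new int[] {40, 80}) {
            double megabytes = scoreEntries(n, numStates) * (double) Double.BYTES / 1e6;
            System.out.printf("maxLength=%d spans=%d approx. %.1f MB%n", n, spans(n), megabytes);
        }
    }
}
```

Doubling maxLength from 40 to 80 takes the span count from 820 to 3240, roughly four times the memory, while exhaustive CKY time grows faster still, with the cube of the sentence length.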
Method Detail
public void setTreebankLangParserParams(TreebankLangParserParams tlpp)
Parameters:
tlpp - The TreebankLangParserParams to use
public Object apply(Object in)
Specified by:
apply in interface Function
Parameters:
in - The input Sentence/List
Throws:
IllegalArgumentException - If the argument isn't a List
public boolean parse(Sentence sentence)
Specified by:
parse in interface Parser
Parameters:
sentence - A Sentence to be parsed
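The apply method lets the parser be used as a function from a Sentence or List to a Tree, rejecting other inputs with IllegalArgumentException. A minimal sketch of that contract; the class is hypothetical, and a bracketed String stands in for the real Tree because no library types are used here.

```java
import java.util.List;

// Hypothetical sketch of the apply(Object) contract (not the library's code):
// a List is accepted, anything else triggers IllegalArgumentException.
// A bracketed String stands in for the Tree the real method returns.
class ApplyContractSketch {

    static Object apply(Object in) {
        if (!(in instanceof List<?> words)) {
            throw new IllegalArgumentException("Argument isn't a List: " + in);
        }
        // Placeholder for parsing: a flat tree over the words.
        StringBuilder tree = new StringBuilder("(ROOT");
        for (Object word : words) {
            tree.append(' ').append(word);
        }
        return tree.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(apply(List.of("Dogs", "bark"))); // (ROOT Dogs bark)
    }
}
```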
public boolean parse(Sentence sentence, String goal)
Specified by:
parse in interface Parser
Parameters:
sentence - A Sentence to be parsed
goal - The category to parse the sentence as (e.g., NP, S)
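The goal argument names the category that the whole sentence must be analyzed as. The toy CKY recognizer below illustrates the idea with a hypothetical five-word lexicon and three binary rules; it is plain (unlexicalized, unweighted) chart parsing rather than the factored model, so it only shows why the same words can succeed for one goal and fail for another.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy CKY recognizer (hypothetical grammar, not the parser's model): a sentence
// "parses as" a goal category only if that category spans every word.
class GoalCategorySketch {

    // Binary rules in Chomsky normal form: {parent, left child, right child}.
    static final String[][] RULES = {
        {"NP", "Det", "N"},
        {"VP", "V", "NP"},
        {"S", "NP", "VP"},
    };

    // Word -> possible preterminal categories.
    static final Map<String, Set<String>> LEXICON = Map.of(
        "the", Set.of("Det"),
        "dog", Set.of("N"),
        "cat", Set.of("N"),
        "sees", Set.of("V"),
        "barks", Set.of("VP"));

    static boolean parsesAs(List<String> words, String goal) {
        int n = words.size();
        if (n == 0) {
            return false;
        }
        // chart.get(i).get(j) holds the categories covering words i..j-1.
        List<List<Set<String>>> chart = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            List<Set<String>> row = new ArrayList<>();
            for (int j = 0; j <= n; j++) {
                row.add(new HashSet<>());
            }
            chart.add(row);
        }
        // Length-1 spans come from the lexicon.
        for (int i = 0; i < n; i++) {
            chart.get(i).get(i + 1).addAll(LEXICON.getOrDefault(words.get(i), Set.of()));
        }
        // Longer spans combine two adjacent sub-spans with a binary rule.
        for (int len = 2; len <= n; len++) {
            for (int i = 0; i + len <= n; i++) {
                int j = i + len;
                for (int k = i + 1; k < j; k++) {
                    for (String[] rule : RULES) {
                        if (chart.get(i).get(k).contains(rule[1])
                                && chart.get(k).get(j).contains(rule[2])) {
                            chart.get(i).get(j).add(rule[0]);
                        }
                    }
                }
            }
        }
        return chart.get(0).get(n).contains(goal);
    }

    public static void main(String[] args) {
        List<String> words = List.of("the", "dog", "sees", "the", "cat");
        System.out.println(parsesAs(words, "S"));  // true
        System.out.println(parsesAs(words, "NP")); // false
    }
}
```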
public boolean parse(List sentence)
Parameters:
sentence - The sentence to parse
Throws:
UnsupportedOperationException - If the sentence is too long or parsing otherwise fails for resource reasons
public Tree getBestParse()
Specified by:
getBestParse in interface ViterbiParser
Throws:
NoSuchElementException - If there is no previously successfully parsed sentence
public Tree getBestPCFGParse()
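parse reports success as a boolean, and getBestParse then returns the result for the most recently parsed sentence. The hypothetical class below mirrors that protocol, including the two documented exceptions; whether a failed parse also clears the previous result is an assumption made here, not something this page states.

```java
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stateful parser (not the library class) mirroring the documented
// protocol: parse(...) reports success, getBestParse() returns the result of the
// most recent successful parse or throws NoSuchElementException.
class ParseProtocolSketch {
    private final int maxLength;
    private String best; // stand-in for a Tree; null when there is no result

    ParseProtocolSketch(int maxLength) {
        this.maxLength = maxLength;
    }

    boolean parse(List<String> sentence) {
        if (sentence.size() > maxLength) {
            // Resource failure, as documented for parse(List).
            throw new UnsupportedOperationException(
                    "Sentence of length " + sentence.size() + " exceeds maxLength " + maxLength);
        }
        best = null; // assumption: a new attempt discards the previous result
        if (sentence.isEmpty()) {
            return false; // toy failure case: nothing to parse
        }
        best = "(ROOT " + String.join(" ", sentence) + ")";
        return true;
    }

    String getBestParse() {
        if (best == null) {
            throw new NoSuchElementException("No previously successfully parsed sentence");
        }
        return best;
    }

    public static void main(String[] args) {
        ParseProtocolSketch parser = new ParseProtocolSketch(40);
        if (parser.parse(List.of("Dogs", "bark"))) {
            System.out.println(parser.getBestParse()); // (ROOT Dogs bark)
        }
    }
}
```

As in the sketch's main method, callers should check the boolean returned by parse before asking for the best parse.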
public Tree getBestPCFGParse(boolean stripSubcategories)
public double getPCFGScore(String goalStr)
public Tree getBestDependencyParse()
public ParserData parserData()
protected static ParserData getParserDataFromTextFile(String textFileOrUrl)
protected static ParserData getParserDataFromSerializedFile(String serializedFileOrUrl)
protected final ParserData getParserDataFromTreebank(String treebankPath, FileFilter filt, GrammarCompactor compactor)
protected final void makeParsers(ParserData pd)
public void testGrammarCoverage(Treebank testTreebank)
public double testOnTreebank(Treebank testTreebank)
Parameters:
testTreebank - The Treebank to test the parser on.
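testOnTreebank returns a single double, but this page does not say which measure it is. Constituency parsers are commonly evaluated with labeled-bracket F1, the harmonic mean of bracket precision and recall; the sketch computes that measure for one sentence, as an assumption about what such a score involves, using hypothetical LABEL:start:end strings and none of the refinements of standard scoring tools.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified labeled-bracket scoring: brackets are "LABEL:start:end"
// strings compared as sets (standard scorers use multisets and extra rules).
class BracketF1Sketch {

    static double f1(Set<String> gold, Set<String> guess) {
        if (gold.isEmpty() && guess.isEmpty()) {
            return 1.0; // nothing to find, nothing proposed
        }
        Set<String> matched = new HashSet<>(guess);
        matched.retainAll(gold); // brackets with the same label and span in both
        double precision = guess.isEmpty() ? 0.0 : (double) matched.size() / guess.size();
        double recall = gold.isEmpty() ? 0.0 : (double) matched.size() / gold.size();
        if (precision + recall == 0.0) {
            return 0.0;
        }
        return 2 * precision * recall / (precision + recall); // harmonic mean
    }

    public static void main(String[] args) {
        Set<String> gold = Set.of("S:0:3", "NP:0:2", "VP:2:3");
        Set<String> guess = Set.of("S:0:3", "NP:0:2", "VP:1:3");
        System.out.println(f1(gold, guess)); // 2 of 3 brackets match in each direction
    }
}
```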
public static void main(String[] args)
Usages:
java -mx1500m edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] -validate trainFilesPath start stop -treebank testFilePath start stop
java -mx512m edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] serializedGrammarPath filename+
java -mx512m edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] serializedGrammarPath -treebank testFilePath start stop
java -mx1500m edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] -train trainFilesPath start stop serializedGrammarFilename
If the serializedGrammarPath ends in .gz, then the grammar is written and read as a compressed file (GZip). If the serializedGrammarPath is a URL, starting with http://, then the parser is read from the URL.
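The two location rules above can be sketched with standard Java streams. The helper name is hypothetical and this is not the parser's own loading code; it recognizes only the http:// prefix described here.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.zip.GZIPInputStream;

// Hypothetical helper (not the parser's own code) applying the two rules above:
// "http://" locations are opened as URLs, and names ending in ".gz" are gunzipped.
class GrammarStreamSketch {

    static InputStream open(String pathOrUrl) throws IOException {
        InputStream raw = pathOrUrl.startsWith("http://")
                ? URI.create(pathOrUrl).toURL().openStream()
                : new FileInputStream(pathOrUrl);
        InputStream buffered = new BufferedInputStream(raw);
        if (!pathOrUrl.endsWith(".gz")) {
            return buffered;
        }
        try {
            return new GZIPInputStream(buffered);
        } catch (IOException notGzip) {
            buffered.close(); // don't leak the underlying stream on a bad header
            throw notGzip;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: GrammarStreamSketch fileOrUrl");
            return;
        }
        try (InputStream in = open(args[0])) {
            System.out.println(in.readAllBytes().length + " bytes read");
        }
    }
}
```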
By default the parser is written as a serialized Java object file; if desired, the file format can be specified with the following alternate usage:
java edu.stanford.nlp.parser.lexparser.LexicalizedParser [-v] -train trainFilesPath [start stop] [-saveToSerializedFile grammarPath | -saveToTextFile grammarPath]
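The default serialized format is ordinary Java object serialization, which is presumably what the ObjectInputStream constructors read back. A minimal round trip with ObjectOutputStream and ObjectInputStream follows, where ToyGrammar is a hypothetical stand-in for the real ParserData; the -saveToTextFile format is not modeled.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical round trip with standard Java serialization; ToyGrammar is a
// stand-in for the parser's ParserData, not its real structure.
class SerializedGrammarSketch {

    static class ToyGrammar implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<String, Double> ruleLogProbs = new HashMap<>();
    }

    static void save(ToyGrammar grammar, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(grammar);
        }
    }

    static ToyGrammar load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (ToyGrammar) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("toy-grammar", ".ser");
        ToyGrammar grammar = new ToyGrammar();
        grammar.ruleLogProbs.put("S -> NP VP", -0.1);
        save(grammar, file);
        System.out.println(load(file).ruleLogProbs); // {S -> NP VP=-0.1}
        file.delete();
    }
}
```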
If no files are supplied in the third usage, then a hardwired sentence is parsed. All final arguments are passed to FactoredParser.
In the same position as the verbose flag (-v), many other options can be specified. The most useful to an end user are:
-tLPP class
Specify a different TreebankLangParserParams, for when using a different language or treebank (the default is English Penn Treebank).
-encoding charset
Specify the character encoding of the input files.
-tokenized
Says that the input is already separated into whitespace-delimited tokens.
-tokenizerFactory class
Specifies a TokenizFromReaderFactory class to be used for tokenization.
-parseInside regexp
Specifies that parsing should only be done for tokens inside XML-style elements whose names match the indicated regular expression (done as simple pattern matching, rather than XML parsing). For example, if this is specified as text|doc, then the material in text and doc elements would be parsed. Sentences cannot span elements.
-sentences token
Specifies a token that marks sentence boundaries. Most tokens are interpreted literally, and must be tokens returned by the tokenizer. A token starting with "<" and ending with ">" will cause the tokenizer to divide sentences on either this literal tag or the corresponding end tag, and a value of "newline" causes sentence breaking on newlines.
-tagSeparator char
Specifies that words carry tags, separated from the word by the reserved character char.
-maxLength leng
Specify the longest sentence that will be parsed (and hence, indirectly, the amount of memory needed).
-outputTreeFormat style
Choose the style of output sentences: penn for prettyprinting as in the Penn treebank files, or oneline for printing sentences one per line.
Parameters:
args - Command line arguments, as above
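The -parseInside option works by simple pattern matching on element names rather than by XML parsing. A minimal sketch of that kind of extraction, assuming elements look like <name ...>...</name>; the class is hypothetical, and each returned chunk would be tokenized and parsed separately, which is why sentences cannot span elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of -parseInside-style extraction by simple pattern matching
// (not XML parsing): return the text inside elements whose names match the regexp.
class ParseInsideSketch {

    static List<String> insideElements(String document, String elementNameRegex) {
        // Named groups keep the backreference correct even if the user's regexp has groups.
        Pattern p = Pattern.compile(
                "<(?<name>(?:" + elementNameRegex + "))(?:\\s[^>]*)?>(?<body>.*?)</\\k<name>>",
                Pattern.DOTALL);
        Matcher m = p.matcher(document);
        List<String> chunks = new ArrayList<>();
        while (m.find()) {
            chunks.add(m.group("body").trim()); // each chunk is parsed on its own
        }
        return chunks;
    }

    public static void main(String[] args) {
        String doc = "<doc id=\"1\">Dogs bark.</doc><meta>skip</meta><text>Cats sleep.</text>";
        System.out.println(insideElements(doc, "text|doc")); // [Dogs bark., Cats sleep.]
    }
}
```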