On Unix/Linux (or command-line Mac OS X), use GNU tar, if you're not already. (If you're using
Linux, you're almost certainly using GNU tar.) For some reason we don't
understand, it doesn't seem to unpack with classic Unix tar. Make sure you
specify the -z option if you are not gunzipping it in
advance: tar -xzf stanford-ner-2008-05-07.tar.gz.
On Windows, it unpacks fine with most common tools, such as WinZip or 7-Zip. The latter is open source. (As of Sep 2007, WinRAR doesn't work: it apparently does not handle tar files correctly.)
On the Mac, just double-click it to unpack. The default unarchiver (BOMArchiveHelper) works fine.
If it won't unpack, you normally either have a corrupted download (try downloading it again) or have some configuration error on your system, which we can't help with.
Exception in thread "main"
java.lang.NoClassDefFoundError: edu/stanford/nlp/tagger/maxent/MaxentTagger?
This means your Java CLASSPATH isn't set correctly, so the tagger (in
stanford-tagger.jar) isn't being found. See the examples in the
README.txt file for how to set the classpath with
the -cp or -classpath option.
See, e.g.,
http://en.wikipedia.org/wiki/Classpath_(Java)
for general discussion of the Java classpath.
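As a quick check, a minimal sketch like the following can tell you whether the tagger class is visible to the JVM you are running. The class name ClasspathCheck and the helper isOnClasspath are hypothetical names made up for this illustration and are not part of the tagger distribution; only Class.forName and the java.class.path system property are standard Java.

```java
// Hypothetical helper for illustration; not part of the tagger distribution.
final class ClasspathCheck {

    // Returns true if the named class can be located on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "edu.stanford.nlp.tagger.maxent.MaxentTagger";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
        // Shows the classpath the JVM actually received, which helps spot a typo in -cp.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Run it with the same -cp value you pass when running the tagger (one that includes stanford-tagger.jar plus the directory holding ClasspathCheck). If it reports NOT found, the problem is in the classpath setting rather than in the tagger.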
For English (only), you can do this using the included Morphology class.
However, unlike for the Stanford parser, there is at present no support
for doing this automatically using options of the command-line version
of the tagger. You'd have to do it using code you write.
You're probably using the tagString() method.
Unfortunately, its memory use does keep growing in this version. That
method may well not be what you want anyway. It assumes that the input is
correctly tokenized according to the conventions of the tagger training
corpus. For our English models, which are derived from the Penn Treebank,
this means things like separating off contractions of "be" and "n't", and
rendering parentheses as -LRB-, -RRB-, etc. If you don't do this
correctly, then accuracy will suffer.
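To make these conventions concrete, here is a rough sketch of two of them: splitting off "n't" and escaping parentheses. It is only an illustration, not the tokenizer used to build the training data. The class PtbConventionSketch and its methods are hypothetical names, it assumes parentheses are already separated by whitespace, and it ignores contractions of "be" and everything else a real Penn Treebank tokenizer handles.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, deliberately incomplete illustration of Penn Treebank-style conventions.
final class PtbConventionSketch {

    // Maps parentheses to their Penn Treebank escapes; other tokens pass through unchanged.
    static String escapeBracket(String token) {
        switch (token) {
            case "(": return "-LRB-";
            case ")": return "-RRB-";
            default:  return token;
        }
    }

    // Splits on whitespace, separates a trailing "n't", and escapes parentheses.
    static List<String> normalize(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            if (tok.length() > 3 && tok.endsWith("n't")) {
                out.add(tok.substring(0, tok.length() - 3)); // "don't" -> "do"
                out.add("n't");
            } else {
                out.add(escapeBracket(tok));
            }
        }
        return out;
    }
}
```

For example, normalize("I don't know ( yet )") returns the tokens I, do, n't, know, -LRB-, yet, -RRB-.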
(For no very good reason) in the 2008-09-28 distribution, the
tagString method is set up to do tagging by using a beam search,
whereas the main method of MaxentTagger and the tagSentence(Sentence)
method called in TaggerDemo.java use a different search routine (Viterbi)
to do the part-of-speech tagging. There seem to be problems with the
former, and so you should use tagSentence().
If you have Strings whose tokenization you are happy with, you
can switch to using tagSentence() easily: rather than calling
tagString(String) you could use the line:
String taggedLine =
MaxentTagger.tagSentence(Sentence.toSentence(line.split("\\s+"))).toString(false);
You can discuss other topics with Stanford NER developers and users by
joining
the java-nlp-user mailing list
(via a webpage). Or you can send other questions and feedback to
java-nlp-support@lists.stanford.edu.