java.lang.Object
  extended by edu.stanford.nlp.sequences.CoNLLDocumentReaderAndWriter
public class CoNLLDocumentReaderAndWriter
extends Object
implements DocumentReaderAndWriter<CoreLabel>
A DocumentReaderAndWriter for the original CoNLL 03 format. In this format there is one word per line, and any extra attributes of the word (POS tag, chunk tag, etc.) appear in further space- or tab-separated columns; leading and trailing whitespace on a line is ignored. Sentences are supposed to be separated by a blank line (one containing no non-whitespace characters), but in practice blank lines are placed fairly inconsistently; in particular, entities sometimes span a blank line. Nevertheless, this class, like our original CoNLL system, preserves these blank lines as a special BOUNDARY token, which some features detect and exploit. The text is divided into documents at each '-DOCSTART-' token, which is treated as a special token and is also preserved. The reader can read data in any of the IOB/IOE/etc. formats and output tokens in any other, as selected by the entitySubclassification flag.
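The reading rules above (trimmed whitespace-separated columns, blank lines kept as boundary tokens, a new document at each '-DOCSTART-' line that is itself kept) can be sketched as a small standalone Java class. This is a hypothetical illustration, not the CoreNLP implementation: the class name ConllReaderSketch, the method read, the String[]-per-token representation, and the placeholder text "*BOUNDARY*" for boundary tokens are all assumptions, and the sketch does not build CoreLabel objects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the CoNLL 03 reading rules; not CoreNLP's code.
 * Each token is represented as the array of its whitespace-separated columns.
 */
final class ConllReaderSketch {
  // Assumed placeholder text for a token standing in for a blank line.
  static final String BOUNDARY_MARKER = "*BOUNDARY*";
  static final String DOCSTART = "-DOCSTART-";

  static List<List<String[]>> read(Reader in) {
    List<List<String[]>> docs = new ArrayList<>();
    List<String[]> current = new ArrayList<>();
    BufferedReader br = new BufferedReader(in);
    try {
      String line;
      while ((line = br.readLine()) != null) {
        String trimmed = line.trim(); // leading/trailing whitespace is ignored
        if (trimmed.isEmpty()) {
          // A blank line is kept as a boundary token rather than dropped.
          current.add(new String[] {BOUNDARY_MARKER});
          continue;
        }
        String[] cols = trimmed.split("\\s+"); // space- or tab-separated columns
        if (cols[0].equals(DOCSTART)) {
          // -DOCSTART- closes the previous document and starts a new one.
          if (!current.isEmpty()) {
            docs.add(current);
          }
          current = new ArrayList<>();
        }
        current.add(cols); // the -DOCSTART- token itself is preserved too
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    if (!current.isEmpty()) {
      docs.add(current);
    }
    return docs;
  }
}
```

Given two documents that each begin with a '-DOCSTART-' line, this returns two token lists; each starts with its '-DOCSTART-' token and keeps every blank line as a boundary token in position.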
This reader is specifically for replicating CoNLL systems. For normal use, you should use the saner ColumnDocumentReaderAndWriter.
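The class description notes that tokens can be re-encoded between IOB/IOE-style schemes. As one concrete example of such a re-encoding, the hypothetical sketch below converts IOB1 tags (where B- appears only when an entity immediately follows another entity of the same type, as in the original CoNLL 2003 files) into IOB2 (where every entity starts with B-). It is a standalone illustration, not how CoreNLP implements entitySubclassification; the class IobSketch and method iob1ToIob2 are invented names, and the code assumes every non-O tag has the form X-TYPE.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: re-encode IOB1 entity tags as IOB2. */
final class IobSketch {
  // Entity type of a tag such as "I-PER", or null for the outside tag "O".
  private static String type(String tag) {
    return tag.equals("O") ? null : tag.substring(2);
  }

  static List<String> iob1ToIob2(List<String> tags) {
    List<String> out = new ArrayList<>(tags.size());
    String prev = "O";
    for (String tag : tags) {
      if (tag.equals("O")) {
        out.add(tag);
      } else {
        String t = type(tag);
        // In IOB2 a tag starts an entity if IOB1 already marked it B-,
        // or if the previous token is not inside an entity of the same type.
        boolean starts = tag.startsWith("B-") || !t.equals(type(prev));
        out.add((starts ? "B-" : "I-") + t);
      }
      prev = tag;
    }
    return out;
  }
}
```

For example, the IOB1 sequence I-PER I-PER O I-LOC B-LOC I-ORG becomes B-PER I-PER O B-LOC B-LOC B-ORG in IOB2.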
Field Summary | |
---|---|
static String | BOUNDARY |
static String | OTHER |
Constructor Summary | |
---|---|
CoNLLDocumentReaderAndWriter() | |
Method Summary | | |
---|---|---|
Iterator<List<CoreLabel>> | getIterator(Reader r) | Return an iterator over the contents read from r. |
void | init(SeqClassifierFlags flags) | This will be called immediately after construction. |
static void | main(String[] args) | Count some stats on what occurs in a file. |
void | printAnswers(List<CoreLabel> doc, PrintWriter out) | Write a standard CoNLL format output file. |
String | toString() | |
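printAnswers is described only as writing a standard CoNLL format output file. A minimal sketch of that kind of output, assuming one token per line with the word followed by the gold and predicted tags (the last two columns are the ones the conlleval scoring script compares) and boundary tokens written back out as blank lines, might look like the following. The class ConllWriterSketch, its Token holder, the "*BOUNDARY*" placeholder, and the exact column layout are hypothetical, not CoreNLP's.

```java
import java.io.PrintWriter;
import java.util.List;

/** Hypothetical sketch of a CoNLL-style answer writer; not CoreNLP's printAnswers. */
final class ConllWriterSketch {
  // Assumed placeholder text marking a token that stands for a blank line.
  static final String BOUNDARY_MARKER = "*BOUNDARY*";

  // Hypothetical token holder: surface word, gold tag, and predicted tag.
  static final class Token {
    final String word;
    final String gold;
    final String guess;

    Token(String word, String gold, String guess) {
      this.word = word;
      this.gold = gold;
      this.guess = guess;
    }
  }

  static void write(List<Token> doc, PrintWriter out) {
    for (Token tok : doc) {
      if (tok.word.equals(BOUNDARY_MARKER)) {
        out.println(); // a boundary token becomes a blank line again
      } else {
        // Gold tag then predicted tag as the final two columns.
        out.println(tok.word + " " + tok.gold + " " + tok.guess);
      }
    }
    out.flush();
  }
}
```

Writing the answers back with blank lines in the original positions keeps the sentence structure that evaluation scripts expect.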
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final String BOUNDARY
public static final String OTHER
Constructor Detail |
---|
public CoNLLDocumentReaderAndWriter()
Method Detail |
---|
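public void init(SeqClassifierFlags flags)
Description copied from interface: DocumentReaderAndWriter
This will be called immediately after construction.
Specified by: init in interface DocumentReaderAndWriter<CoreLabel>
Parameters: flags - Flags specifying behavior

public String toString()
Overrides: toString in class Object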
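public Iterator<List<CoreLabel>> getIterator(Reader r)
Description copied from interface: IteratorFromReaderFactory
Return an iterator over the contents read from r.
Specified by: getIterator in interface IteratorFromReaderFactory<List<CoreLabel>>
Parameters: r - Where to read objects from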
public void printAnswers(List<CoreLabel> doc, PrintWriter out)
Write a standard CoNLL format output file.
Specified by: printAnswers in interface DocumentReaderAndWriter<CoreLabel>
Parameters: doc - The document: a List of CoreLabel
out - Where to send the answers to

public static void main(String[] args) throws IOException, ClassNotFoundException
Count some stats on what occurs in a file.
Throws: IOException, ClassNotFoundException