AnCoraProcessor (Stanford JavaNLP API)

java.lang.Object
- edu.stanford.nlp.international.spanish.pipeline.AnCoraProcessor

```
public class AnCoraProcessor
extends java.lang.Object
```
A tool which accepts raw AnCora-3.0 Spanish XML files and produces normalized, pre-processed PTB-style treebanks for use with CoreNLP tools. This is a substitute for an awkward and complicated string of command-line invocations. The produced corpus is the standard treebank which has been used to train the CoreNLP Spanish models.

The preprocessing steps performed here include:

- Expansion and automatic tagging of multi-word tokens (see MultiWordPreprocessor, SpanishTreeNormalizer.normalizeForMultiWord(Tree, TreeFactory))
- Heuristic parsing of expanded multi-word tokens (see MultiWordTreeExpander)
- Splitting of elided forms (al, del, conmigo, etc.) and clitic pronouns from verb forms (see SpanishTreeNormalizer.expandElisions(Tree), SpanishTreeNormalizer.expandCliticPronouns(Tree))
- Miscellaneous cleanup of parse trees, spelling fixes, parsing error corrections (see SpanishTreeNormalizer)

Apart from raw corpus data, this processor depends upon unigram part-of-speech tag data. If not provided explicitly to the processor, the data will be collected from the given files. (You can pre-compute POS data from AnCora XML using AnCoraPOSStats.)

For invocation options, execute the class with no arguments.
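To make the elision-splitting step concrete, here is a minimal token-level sketch. It is not the code in SpanishTreeNormalizer, which operates on parse trees. The class name `ElisionSketch`, the `expand` method, and the three-entry lookup table are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration; not part of CoreNLP.
final class ElisionSketch {
    // Assumed, deliberately tiny table of contracted forms and their parts.
    private static final Map<String, List<String>> ELISIONS = Map.of(
        "al", List.of("a", "el"),
        "del", List.of("de", "el"),
        "conmigo", List.of("con", "mí"));

    /** Replaces each listed contraction with its parts; other tokens pass through unchanged. */
    static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            // Lookup is case-insensitive; the original capitalization is not restored.
            List<String> parts = ELISIONS.get(token.toLowerCase(Locale.ROOT));
            if (parts != null) {
                out.addAll(parts);
            } else {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Voy, a, el, mercado, de, el, pueblo]
        System.out.println(expand(List.of("Voy", "al", "mercado", "del", "pueblo")));
    }
}
```

A flat token list keeps the sketch short. A tree-based normalizer also has to decide where the new leaves attach, which this example does not attempt.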

Author:

Jon Gauthier

Field Summary

Fields
Modifier and Type	Field and Description
`static java.util.HashSet<java.lang.String>`	`auxTagConversion`
`static java.util.HashSet<java.lang.String>`	`potentialAUXWords`

Constructor Summary

Constructors
Constructor and Description
`AnCoraProcessor(java.util.List<java.io.File> inputFiles, java.util.Properties options)`

Method Summary

Methods
Modifier and Type	Method and Description
`static void`	`convertTreeTagsToUD(Tree tree)`
`static void`	`main(java.lang.String[] args)`
`java.util.List<Tree>`	`process()`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

auxTagConversion

public static java.util.HashSet<java.lang.String> auxTagConversion

potentialAUXWords

public static java.util.HashSet<java.lang.String> potentialAUXWords

Constructor Detail

AnCoraProcessor

public AnCoraProcessor(java.util.List<java.io.File> inputFiles,
                       java.util.Properties options)
                throws java.io.IOException,
                       java.lang.ClassNotFoundException

Throws: java.io.IOException; java.lang.ClassNotFoundException
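The class description notes that unigram part-of-speech data is collected from the input files when it is not supplied. As a rough idea of what such data can look like, the sketch below counts word/tag pairs and reports the most frequent tag per word. It is an assumption-laden illustration, not the logic of AnCoraProcessor or AnCoraPOSStats; the class `UnigramTagSketch`, its methods, and the tag strings in `main` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not part of CoreNLP.
final class UnigramTagSketch {
    // word -> (tag -> number of times the word was seen with that tag)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Records one observation of a word carrying a tag. */
    void observe(String word, String tag) {
        counts.computeIfAbsent(word, w -> new HashMap<>()).merge(tag, 1, Integer::sum);
    }

    /** Returns the tag seen most often for the word, or the fallback if the word is unseen. */
    String mostFrequentTag(String word, String fallback) {
        Map<String, Integer> tagCounts = counts.get(word);
        if (tagCounts == null) {
            return fallback;
        }
        String best = fallback;
        int bestCount = 0;
        // Ties are broken by map iteration order, which is unspecified.
        for (Map.Entry<String, Integer> entry : tagCounts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        UnigramTagSketch stats = new UnigramTagSketch();
        // Placeholder tag strings, chosen only for the example.
        stats.observe("bajo", "sps00");
        stats.observe("bajo", "sps00");
        stats.observe("bajo", "aq0ms0");
        System.out.println(stats.mostFrequentTag("bajo", "UNK")); // prints sps00
    }
}
```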

Method Detail

process

public java.util.List<Tree> process()
                             throws java.lang.InterruptedException,
                                    java.io.IOException,
                                    java.util.concurrent.ExecutionException

Throws: java.lang.InterruptedException; java.io.IOException; java.util.concurrent.ExecutionException
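InterruptedException and ExecutionException are the checked exceptions thrown by `java.util.concurrent.Future.get()`. This page does not say how `process()` schedules its work, so the following is only a generic JDK sketch of how per-file tasks submitted to an executor give a method this kind of signature. `ParallelFilesSketch`, `handle`, and `processAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; not part of CoreNLP.
final class ParallelFilesSketch {
    /** Stand-in for per-file work; here it only measures the file name. */
    static int handle(String fileName) {
        return fileName.length();
    }

    /** Runs handle() on every name in a small thread pool and returns results in input order. */
    static List<Integer> processAll(List<String> fileNames)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String name : fileNames) {
                Callable<Integer> task = () -> handle(name);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                // get() blocks until the task is done. It throws ExecutionException if the
                // task failed and InterruptedException if the waiting thread is interrupted.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processAll(List.of("a.xml", "bb.xml"))); // prints [5, 6]
    }
}
```

A caller of a method with such a signature typically unwraps `ExecutionException.getCause()` to find the underlying failure, and restores the interrupt flag with `Thread.currentThread().interrupt()` after catching InterruptedException.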

convertTreeTagsToUD

public static void convertTreeTagsToUD(Tree tree)
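This page gives no description of `convertTreeTagsToUD` or of the two static sets. Judging only from the names `auxTagConversion` and `potentialAUXWords`, one plausible reading is that a token is retagged as an auxiliary when both its word and its tag are listed. The sketch below illustrates that guess on plain word/tag pairs. It is an assumption, not the documented behavior, and the class name, set contents, and the `"AUX"` label are hypothetical.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of a guessed behavior; not part of CoreNLP.
final class AuxRetagSketch {
    // Placeholder contents. The real sets are not documented on this page.
    static final Set<String> CONVERTIBLE_TAGS = Set.of("vmip000");
    static final Set<String> POTENTIAL_AUX_WORDS = Set.of("ha", "he");

    /** Returns "AUX" when both the word and its tag are listed; otherwise returns the tag unchanged. */
    static String retag(String word, String tag) {
        boolean listed = POTENTIAL_AUX_WORDS.contains(word.toLowerCase(Locale.ROOT))
                && CONVERTIBLE_TAGS.contains(tag);
        return listed ? "AUX" : tag;
    }

    public static void main(String[] args) {
        System.out.println(retag("ha", "vmip000"));   // prints AUX
        System.out.println(retag("casa", "ncfs000")); // prints ncfs000
    }
}
```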

main

public static void main(java.lang.String[] args)
                 throws java.lang.InterruptedException,
                        java.io.IOException,
                        java.util.concurrent.ExecutionException,
                        java.lang.ClassNotFoundException

Throws: java.lang.InterruptedException; java.io.IOException; java.util.concurrent.ExecutionException; java.lang.ClassNotFoundException
