java.lang.Object
  edu.stanford.nlp.international.arabic.IBMArabicEscaper
public class IBMArabicEscaper
This escaper is intended for use on flat input to be parsed by LexicalizedParser.
It performs these functions:
ArabicTreeNormalizer
This class supports both Buckwalter and UTF-8 encoding.
IMPORTANT: This class must implement Function<List<HasWord>, List<HasWord>> in order to run with the parser.
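The contract described on this page (a list-to-list function that copies its input before escaping and fails loudly when a word maps to null) can be sketched without the Stanford classes. The following is only an illustration under stated assumptions: `EscaperSketch`, `escapeWord`, and the placeholder bracket rule are hypothetical names, `String` stands in for `HasWord`, and `java.util.function.Function` stands in for the `Function` interface named above. It does not reproduce the actual IBM-to-ATB mapping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in (not the Stanford class) showing the copy-then-escape contract. */
public class EscaperSketch implements Function<List<String>, List<String>> {

    // Placeholder per-word rule; an assumption for illustration only.
    private final Function<String, String> wordRule;

    public EscaperSketch(Function<String, String> wordRule) {
        this.wordRule = wordRule;
    }

    /** Escapes a single word; a null result is treated as a hard error. */
    public String escapeWord(String w) {
        String escaped = wordRule.apply(w);
        if (escaped == null) {
            throw new RuntimeException("Word mapped to null: " + w);
        }
        return escaped;
    }

    /** Builds and returns a new list; the caller's list is never modified. */
    @Override
    public List<String> apply(List<String> sentence) {
        List<String> copy = new ArrayList<>(sentence.size());
        for (String w : sentence) {
            copy.add(escapeWord(w));
        }
        return copy;
    }

    public static void main(String[] args) {
        // Placeholder rule: rewrite an opening parenthesis as a bracket token.
        EscaperSketch escaper = new EscaperSketch(w -> w.replace("(", "-LRB-"));
        List<String> input = Arrays.asList("(", "kitab");
        System.out.println(escaper.apply(input));
        System.out.println(input); // the input list is unchanged
    }
}
```

Writing into a fresh list rather than mutating the argument lets a caller keep using the original sentence after escaping, which matches the "safely copies the input object" wording on this page.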
Constructor Summary |
---|
IBMArabicEscaper() |
IBMArabicEscaper(boolean annoteAndClassOnly) |
Method Summary | |
---|---|
java.util.List<HasWord> | apply(java.util.List<HasWord> sentence): Converts an input list of HasWord in IBM Arabic to LDC ATBv3 representation. |
java.lang.String | apply(java.lang.String w): Applies escaping to a single word. |
void | disableWarnings(): Disable warnings generated when tokens are escaped. |
static void | main(java.lang.String[] args): This main method preprocesses one-sentence-per-line input, making the same changes as the Function. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public IBMArabicEscaper()
public IBMArabicEscaper(boolean annoteAndClassOnly)
Method Detail |
---|
public void disableWarnings()
public java.util.List<HasWord> apply(java.util.List<HasWord> sentence)
Converts an input list of HasWord in IBM Arabic to LDC ATBv3 representation. The method safely copies the input object prior to escaping.
Specified by: apply in interface Function<java.util.List<HasWord>,java.util.List<HasWord>>
Parameters: sentence - A collection of type Word
Throws: java.lang.RuntimeException - If a word is mapped to null
public java.lang.String apply(java.lang.String w)
Applies escaping to a single word.
Parameters: w - The word
Throws: java.lang.RuntimeException - If a word is nullified (which is really bad for the parser and for MT)
public static void main(java.lang.String[] args) throws java.io.IOException
This main method preprocesses one-sentence-per-line input, making the same changes as the Function. Output is written to files named after the input files with .sent appended to their names. If you give the flag -f then output is instead sent to stdout. Input and output are always in UTF-8.
Parameters: args - A list of filenames. The files must be UTF-8 encoded.
Throws: java.io.IOException - If there are any issues
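The output routing described for main (a `.sent` file next to each input, or stdout instead, always UTF-8) can be sketched as follows. This is a minimal sketch under assumptions, not the class's actual code: `PreprocessSketch`, `process`, `run`, and the `lineRule` parameter are hypothetical, and how the real main parses the `-f` flag is not shown on this page, so the choice is passed in as a boolean.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Function;

/** Hypothetical sketch of one-sentence-per-line preprocessing with UTF-8 I/O. */
public class PreprocessSketch {

    /** Reads UTF-8 lines from in, applies lineRule to each, and writes UTF-8 lines to out. */
    static void process(Path in, OutputStream out, Function<String, String> lineRule)
            throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8)) {
            PrintWriter writer =
                new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) {
                writer.println(lineRule.apply(line));
            }
            // Flush but do not close: out may be System.out.
            writer.flush();
        }
    }

    /** Writes to stdout when toStdout is set, otherwise to "<inputFile>.sent". */
    static void run(String inputFile, boolean toStdout, Function<String, String> lineRule)
            throws IOException {
        Path in = Paths.get(inputFile);
        if (toStdout) {
            process(in, System.out, lineRule);
        } else {
            try (OutputStream out = Files.newOutputStream(Paths.get(inputFile + ".sent"))) {
                process(in, out, lineRule);
            }
        }
    }
}
```

Passing the charset explicitly on both the reader and the writer keeps the behaviour independent of the platform default encoding, which is the point of the "always in UTF-8" guarantee.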