edu.stanford.nlp.international.arabic
Class IBMArabicEscaper

java.lang.Object
  extended by edu.stanford.nlp.international.arabic.IBMArabicEscaper
All Implemented Interfaces:
Function<java.util.List<HasWord>,java.util.List<HasWord>>

public class IBMArabicEscaper
extends java.lang.Object
implements Function<java.util.List<HasWord>,java.util.List<HasWord>>

This escaper is intended for use on flat input to be parsed by LexicalizedParser. It performs these functions:

This class supports both Buckwalter and UTF-8 encoding.

IMPORTANT: This class must implement Function<List<HasWord>,List<HasWord>> in order to run with the parser.

Author:
Christopher Manning, Spence Green

Constructor Summary
IBMArabicEscaper()
           
IBMArabicEscaper(boolean annoteAndClassOnly)
           
 
Method Summary
 java.util.List<HasWord> apply(java.util.List<HasWord> sentence)
          Converts an input list of HasWord in IBM Arabic to LDC ATBv3 representation.
 java.lang.String apply(java.lang.String w)
          Applies escaping to a single word.
 void disableWarnings()
          Disable warnings generated when tokens are escaped.
static void main(java.lang.String[] args)
          This main method preprocesses one-sentence-per-line input, making the same changes as the Function.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

IBMArabicEscaper

public IBMArabicEscaper()

IBMArabicEscaper

public IBMArabicEscaper(boolean annoteAndClassOnly)
Method Detail

disableWarnings

public void disableWarnings()
Disable warnings generated when tokens are escaped.


apply

public java.util.List<HasWord> apply(java.util.List<HasWord> sentence)
Converts an input list of HasWord in IBM Arabic to LDC ATBv3 representation. The method safely copies the input object prior to escaping.

Specified by:
apply in interface Function<java.util.List<HasWord>,java.util.List<HasWord>>
Parameters:
sentence - A collection of type Word
Returns:
A copy of the input with each word escaped.
Throws:
java.lang.RuntimeException - If a word is mapped to null
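
The copy-then-escape contract described above can be illustrated with a small stand-alone sketch. It does not use the Stanford classes: `EscaperSketch`, the plain `String` tokens, `java.util.function.Function` (in place of the Stanford `Function` interface), and the placeholder rule that rewrites parentheses are all assumptions made for illustration, not the actual escaping rules of IBMArabicEscaper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in, not the Stanford class: shows the copy-then-escape contract only.
public class EscaperSketch implements Function<List<String>, List<String>> {

    // Placeholder rule (an assumption): rewrite parentheses, leave other tokens alone.
    static String escapeWord(String w) {
        if (w.equals("(")) return "-LRB-";
        if (w.equals(")")) return "-RRB-";
        return w;
    }

    @Override
    public List<String> apply(List<String> sentence) {
        // Work on a copy so the caller's list is never modified.
        List<String> copy = new ArrayList<>(sentence);
        for (int i = 0; i < copy.size(); i++) {
            String escaped = escapeWord(copy.get(i));
            if (escaped == null) {
                // Mirrors the documented behavior: a null mapping is an error.
                throw new RuntimeException("Word mapped to null: " + copy.get(i));
            }
            copy.set(i, escaped);
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("(", "word", ")");
        List<String> output = new EscaperSketch().apply(input);
        System.out.println(input);   // the original list is unchanged
        System.out.println(output);  // the escaped copy
    }
}
```

Returning a fresh list keeps the caller's data usable after escaping, which matters when the same sentence is passed to more than one component.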

apply

public java.lang.String apply(java.lang.String w)
Applies escaping to a single word. Interns the escaped string.

Parameters:
w - The word
Returns:
The escaped word
Throws:
java.lang.RuntimeException - If a word is nullified (which is really bad for the parser and for MT)
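
As a reminder of what interning the escaped string means in Java, the following minimal sketch uses only `String.intern()` from the standard library. The class name `InternSketch` and the sample token are hypothetical and unrelated to the escaper's internals.

```java
// Hypothetical demo class: shows that equal strings share one instance after intern().
public class InternSketch {

    // Returns true when both strings resolve to the same pooled instance.
    static boolean sameInstanceAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String first = new String("ktAb");
        String second = new String("ktAb");
        System.out.println(first == second);                         // distinct objects
        System.out.println(sameInstanceAfterIntern(first, second));  // same pooled instance
    }
}
```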

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
This main method preprocesses one-sentence-per-line input, making the same changes as the Function. By default it writes the output to files with the same name as the files passed in on the command line but with .sent appended to their names. If you give the flag -f then output is instead sent to stdout. Input and output are always in UTF-8.

Parameters:
args - A list of filenames. The files must be UTF-8 encoded.
Throws:
java.io.IOException - If an I/O error occurs while reading or writing a file
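
The file handling described for main can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not the Stanford implementation: `LineFilterSketch` and `processFile` are hypothetical names, tokens are assumed to be whitespace-separated, the `-f` flag is accepted at any position, and the identity function stands in for the real escaping.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the command-line behavior; not the Stanford implementation.
public class LineFilterSketch {

    // Escapes every whitespace-separated token of every line in one UTF-8 file.
    static void processFile(Path in, boolean toStdout, UnaryOperator<String> escapeWord)
            throws IOException {
        PrintWriter out = toStdout
                ? new PrintWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8))
                : new PrintWriter(Files.newBufferedWriter(
                        Paths.get(in.toString() + ".sent"), StandardCharsets.UTF_8));
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] tokens = line.trim().split("\\s+");
                for (int i = 0; i < tokens.length; i++) {
                    tokens[i] = escapeWord.apply(tokens[i]);
                }
                out.println(String.join(" ", tokens));
            }
        } finally {
            if (toStdout) {
                out.flush();   // leave System.out open for later files
            } else {
                out.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: -f may appear anywhere among the arguments.
        boolean toStdout = Arrays.asList(args).contains("-f");
        for (String arg : args) {
            if (!arg.equals("-f")) {
                // Identity function as a placeholder for the real escaping.
                processFile(Paths.get(arg), toStdout, w -> w);
            }
        }
    }
}
```

Passing the escaping step in as a `UnaryOperator<String>` keeps the I/O logic separate from the rules, so the same loop can be exercised with any per-word transformation.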


Stanford NLP Group