edu.stanford.nlp.trees.international.arabic
Class IBMArabicEscaper

java.lang.Object
  extended by edu.stanford.nlp.trees.international.arabic.IBMArabicEscaper
All Implemented Interfaces:
Function<List<HasWord>,List<HasWord>>

public class IBMArabicEscaper
extends Object
implements Function<List<HasWord>,List<HasWord>>

This escaper deletes the '#' and '+' symbols that the IBM segmenter uses to mark prefixes and suffixes, because these markers are not present in the Penn Arabic Treebank materials (they may be added later). It also escapes parenthesis characters.
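
The transformation described above can be sketched on plain strings. This is a simplified illustration, not the class's actual source. The class name EscaperSketch, its methods, and the sample tokens are hypothetical. The use of -LRB- and -RRB- as the parenthesis escapes is an assumption based on Penn Treebank conventions, since this page does not state the escape form. The sketch also strips every '#' and '+' it finds, which may be broader than what the real escaper does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, string-based sketch; not the IBMArabicEscaper source.
class EscaperSketch {

  // Assumption: parentheses map to the Penn Treebank tokens -LRB- and -RRB-.
  static String escapeToken(String token) {
    if (token.equals("(")) {
      return "-LRB-";
    }
    if (token.equals(")")) {
      return "-RRB-";
    }
    // Assumption: every '#' and '+' in a token is a segmenter marker.
    // A token consisting only of a marker would become empty here.
    return token.replace("#", "").replace("+", "");
  }

  // Builds a new list, so the caller's list is left untouched.
  static List<String> escapeSentence(List<String> tokens) {
    List<String> out = new ArrayList<>(tokens.size());
    for (String t : tokens) {
      out.add(escapeToken(t));
    }
    return out;
  }

  public static void main(String[] args) {
    // Hypothetical segmenter output: "w#" is a prefix, "+hA" is a suffix.
    List<String> sentence = Arrays.asList("w#", "ktb", "+hA", "(", ")");
    System.out.println(escapeSentence(sentence)); // [w, ktb, hA, -LRB-, -RRB-]
  }
}
```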

Author:
Christopher Manning

Constructor Summary
IBMArabicEscaper()
           
 
Method Summary
 List<HasWord> apply(List<HasWord> arg)
          Note: At present this overwrites the items of the input list in place.
static void main(String[] args)
          This main method preprocesses one-sentence-per-line input, making the same changes as the Function.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

IBMArabicEscaper

public IBMArabicEscaper()

Method Detail

apply

public List<HasWord> apply(List<HasWord> arg)
Note: At present this overwrites the items of the input list in place instead of working on copies. This should be fixed.

Specified by:
apply in interface Function<List<HasWord>,List<HasWord>>
Parameters:
arg - The sentence to escape, as a list of words
Returns:
The list of words with the segmenter markers deleted and the parentheses escaped
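
Because the items of the input list are overwritten, a caller that still needs the original words may want to copy them before calling apply. The sketch below illustrates that pattern without depending on the library. MutableWord is a hypothetical stand-in for a mutable HasWord implementation, and escapeInPlace only mimics the documented in-place behaviour (marker removal only). Whether the real method returns the same list instance is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

class ClobberSketch {

  // Hypothetical stand-in for a mutable HasWord implementation.
  static class MutableWord {
    private String word;
    MutableWord(String word) { this.word = word; }
    String word() { return word; }
    void setWord(String word) { this.word = word; }
  }

  // Mimics the documented behaviour: each item of the input list is rewritten.
  static List<MutableWord> escapeInPlace(List<MutableWord> words) {
    for (MutableWord w : words) {
      w.setWord(w.word().replace("#", "").replace("+", ""));
    }
    return words;
  }

  // Copies every item so the caller's originals survive the escaping step.
  static List<MutableWord> deepCopy(List<MutableWord> words) {
    List<MutableWord> copy = new ArrayList<>(words.size());
    for (MutableWord w : words) {
      copy.add(new MutableWord(w.word()));
    }
    return copy;
  }

  public static void main(String[] args) {
    List<MutableWord> original = new ArrayList<>();
    original.add(new MutableWord("w#"));
    original.add(new MutableWord("ktb"));

    List<MutableWord> escaped = escapeInPlace(deepCopy(original));
    System.out.println(original.get(0).word()); // w#
    System.out.println(escaped.get(0).word());  // w
  }
}
```

Passing the list straight to escapeInPlace, without deepCopy, would leave original.get(0).word() as "w" afterwards.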

main

public static void main(String[] args)
                 throws IOException
This main method preprocesses one-sentence-per-line input, making the same changes as the Function.

Parameters:
args - A list of filenames. The files must be UTF-8 encoded.
Throws:
IOException - If an I/O error occurs while processing the files
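
The one-sentence-per-line preprocessing can be sketched as follows. This is an illustration under stated assumptions, not the actual main method. The class LinePreprocessorSketch and the processFile helper are hypothetical, tokens are assumed to be whitespace-separated, and the escaper passed in here only strips the '#' and '+' markers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class LinePreprocessorSketch {

  // Reads a UTF-8 file with one sentence per line and escapes each token.
  static List<String> processFile(Path file, UnaryOperator<String> tokenEscaper)
      throws IOException {
    List<String> out = new ArrayList<>();
    try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
      String line;
      while ((line = reader.readLine()) != null) {
        StringBuilder sb = new StringBuilder();
        // Assumption: tokens are separated by whitespace.
        for (String token : line.trim().split("\\s+")) {
          if (sb.length() > 0) {
            sb.append(' ');
          }
          sb.append(tokenEscaper.apply(token));
        }
        out.add(sb.toString());
      }
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    // Simplified escaper: removes the segmenter markers only.
    UnaryOperator<String> stripMarkers = t -> t.replace("#", "").replace("+", "");
    for (String name : args) {
      for (String line : processFile(Paths.get(name), stripMarkers)) {
        System.out.println(line);
      }
    }
  }
}
```

The reader is opened with an explicit UTF-8 charset so that the result does not depend on the platform default encoding. Output written through System.out still uses the platform's encoding.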


Stanford NLP Group