edu.stanford.nlp.trees.international.arabic
Class IBMArabicEscaper
java.lang.Object
edu.stanford.nlp.trees.international.arabic.IBMArabicEscaper
- All Implemented Interfaces:
- Function<List<HasWord>,List<HasWord>>
public class IBMArabicEscaper
- extends Object
- implements Function<List<HasWord>,List<HasWord>>
This escaper deletes the '#' and '+' symbols that the IBM segmenter uses
to mark prefixes and suffixes, because they are not present in the Penn
Arabic Treebank materials (though they may be added later). It also
escapes the parenthesis characters.
- Author:
- Christopher Manning
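The transformation described above can be illustrated with a minimal, self-contained sketch. It is not the class's actual implementation: the helper name escapeToken is hypothetical, the mapping of parentheses to -LRB- and -RRB- is an assumption based on the usual Penn Treebank convention, and leaving single-character tokens untouched is a choice made here only to avoid producing empty tokens.

```java
class EscaperSketch {

    /**
     * Hypothetical helper, not part of IBMArabicEscaper's public API.
     * Escapes parentheses and deletes the '#' and '+' segmentation markers.
     */
    static String escapeToken(String token) {
        // Assumed escape targets: the Penn Treebank bracket names.
        String out = token.replace("(", "-LRB-").replace(")", "-RRB-");
        // Only strip markers from tokens longer than one character, so that
        // a bare "+" or "#" does not collapse to an empty string.
        if (token.length() > 1) {
            out = out.replace("#", "").replace("+", "");
        }
        return out;
    }

    public static void main(String[] args) {
        String[] samples = {"w#", "+hm", "ktAb", "(", ")"};
        for (String s : samples) {
            System.out.println(s + " -> " + escapeToken(s));
        }
    }
}
```

With these assumptions, a prefix token such as "w#" becomes "w" and a suffix token such as "+hm" becomes "hm", while tokens without markers pass through unchanged.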
Method Summary
List<HasWord>
apply(List<HasWord> arg)
Note: At present this clobbers the input list items.
static void
main(String[] args)
This main method preprocesses one-sentence-per-line input, making the
same changes as the Function.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
IBMArabicEscaper
public IBMArabicEscaper()
apply
public List<HasWord> apply(List<HasWord> arg)
- Note: At present this method clobbers the input list items: each item
is modified in place rather than copied, so the caller's original words
are lost. This should be fixed.
- Specified by:
apply
in interface Function<List<HasWord>,List<HasWord>>
- Parameters:
arg - The function's argument: the list of words to escape
- Returns:
- The function's evaluated value: the list of escaped words
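The clobbering behavior noted for apply can be pictured with the following sketch, which contrasts an in-place variant with a copying one. Everything here is illustrative: Token is a minimal stand-in for HasWord (which exposes word() and setWord(String)), the method names applyInPlace and applyCopy are hypothetical, and returning the same list instance from the in-place variant is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

class ClobberSketch {

    /** Minimal stand-in for HasWord; only word() and setWord() are modeled. */
    static final class Token {
        private String word;
        Token(String word) { this.word = word; }
        String word() { return word; }
        void setWord(String word) { this.word = word; }
    }

    /** Simplified marker removal used by both variants below. */
    static String strip(String w) {
        return w.length() > 1 ? w.replace("#", "").replace("+", "") : w;
    }

    /** In-place variant: overwrites each item, so the input is clobbered. */
    static List<Token> applyInPlace(List<Token> sentence) {
        for (Token t : sentence) {
            t.setWord(strip(t.word()));
        }
        return sentence; // assumption: the same list instance is returned
    }

    /** Copying variant: builds new items and leaves the input untouched. */
    static List<Token> applyCopy(List<Token> sentence) {
        List<Token> out = new ArrayList<>(sentence.size());
        for (Token t : sentence) {
            out.add(new Token(strip(t.word())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> sentence = new ArrayList<>();
        sentence.add(new Token("w#"));
        sentence.add(new Token("ktAb"));

        List<Token> copy = applyCopy(sentence);
        System.out.println(sentence.get(0).word() + " / " + copy.get(0).word());

        applyInPlace(sentence);
        System.out.println(sentence.get(0).word());
    }
}
```

Until the method is changed, a caller that still needs the unescaped words would have to copy the items before passing the list in, along the lines of applyCopy.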
main
public static void main(String[] args)
throws IOException
- This main method preprocesses one-sentence-per-line input, making the
same changes as the Function.
- Parameters:
args - A list of filenames. The files must be UTF-8 encoded.
- Throws:
IOException - If an I/O error occurs while processing the files
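The one-sentence-per-line preprocessing performed by main might look roughly like the sketch below. It is written against the standard library only and is not the class's actual code: the class name LineFilterSketch and the helper escapeLine are hypothetical, tokens are assumed to be whitespace-separated, the parenthesis escapes are assumed to be -LRB- and -RRB-, and writing the result to standard output as UTF-8 is also an assumption, since the documentation does not say where the output goes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.StringJoiner;

class LineFilterSketch {

    /** Escapes every whitespace-separated token of one sentence line. */
    static String escapeLine(String line) {
        StringJoiner out = new StringJoiner(" ");
        for (String tok : line.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // blank input line
            }
            String esc = tok.replace("(", "-LRB-").replace(")", "-RRB-");
            if (tok.length() > 1) {
                esc = esc.replace("#", "").replace("+", "");
            }
            out.add(esc);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed output channel: standard output, encoded as UTF-8.
        PrintStream stdout = new PrintStream(System.out, true, "UTF-8");
        for (String filename : args) {
            // Read each file as UTF-8, as the parameter documentation requires.
            try (BufferedReader in = Files.newBufferedReader(
                    Paths.get(filename), StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    stdout.println(escapeLine(line));
                }
            }
        }
    }
}
```

Decoding explicitly with StandardCharsets.UTF_8 avoids depending on the platform default charset, which matters for Arabic-script input.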
Stanford NLP Group