- All Implemented Interfaces:
- java.util.function.Function<java.util.List<HasWord>,java.util.List<HasWord>>
public class ChineseEscaper
extends java.lang.Object
implements java.util.function.Function<java.util.List<HasWord>,java.util.List<HasWord>>
An Escaper that normalizes Chinese text to match the Penn Chinese Treebank (CTB).
Currently it normalizes "ASCII" characters into the full-width
range used in the Penn Chinese Treebank.
Notes: Smart quotes appear in CTB and are left unchanged.
Various hyphen types from the U+2000 range probably occur too; certainly,
Roger lists them in LanguagePack.
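The following is a minimal, hypothetical sketch of what ASCII-to-full-width normalization can look like. It is not this class's actual implementation. The class name `FullWidthEscaperSketch` is invented for illustration, and to stay self-contained it works on `List<String>` rather than `List<HasWord>`. It assumes the standard Unicode mapping: printable ASCII U+0021..U+007E corresponds to the full-width forms U+FF01..U+FF5E, a constant offset of 0xFEE0. Smart quotes and all other characters pass through unchanged, matching the note above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch (not the CoreNLP implementation): maps printable
 * ASCII characters in each token to their full-width equivalents.
 * Mirrors the Function<List<...>, List<...>> shape of ChineseEscaper,
 * but uses plain Strings instead of HasWord.
 */
public class FullWidthEscaperSketch implements Function<List<String>, List<String>> {

  // Assumption: printable ASCII U+0021..U+007E maps to U+FF01..U+FF5E.
  private static final int FULL_WIDTH_OFFSET = 0xFEE0;

  /** Converts printable ASCII characters in s to full-width forms. */
  public static String toFullWidth(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c >= '!' && c <= '~') {
        sb.append((char) (c + FULL_WIDTH_OFFSET));
      } else {
        // Everything else passes through unchanged, including
        // smart quotes such as U+201C and the ASCII space.
        sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Returns a new list with every token normalized. */
  @Override
  public List<String> apply(List<String> tokens) {
    List<String> out = new ArrayList<>(tokens.size());
    for (String t : tokens) {
      out.add(toFullWidth(t));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> tokens = List.of("ABC", "1,2", "\u201C");
    System.out.println(new FullWidthEscaperSketch().apply(tokens));
  }
}
```

Returning a fresh list keeps the input untouched; the real class may instead update each `HasWord` in place, which this sketch does not attempt to reproduce.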
- Author:
- Christopher Manning