edu.stanford.nlp.international.arabic.pipeline
Class DefaultLexicalMapper
java.lang.Object
edu.stanford.nlp.international.arabic.pipeline.DefaultLexicalMapper
- All Implemented Interfaces:
- Mapper, Serializable
public class DefaultLexicalMapper
- extends Object
- implements Mapper, Serializable
Applies a default set of lexical transformations that have been empirically validated
in various Arabic tasks. This class automatically detects the input encoding and applies
the appropriate set of transformations.
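The page does not say how the input encoding is detected. As a minimal sketch, assuming the two encodings of interest are Unicode Arabic script and the ASCII-based Buckwalter transliteration, a token could be classified by testing for any character in the Unicode Arabic block (U+0600 through U+06FF). The class and method names below are hypothetical and are not part of DefaultLexicalMapper.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the documented API.
final class EncodingGuessSketch {
  // Matches any character in the Unicode "Arabic" block, U+0600 through U+06FF.
  private static final Pattern ARABIC_BLOCK = Pattern.compile("[\\u0600-\\u06FF]");

  // Returns true if the token contains at least one Arabic-script character.
  // Assumption: anything else is treated as Buckwalter (plain ASCII).
  static boolean looksLikeUnicodeArabic(String token) {
    return ARABIC_BLOCK.matcher(token).find();
  }

  public static void main(String[] args) {
    System.out.println(looksLikeUnicodeArabic("\u0643\u062A\u0627\u0628")); // prints true
    System.out.println(looksLikeUnicodeArabic("ktAb"));                     // prints false
  }
}
```

A mapper built this way could pick its rule set per token, which is one reading of "automatically detects the input encoding".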
- Author:
- Spence Green
- See Also:
- Serialized Form
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
latinPunc
public static final Pattern latinPunc
arabicPunc
public static final Pattern arabicPunc
arabicDigit
public static final Pattern arabicDigit
segmentationMarker
public static final Pattern segmentationMarker
DefaultLexicalMapper
public DefaultLexicalMapper()
map
public String map(String parent,
String element)
- Description copied from interface: Mapper
- Maps from one string representation to another.
- Specified by:
- map in interface Mapper
- Parameters:
- parent - element's context (e.g., the parent node in a parse tree)
- element - The string to be transformed.
- Returns:
- The transformed string
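To make the map(parent, element) contract concrete, here is a toy sketch of a mapper with the same method shape. The interface MapSketch is a local stand-in rather than the library's Mapper, and the single rule shown (removing the tatweel elongation character, U+0640) is chosen for illustration; this page does not list the transformations DefaultLexicalMapper actually applies.

```java
// Local stand-in with the same map signature as documented above.
// Assumption: the real Mapper interface is not reproduced here.
interface MapSketch {
  String map(String parent, String element);
}

// Hypothetical toy mapper: strips U+0640 ARABIC TATWEEL, a purely
// typographic elongation character, from the element.
final class TatweelStripper implements MapSketch {
  private static final String TATWEEL = "\u0640";

  @Override
  public String map(String parent, String element) {
    // parent carries context (e.g., a POS tag) but this toy rule ignores it.
    return element.replace(TATWEEL, "");
  }

  public static void main(String[] args) {
    MapSketch mapper = new TatweelStripper();
    String elongated = "\u0643\u0640\u062A\u0627\u0628"; // five characters, one tatweel
    System.out.println(mapper.map("NN", elongated).length()); // prints 4
  }
}
```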
setup
public void setup(File path,
String... options)
- Description copied from interface: Mapper
- Perform initialization prior to the first call to map.
- Specified by:
- setup in interface Mapper
- Parameters:
- path - A filename for data on disk used during mapping
- options - Variable length array for setting options
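The ordering described here (setup first, then map) can be sketched with a toy class that has the same setup(File, String...) parameter shape. Everything below is hypothetical: the option name "Trim" is invented for the example, and the real class may neither recognize it nor require setup before map.

```java
import java.io.File;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical toy class; only the method shapes mirror the documented ones.
final class SetupOrderSketch {
  private final Set<String> enabled = new LinkedHashSet<>();
  private boolean ready = false;

  // Records the requested options. The path is accepted but unused here.
  void setup(File path, String... options) {
    if (options != null) {
      enabled.addAll(Arrays.asList(options));
    }
    ready = true;
  }

  // This toy enforces the setup-before-map ordering to make it visible.
  String map(String parent, String element) {
    if (!ready) {
      throw new IllegalStateException("call setup() before map()");
    }
    // "Trim" is an invented option name, used only for this sketch.
    return enabled.contains("Trim") ? element.trim() : element;
  }

  public static void main(String[] args) {
    SetupOrderSketch mapper = new SetupOrderSketch();
    mapper.setup(null, "Trim");
    System.out.println("[" + mapper.map(null, "  ktAb ") + "]"); // prints [ktAb]
  }
}
```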
canChangeEncoding
public boolean canChangeEncoding(String parent,
String element)
- Description copied from interface: Mapper
- Indicates whether element can be converted to another encoding. In the ATB (Arabic Treebank), for example,
if a punctuation character is labeled with the "PUNC" POS tag, then that character should not
be converted from Buckwalter to UTF-8.
- Specified by:
- canChangeEncoding in interface Mapper
- Parameters:
- parent - element's context (e.g., the parent node in a parse tree)
- element - The string to be transformed.
- Returns:
- True if the string encoding can be changed. False otherwise.
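The PUNC case matters because several Buckwalter symbols are ASCII punctuation; for instance, * transliterates the letter thal (U+0630). The sketch below shows how a caller might consult a canChangeEncoding-style check before converting. It is an assumption-laden illustration: the class name, the five-entry conversion table, and the rule "refuse only when parent is PUNC" are simplifications, not the library's implementation.

```java
import java.util.Map;

// Hypothetical illustration of guarding an encoding conversion.
final class EncodingGuardSketch {
  // A deliberately tiny slice of the Buckwalter transliteration table.
  private static final Map<Character, Character> BW_TO_UNICODE = Map.of(
      'k', '\u0643', 't', '\u062A', 'A', '\u0627', 'b', '\u0628', '*', '\u0630');

  // Simplified rule: anything tagged PUNC is literal punctuation, so leave it alone.
  static boolean canChangeEncoding(String parent, String element) {
    return !"PUNC".equals(parent);
  }

  // Converts Buckwalter to Unicode only when the guard allows it.
  static String toUnicodeIfAllowed(String parent, String element) {
    if (!canChangeEncoding(parent, element)) {
      return element;
    }
    StringBuilder sb = new StringBuilder(element.length());
    for (char c : element.toCharArray()) {
      char mapped = BW_TO_UNICODE.getOrDefault(c, c); // unknown symbols pass through
      sb.append(mapped);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(toUnicodeIfAllowed("PUNC", "*"));      // prints *
    System.out.println(toUnicodeIfAllowed("NN", "*").length()); // prints 1
  }
}
```

Without the guard, a literal asterisk in the treebank would be silently rewritten as an Arabic letter.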
main
public static void main(String[] args)
Stanford NLP Group