edu.stanford.nlp.international.arabic.pipeline
Class DefaultLexicalMapper
java.lang.Object
edu.stanford.nlp.international.arabic.pipeline.DefaultLexicalMapper
- All Implemented Interfaces:
- Mapper, Serializable
public class DefaultLexicalMapper
- extends Object
- implements Mapper, Serializable
Applies a default set of lexical transformations that have been empirically validated
in various Arabic tasks. This class automatically detects the input encoding and applies
the appropriate set of transformations.
- Author:
- Spence Green
- See Also:
- Serialized Form
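The class description says the input encoding is detected automatically, but this page does not document how. The sketch below shows one plausible heuristic and is not the logic used by DefaultLexicalMapper. It treats any code point in the Arabic Unicode block as Unicode Arabic script and everything else as Buckwalter transliteration. The names EncodingGuess, Encoding, and guess are hypothetical.

```java
// Hypothetical sketch: NOT the detection logic of DefaultLexicalMapper.
final class EncodingGuess {
  enum Encoding { UNICODE_ARABIC, BUCKWALTER }

  // Assumption: one code point in the Arabic Unicode block (U+0600..U+06FF)
  // is enough to call the token Unicode Arabic; anything else is treated
  // as Buckwalter (ASCII) transliteration.
  static Encoding guess(String token) {
    for (int i = 0; i < token.length(); ) {
      int cp = token.codePointAt(i);
      if (Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.ARABIC) {
        return Encoding.UNICODE_ARABIC;
      }
      i += Character.charCount(cp);
    }
    return Encoding.BUCKWALTER;
  }
}
```

A real detector would likely also need to handle mixed-script tokens and Latin text that is not Buckwalter; this sketch ignores both cases.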
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
latinPunc
public final Pattern latinPunc
arabicPunc
public final Pattern arabicPunc
arabicDigit
public final Pattern arabicDigit
segmentationMarker
public final Pattern segmentationMarker
DefaultLexicalMapper
public DefaultLexicalMapper()
map
public String map(String parent,
String element)
- Description copied from interface: Mapper
- Maps from one string representation to another.
- Specified by:
- map in interface Mapper
- Parameters:
- parent - element's context (e.g., the parent node in a parse tree)
- element - The string to be transformed.
- Returns:
- The transformed string
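To illustrate the map contract (a context string plus the string to transform, returning the transformed string), the following is a minimal self-contained sketch. StringMapper is a hypothetical stand-in that mirrors only the documented map signature. It is not the Stanford Mapper interface. The digit normalization is an illustrative transformation, not a documented behavior of this class.

```java
// Hypothetical stand-in mirroring the documented map signature only.
interface StringMapper {
  String map(String parent, String element);
}

// Illustrative transformation (an assumption, not the library's rule set):
// rewrites Arabic-Indic digits (U+0660..U+0669) as ASCII digits 0-9.
final class DigitNormalizingMapper implements StringMapper {
  @Override
  public String map(String parent, String element) {
    // parent is accepted for signature compatibility; this sketch ignores it.
    StringBuilder out = new StringBuilder(element.length());
    for (char c : element.toCharArray()) {
      if (c >= '\u0660' && c <= '\u0669') {
        out.append((char) ('0' + (c - '\u0660')));
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }
}
```

The sketch passes every other character through unchanged, so a caller can apply it to any token without first checking for digits.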
setup
public void setup(File path,
String... options)
- Description copied from interface: Mapper
- Perform initialization prior to the first call to map.
- Specified by:
- setup in interface Mapper
- Parameters:
- path - A filename for data on disk used during mapping
- options - Variable length array of strings for options. Option format may
vary for the particular class instance.
canChangeEncoding
public boolean canChangeEncoding(String parent,
String element)
- Description copied from interface: Mapper
- Indicates whether element can be converted to another encoding. In the ATB, for example,
if a punctuation character is labeled with the "PUNC" POS tag, then that character should not
be converted from Buckwalter to UTF-8.
- Specified by:
- canChangeEncoding in interface Mapper
- Parameters:
- parent - element's context (e.g., the parent node in a parse tree)
- element - The string to be transformed.
- Returns:
- True if the string encoding can be changed. False otherwise.
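The ATB example above can be sketched as a guard that is checked before any encoding conversion. This is a hedged illustration: it assumes the parent context is the POS tag string and that "PUNC" is the only blocking tag. The class EncodingGuard and the helper convertIfAllowed are hypothetical names, not part of the Stanford API.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch of the guard; the real class may consult more context.
final class EncodingGuard {
  // Assumption: parent is the POS tag, and "PUNC" marks literal punctuation
  // that must keep its original form.
  static boolean canChangeEncoding(String parent, String element) {
    return !"PUNC".equals(parent);
  }

  // Applies the supplied converter only when the guard allows it;
  // otherwise returns the element unchanged.
  static String convertIfAllowed(String parent, String element,
                                 UnaryOperator<String> converter) {
    return canChangeEncoding(parent, element) ? converter.apply(element) : element;
  }
}
```

Taking the converter as a parameter keeps the guard independent of any particular Buckwalter-to-Unicode table.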
main
public static void main(String[] args)
Stanford NLP Group