edu.stanford.nlp.international.arabic.pipeline
Class DefaultLexicalMapper
java.lang.Object
edu.stanford.nlp.international.arabic.pipeline.DefaultLexicalMapper
- All Implemented Interfaces:
- Mapper, java.io.Serializable
public class DefaultLexicalMapper
- extends java.lang.Object
- implements Mapper, java.io.Serializable
Applies a default set of lexical transformations that have been empirically validated
in various Arabic tasks. This class automatically detects the input encoding and applies
the appropriate set of transformations.
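The page does not say how the input encoding is detected. As a rough illustration only, the sketch below assumes the two encodings in play are Unicode Arabic and the ASCII-based Buckwalter transliteration, and tells them apart by looking for characters in the Unicode Arabic block. `EncodingSniffer` and `looksLikeUnicodeArabic` are hypothetical names and are not part of this class.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Stanford NLP API.
final class EncodingSniffer {
  // Matches any character in the Unicode "Arabic" block (U+0600 to U+06FF).
  private static final Pattern ARABIC_BLOCK = Pattern.compile("\\p{InArabic}");

  private EncodingSniffer() {}

  /** Returns true if the token contains at least one Arabic-block character. */
  static boolean looksLikeUnicodeArabic(String token) {
    return ARABIC_BLOCK.matcher(token).find();
  }

  public static void main(String[] args) {
    // "ktAb" is the Buckwalter (ASCII) transliteration of the word for "book".
    System.out.println(looksLikeUnicodeArabic("ktAb"));                 // prints false
    // The same word spelled with the Arabic letters kaf, teh, alef, beh.
    System.out.println(looksLikeUnicodeArabic("\u0643\u062A\u0627\u0628")); // prints true
  }
}
```

A detector of this kind could be used to choose between a Buckwalter rule set and a Unicode rule set before any transformation is applied. Whether DefaultLexicalMapper works this way is an assumption.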
- Author:
- Spence Green
- See Also:
- Serialized Form
Method Summary
boolean canChangeEncoding(java.lang.String parent, java.lang.String element)
- Indicates whether element can be converted to another encoding.
static void main(java.lang.String[] args)
java.lang.String map(java.lang.String parent, java.lang.String element)
- Maps from one string representation to another.
void setup(java.io.File path, java.lang.String... options)
- Perform initialization prior to the first call to map.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
latinPunc
public static final java.util.regex.Pattern latinPunc
arabicPunc
public static final java.util.regex.Pattern arabicPunc
arabicDigit
public static final java.util.regex.Pattern arabicDigit
segmentationMarker
public static final java.util.regex.Pattern segmentationMarker
DefaultLexicalMapper
public DefaultLexicalMapper()
map
public java.lang.String map(java.lang.String parent,
java.lang.String element)
- Description copied from interface: Mapper
- Maps from one string representation to another.
- Specified by: map in interface Mapper
- Parameters:
- parent - element's context (e.g., the parent node in a parse tree)
- element - The string to be transformed.
- Returns:
- The transformed string
setup
public void setup(java.io.File path,
java.lang.String... options)
- Description copied from interface: Mapper
- Perform initialization prior to the first call to map.
- Specified by: setup in interface Mapper
- Parameters:
- path - A filename for data on disk used during mapping.
- options - Variable length array of strings for options. Option format may vary for the particular class instance.
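To make the call order concrete, here is a minimal sketch of the setup-then-map lifecycle. It does not use the real library: `MapperSketch` is a stand-in that copies only the `setup` and `map` signatures shown on this page, `ToyMapper` is a toy implementation, and the option name `"StripDigits"` is invented for the example. The options actually accepted by DefaultLexicalMapper are not listed here.

```java
import java.io.File;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Stand-in mirroring the setup and map signatures documented on this page.
interface MapperSketch {
  void setup(File path, String... options);
  String map(String parent, String element);
}

// Toy implementation; the option name "StripDigits" is invented for this sketch.
final class ToyMapper implements MapperSketch {
  private final Set<String> enabled = new HashSet<>();

  @Override
  public void setup(File path, String... options) {
    // The toy ignores path; a real mapper might load mapping data from it.
    enabled.addAll(Arrays.asList(options));
  }

  @Override
  public String map(String parent, String element) {
    // Apply the optional transformation only if setup() enabled it.
    return enabled.contains("StripDigits") ? element.replaceAll("[0-9]", "") : element;
  }

  public static void main(String[] args) {
    MapperSketch mapper = new ToyMapper();
    System.out.println(mapper.map("NOUN", "ktAb2")); // prints ktAb2 (nothing configured yet)
    mapper.setup(new File("unused.txt"), "StripDigits");
    System.out.println(mapper.map("NOUN", "ktAb2")); // prints ktAb (option now active)
  }
}
```

Because `options` is a varargs parameter, a caller can pass zero, one, or several option strings after the file argument without building an array.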
canChangeEncoding
public boolean canChangeEncoding(java.lang.String parent,
java.lang.String element)
- Description copied from interface: Mapper
- Indicates whether element can be converted to another encoding. In the Arabic Treebank (ATB), for example, if a punctuation character is labeled with the "PUNC" POS tag, then that character should not be converted from Buckwalter to UTF-8.
- Specified by: canChangeEncoding in interface Mapper
- Parameters:
- parent - element's context (e.g., the parent node in a parse tree)
- element - The string to be transformed.
- Returns:
- True if the string encoding can be changed. False otherwise.
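The PUNC rule matters because Buckwalter reuses ASCII punctuation as letters; `*`, for instance, transliterates the letter thal. The sketch below shows one way a caller might guard a conversion with such a check. It is an assumption-laden illustration, not the library's implementation: `EncodingGuard`, `toUnicode`, and `convertIfAllowed` are hypothetical, the guard looks only at an exact "PUNC" parent label, and the converter handles a single symbol.

```java
// Hypothetical guard for illustration; not the library's implementation.
final class EncodingGuard {
  private EncodingGuard() {}

  /** Returns false when the parent label is exactly the "PUNC" POS tag. */
  static boolean canChangeEncoding(String parent, String element) {
    return !"PUNC".equals(parent);
  }

  /** Placeholder converter covering one Buckwalter symbol: '*' is thal (U+0630). */
  static String toUnicode(String element) {
    return element.replace('*', '\u0630');
  }

  /** Converts the element only when the guard allows it. */
  static String convertIfAllowed(String parent, String element) {
    return canChangeEncoding(parent, element) ? toUnicode(element) : element;
  }

  public static void main(String[] args) {
    // A literal asterisk tagged as punctuation is left untouched.
    System.out.println(convertIfAllowed("PUNC", "*"));                  // prints *
    // The same character under a non-punctuation tag is treated as a letter.
    System.out.println(convertIfAllowed("NOUN", "*").equals("\u0630")); // prints true
  }
}
```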
main
public static void main(java.lang.String[] args)
Stanford NLP Group