edu.stanford.nlp.international.arabic.pipeline
Class DefaultLexicalMapper

java.lang.Object
  extended by edu.stanford.nlp.international.arabic.pipeline.DefaultLexicalMapper
All Implemented Interfaces:
Mapper, Serializable

public class DefaultLexicalMapper
extends Object
implements Mapper, Serializable

Applies a default set of lexical transformations that have been empirically validated in various Arabic tasks. This class automatically detects the input encoding and applies the appropriate set of transformations.
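
This page does not say how the input encoding is detected. As a rough illustration only, one common heuristic is to test whether a token contains any character from the Unicode Arabic block and to treat everything else as ASCII transliteration. The class and method names below are hypothetical and are not part of this API; the real detection logic may differ.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the documented API. */
final class EncodingGuessSketch {
    // Assumption: any code point in the basic Arabic block (U+0600..U+06FF)
    // marks the token as Unicode Arabic. Anything else is treated as
    // Buckwalter-style ASCII transliteration or plain Latin text.
    private static final Pattern ARABIC_BLOCK = Pattern.compile("[\\u0600-\\u06FF]");

    /** Returns true if the token contains at least one Arabic-block character. */
    static boolean looksLikeUnicodeArabic(String token) {
        return ARABIC_BLOCK.matcher(token).find();
    }

    /** Returns a short label naming the guessed encoding family. */
    static String describe(String token) {
        return looksLikeUnicodeArabic(token) ? "unicode-arabic" : "buckwalter-or-latin";
    }
}
```

A heuristic like this cannot tell Buckwalter transliteration apart from ordinary Latin text, so it is only reasonable when the input is known to be Arabic in one of the two encodings.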

Author:
Spence Green
See Also:
Serialized Form

Field Summary
 Pattern arabicDigit
           
 Pattern arabicPunc
           
 Pattern latinPunc
           
 Pattern segmentationMarker
           
 
Constructor Summary
DefaultLexicalMapper()
           
 
Method Summary
 boolean canChangeEncoding(String parent, String element)
          Indicates whether element can be converted to another encoding.
static void main(String[] args)
           
 String map(String parent, String element)
          Maps from one string representation to another.
 void setup(File path, String... options)
          Performs initialization before the first call to map.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

latinPunc

public final Pattern latinPunc

arabicPunc

public final Pattern arabicPunc

arabicDigit

public final Pattern arabicDigit

segmentationMarker

public final Pattern segmentationMarker
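
The Pattern fields above are listed without their regular expressions. Because they are public final instance fields, callers would read them from a constructed mapper (for example, through an instance's arabicDigit field). The sketch below shows how a pattern of this kind might be used. It assumes that "Arabic digit" means the Arabic-Indic and Extended Arabic-Indic digit ranges; the expression is an illustration, not the value of the real field.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in for illustration; not the library's field definition. */
final class DigitPatternSketch {
    // Assumption: covers Arabic-Indic digits (U+0660..U+0669) and
    // Extended Arabic-Indic digits (U+06F0..U+06F9). The real field may differ.
    static final Pattern ARABIC_DIGIT_SKETCH =
        Pattern.compile("[\\u0660-\\u0669\\u06F0-\\u06F9]+");

    /** Returns true only if the whole token consists of digits in those ranges. */
    static boolean isAllArabicDigits(String token) {
        return ARABIC_DIGIT_SKETCH.matcher(token).matches();
    }
}
```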
Constructor Detail

DefaultLexicalMapper

public DefaultLexicalMapper()
Method Detail

map

public String map(String parent,
                  String element)
Description copied from interface: Mapper
Maps from one string representation to another.

Specified by:
map in interface Mapper
Parameters:
parent - element's context (e.g., the parent node in a parse tree)
element - The string to be transformed.
Returns:
The transformed string
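
To show the calling convention, the following sketch applies a mapper with the same two-argument shape to a list of (tag, word) pairs, passing the tag as the parent context. The StringMapper interface and the tatweel-stripping rule are hypothetical stand-ins chosen so the example compiles without the library; this page does not list which transformations the class actually applies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical usage sketch; StringMapper only mirrors the shape of map(parent, element). */
final class MapUsageSketch {
    interface StringMapper {
        String map(String parent, String element);
    }

    // Toy rule (an assumption, not the library's behavior): remove the
    // tatweel elongation character U+0640 and ignore the parent context.
    static final StringMapper STRIP_TATWEEL =
        (parent, element) -> element.replace("\u0640", "");

    /** Maps each {tag, word} pair, using the tag as the parent argument. */
    static List<String> mapAll(StringMapper mapper, String[][] taggedWords) {
        List<String> out = new ArrayList<>();
        for (String[] pair : taggedWords) {
            out.add(mapper.map(pair[0], pair[1]));
        }
        return out;
    }
}
```

With the real class, the same loop would call map on a DefaultLexicalMapper instance in place of the stand-in.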

setup

public void setup(File path,
                  String... options)
Description copied from interface: Mapper
Performs initialization before the first call to map.

Specified by:
setup in interface Mapper
Parameters:
path - A file on disk containing data used during mapping
options - Variable-length array of option strings. The option format may vary by implementing class.

canChangeEncoding

public boolean canChangeEncoding(String parent,
                                 String element)
Description copied from interface: Mapper
Indicates whether element can be converted to another encoding. In the ATB (Arabic Treebank), for example, if a punctuation character is labeled with the "PUNC" POS tag, then that character should not be converted from Buckwalter to UTF-8.

Specified by:
canChangeEncoding in interface Mapper
Parameters:
parent - element's context (e.g., the parent node in a parse tree)
element - The string to be transformed.
Returns:
true if the string encoding can be changed; false otherwise.
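
The guard matters because Buckwalter transliteration reuses several ASCII punctuation symbols as letters (for example, ' stands for hamza), so a literal punctuation token could be corrupted by a blind conversion. A minimal sketch of the PUNC rule described above follows. It assumes the parent label is a bare POS tag and that "PUNC" is the only protected tag; the class, the method bodies, and the converter argument are hypothetical, and the real implementation may apply further rules.

```java
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the PUNC guard; not the library's implementation. */
final class EncodingGuardSketch {
    // Assumption: only tokens whose parent tag is exactly "PUNC" are protected.
    // The element argument is unused here but kept to mirror the documented signature.
    static boolean canChangeEncoding(String parent, String element) {
        return parent == null || !parent.trim().equals("PUNC");
    }

    /** Applies the converter only when the guard allows an encoding change. */
    static String convertIfAllowed(String parent, String element,
                                   UnaryOperator<String> converter) {
        return canChangeEncoding(parent, element) ? converter.apply(element) : element;
    }
}
```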

main

public static void main(String[] args)


Stanford NLP Group