edu.stanford.nlp.trees.international.pennchinese
Class ChineseEnglishWordMap

java.lang.Object
  extended by edu.stanford.nlp.trees.international.pennchinese.ChineseEnglishWordMap
All Implemented Interfaces:
java.io.Serializable

public class ChineseEnglishWordMap
extends java.lang.Object
implements java.io.Serializable

A class for mapping Chinese words to English. Uses the free CEDict lexicon.

Author:
Galen Andrew
See Also:
Serialized Form

Constructor Summary
ChineseEnglishWordMap()
          Make a ChineseEnglishWordMap with a default CEDict path.
ChineseEnglishWordMap(java.lang.String dictPath)
          Make a ChineseEnglishWordMap
ChineseEnglishWordMap(java.lang.String dictPath, boolean normalized)
          Make a ChineseEnglishWordMap
ChineseEnglishWordMap(java.lang.String dictPath, java.lang.String pattern, java.lang.String delimiter, java.lang.String charset)
           
ChineseEnglishWordMap(java.lang.String dictPath, java.lang.String pattern, java.lang.String delimiter, java.lang.String charset, boolean normalized)
           
 
Method Summary
 int addMap(java.util.Map<java.lang.String,java.util.Set<java.lang.String>> addM)
          Add all of the mappings from the specified map to the current map.
 boolean containsKey(java.lang.String key)
          Does the word exist in the dictionary?
 java.util.Set<java.lang.String> getAllTranslations(java.lang.String key)
           
 java.lang.String getFirstTranslation(java.lang.String key)
           
static ChineseEnglishWordMap getInstance()
          A method for getting a singleton instance of this class.
 java.util.Map<java.lang.String,java.util.Set<java.lang.String>> getReverseMap()
          Returns a reversed map of the current map.
static void main(java.lang.String[] args)
          The main method reads (segmented, whitespace delimited) words from a file and prints them with their English translation(s).
 void readCEDict(java.lang.String dictPath)
           
 void readCEDict(java.lang.String dictPath, java.lang.String pattern, java.lang.String delimiter, java.lang.String charset)
           
 int size()
           
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ChineseEnglishWordMap

public ChineseEnglishWordMap()
Make a ChineseEnglishWordMap with a default CEDict path. It looks for the file "cedict_ts.u8" in the working directory, at the path given by the CEDICT environment variable, and in a Stanford NLP Group specific location. It throws an exception if a dictionary cannot be found.
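
The lookup described above can be sketched as follows. This is a minimal illustration, not the class's actual code: the class name DictPathSketch and the method resolve are hypothetical, the candidates are assumed to be tried in the order listed, and the Stanford-specific location is left out because this page does not give it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not part of the library.
final class DictPathSketch {

    /**
     * Returns the first readable dictionary candidate, or empty if none exists.
     * envValue stands in for System.getenv("CEDICT") so the logic can be tested.
     */
    static Optional<Path> resolve(Path workingDir, String envValue) {
        // Candidate 1: "cedict_ts.u8" in the working directory.
        Path local = workingDir.resolve("cedict_ts.u8");
        if (Files.isRegularFile(local) && Files.isReadable(local)) {
            return Optional.of(local);
        }
        // Candidate 2: the path stored in the CEDICT environment variable.
        if (envValue != null && !envValue.isEmpty()) {
            Path fromEnv = Paths.get(envValue);
            if (Files.isRegularFile(fromEnv) && Files.isReadable(fromEnv)) {
                return Optional.of(fromEnv);
            }
        }
        // The site-specific fallback is not documented here, so it is omitted.
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> found = resolve(Paths.get("."), System.getenv("CEDICT"));
        System.out.println(found.map(Path::toString).orElse("no dictionary found"));
    }
}
```

Where the documented constructor throws an exception, the sketch returns an empty Optional so that the caller decides how to fail.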


ChineseEnglishWordMap

public ChineseEnglishWordMap(java.lang.String dictPath)
Make a ChineseEnglishWordMap

Parameters:
dictPath - the path/filename of the CEDict

ChineseEnglishWordMap

public ChineseEnglishWordMap(java.lang.String dictPath,
                             boolean normalized)
Make a ChineseEnglishWordMap

Parameters:
dictPath - the path/filename of the CEDict
normalized - whether the entries in the dictionary are normalized

ChineseEnglishWordMap

public ChineseEnglishWordMap(java.lang.String dictPath,
                             java.lang.String pattern,
                             java.lang.String delimiter,
                             java.lang.String charset)

ChineseEnglishWordMap

public ChineseEnglishWordMap(java.lang.String dictPath,
                             java.lang.String pattern,
                             java.lang.String delimiter,
                             java.lang.String charset,
                             boolean normalized)

Method Detail

getInstance

public static ChineseEnglishWordMap getInstance()
A method for getting a singleton instance of this class. In general, you should use this method rather than a constructor, since each instance holds a large dictionary file in memory.

Returns:
An instance of ChineseEnglishWordMap

containsKey

public boolean containsKey(java.lang.String key)
Does the word exist in the dictionary?

Parameters:
key - The word in Chinese
Returns:
Whether it is in the dictionary

getAllTranslations

public java.util.Set<java.lang.String> getAllTranslations(java.lang.String key)
Parameters:
key - a Chinese word
Returns:
the set of English translations (null if not in dictionary)

getFirstTranslation

public java.lang.String getFirstTranslation(java.lang.String key)
Parameters:
key - a Chinese word
Returns:
the first English translation (null if not in dictionary)

readCEDict

public void readCEDict(java.lang.String dictPath)

readCEDict

public void readCEDict(java.lang.String dictPath,
                       java.lang.String pattern,
                       java.lang.String delimiter,
                       java.lang.String charset)
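
The pattern and delimiter parameters are not described on this page. CEDICT entry lines have the form "Traditional Simplified [pinyin] /gloss 1/gloss 2/", so one plausible reading is that pattern is a regular expression that extracts the headword and the gloss field, and delimiter is a regular expression that splits the gloss field. The sketch below illustrates that reading only. The class CedictLineSketch, the regular expression ENTRY, and the delimiter "/" are assumptions, not the library's defaults. The sample line in main is the entry for "China", written with Unicode escapes.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the library.
final class CedictLineSketch {

    // Assumed pattern: skip the traditional form, capture the simplified form
    // (group 1) and the text between the first and last slash (group 2).
    static final Pattern ENTRY = Pattern.compile("[^ ]+ ([^ ]+)[^/]+/(.+)/");

    // Assumed delimiter regex separating individual glosses.
    static final String DELIMITER = "/";

    /** Returns the captured headword, or null if the line is not an entry. */
    static String headword(String line) {
        Matcher m = ENTRY.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    /** Returns the glosses in file order, or an empty set if the line is not an entry. */
    static Set<String> glosses(String line) {
        Set<String> out = new LinkedHashSet<>();
        Matcher m = ENTRY.matcher(line);
        if (!m.matches()) {
            return out; // comment lines such as "# ..." end up here
        }
        for (String gloss : m.group(2).split(DELIMITER)) {
            String trimmed = gloss.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "\u4E2D\u570B \u4E2D\u56FD [Zhong1 guo2] /China/Middle Kingdom/";
        System.out.println(headword(line) + " -> " + glosses(line));
    }
}
```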

getReverseMap

public java.util.Map<java.lang.String,java.util.Set<java.lang.String>> getReverseMap()
Returns a reversed map of the current map.

Returns:
A reversed map of the current map.
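
As an illustration of what reversing a word-to-translations map involves, the sketch below turns each (Chinese word, English translation) pair around, so that every English translation maps to the set of Chinese words that carry it. It is a standalone example over plain java.util collections. The class ReverseMapSketch is hypothetical, and the library's own implementation may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the library.
final class ReverseMapSketch {

    /** Builds a new map from each translation to the set of keys that had it. */
    static Map<String, Set<String>> reverse(Map<String, Set<String>> forward) {
        Map<String, Set<String>> reversed = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : forward.entrySet()) {
            for (String english : entry.getValue()) {
                reversed.computeIfAbsent(english, k -> new HashSet<>())
                        .add(entry.getKey());
            }
        }
        return reversed;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> forward = new HashMap<>();
        forward.put("\u597D", new HashSet<>(Set.copyOf(java.util.List.of("good", "well"))));
        forward.put("\u826F", new HashSet<>(Set.copyOf(java.util.List.of("good"))));
        // "good" maps back to both Chinese words, "well" to only one.
        System.out.println(reverse(forward));
    }
}
```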

addMap

public int addMap(java.util.Map<java.lang.String,java.util.Set<java.lang.String>> addM)
Add all of the mappings from the specified map to the current map.
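
This page does not say what the returned int counts. The sketch below shows one way such a merge could work over plain java.util collections, under the assumption that the return value is the number of (word, translation) pairs that were not already present. The class AddMapSketch and the method addAll are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the library.
final class AddMapSketch {

    /**
     * Merges addM into target and returns how many new (key, translation)
     * pairs were added. The meaning of the count is an assumption.
     */
    static int addAll(Map<String, Set<String>> target, Map<String, Set<String>> addM) {
        int added = 0;
        for (Map.Entry<String, Set<String>> entry : addM.entrySet()) {
            Set<String> existing =
                    target.computeIfAbsent(entry.getKey(), k -> new HashSet<>());
            for (String translation : entry.getValue()) {
                if (existing.add(translation)) { // true only for a new pair
                    added++;
                }
            }
        }
        return added;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> target = new HashMap<>();
        target.put("\u597D", new HashSet<>(Set.of("good")));
        Map<String, Set<String>> extra = new HashMap<>();
        extra.put("\u597D", Set.of("good", "well"));
        extra.put("\u826F", Set.of("fine"));
        System.out.println(addAll(target, extra) + " new pairs");
    }
}
```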


toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

size

public int size()

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
The main method reads (segmented, whitespace delimited) words from a file and prints them with their English translation(s). The path and filename of the CEDict Lexicon can be supplied via the "-dictPath" flag; otherwise the default filename "cedict_ts.u8" in the current directory is checked. By default, only the first translation is printed. If the "-all" flag is given, all translations are printed. The input and output encoding can be specified using the "-encoding" flag. Otherwise UTF-8 is assumed.

Throws:
java.io.IOException
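
The per-line behavior described above (first translation by default, all translations with "-all") can be sketched without the library. Everything specific here is an assumption for illustration: the class AnnotateSketch, the bracketed output format, the choice to echo unknown words unchanged, and the use of set iteration order to decide which translation counts as "first".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the library.
final class AnnotateSketch {

    /**
     * Appends translations to each whitespace-delimited word of a line.
     * With all == false only the first translation in iteration order is used.
     */
    static String annotate(String line, Map<String, Set<String>> dict, boolean all) {
        StringBuilder sb = new StringBuilder();
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input line
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(word);
            Set<String> translations = dict.get(word);
            if (translations == null || translations.isEmpty()) {
                continue; // unknown word: echoed unchanged (assumption)
            }
            if (all) {
                sb.append(translations); // rendered as [t1, t2]
            } else {
                sb.append('[').append(translations.iterator().next()).append(']');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dict = new HashMap<>();
        dict.put("\u597D", new LinkedHashSet<>(Arrays.asList("good", "well")));
        System.out.println(annotate("\u597D abc", dict, false));
        System.out.println(annotate("\u597D abc", dict, true));
    }
}
```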


Stanford NLP Group