edu.stanford.nlp.international.arabic
Class Buckwalter
java.lang.Object
edu.stanford.nlp.international.arabic.Buckwalter
- All Implemented Interfaces:
- SerializableFunction<String,String>, Function<String,String>, Serializable
public class Buckwalter
- extends Object
- implements SerializableFunction<String,String>
This class can convert between Unicode and Buckwalter encodings of
Arabic.
Sources
"MORPHOLOGICAL ANALYSIS & POS ANNOTATION," v3.8. LDC. 08 June 2009.
http://www.ldc.upenn.edu/myl/morph/buckwalter.html
http://www.qamus.org/transliteration.htm (Tim Buckwalter's site)
http://www.livingflowers.com/Arabic_transliteration (many but hard to use)
http://www.cis.upenn.edu/~cis639/arabic/info/romanization.html
http://www.nongnu.org/aramorph/english/index.html (Java AraMorph)
BBN's MBuckWalter2Unicode.tab
See also my GALE-NOTES.txt file for other mappings that ROSETTA people use.
Normalization of decomposed characters to composed:
ARABIC LETTER ALEF (ا), ARABIC MADDAH ABOVE (ٓ) ->
ARABIC LETTER ALEF WITH MADDA ABOVE (آ)
ARABIC LETTER ALEF (ا), ARABIC HAMZA ABOVE (ٔ) ->
ARABIC LETTER ALEF WITH HAMZA ABOVE (أ)
ARABIC LETTER WAW (و), ARABIC HAMZA ABOVE (ٔ) ->
ARABIC LETTER WAW WITH HAMZA ABOVE (ؤ)
ARABIC LETTER ALEF (ا), ARABIC HAMZA BELOW (ٕ) ->
ARABIC LETTER ALEF WITH HAMZA BELOW (إ)
ARABIC LETTER YEH (ي), ARABIC HAMZA ABOVE (ٔ) ->
ARABIC LETTER YEH WITH HAMZA ABOVE (ئ)
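The five compositions listed above match the canonical compositions that Unicode NFC normalization performs. The sketch below reproduces them with the standard java.text.Normalizer. It is an illustration only: this page does not say how Buckwalter implements the normalization, the class name ComposeSketch and the method compose are hypothetical, and NFC also composes sequences that are not in the list above.

```java
import java.text.Normalizer;

// Hypothetical stand-in, not part of the Stanford class.
class ComposeSketch {

    // Compose decomposed sequences using Unicode NFC (an assumption:
    // the real class may use its own substitution table instead).
    static String compose(String in) {
        return Normalizer.normalize(in, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String[] decomposed = {
            "\u0627\u0653", // ALEF + MADDAH ABOVE
            "\u0627\u0654", // ALEF + HAMZA ABOVE
            "\u0648\u0654", // WAW  + HAMZA ABOVE
            "\u0627\u0655", // ALEF + HAMZA BELOW
            "\u064A\u0654", // YEH  + HAMZA ABOVE
        };
        for (String s : decomposed) {
            String c = compose(s);
            // Print code points rather than glyphs to avoid console encoding issues.
            System.out.println(String.format("U+%04X U+%04X -> U+%04X (length %d)",
                    (int) s.charAt(0), (int) s.charAt(1), (int) c.charAt(0), c.length()));
        }
    }
}
```

Each input pair should come back as a single composed code point, for example ALEF followed by MADDAH ABOVE becomes ALEF WITH MADDA ABOVE (U+0622).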
- Author:
- Christopher Manning, Spence Green
- See Also:
- Serialized Form
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Buckwalter
public Buckwalter()
Buckwalter
public Buckwalter(boolean unicodeToBuckwalter)
suppressBuckDigitConversion
public void suppressBuckDigitConversion(boolean b)
suppressBuckPunctConversion
public void suppressBuckPunctConversion(boolean b)
apply
public String apply(String in)
- Description copied from interface: Function
- Converts a T1 to a different T2. For example, a Parser
will convert a Sentence to a Tree. A Tagger will convert a Sentence
to a TaggedSentence.
- Specified by:
- apply in interface Function<String,String>
- Parameters:
- in - The function's argument
- Returns:
- The function's evaluated value
buckwalterToUnicode
public String buckwalterToUnicode(String in)
unicodeToBuckwalter
public String unicodeToBuckwalter(String in)
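To make the two directions concrete, here is a minimal, self-contained sketch of a character-by-character mapping in the spirit of buckwalterToUnicode and unicodeToBuckwalter. It is not the Stanford implementation: the class BuckwalterSketch and its helpers are hypothetical, and the table covers only eight letters of the standard Buckwalter scheme, whereas the real class covers the full alphabet plus diacritics, digits, and punctuation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in with a deliberately tiny table.
class BuckwalterSketch {
    private static final Map<Character, Character> U2B = new HashMap<>();
    private static final Map<Character, Character> B2U = new HashMap<>();

    // Register one Unicode/Buckwalter pair in both lookup tables.
    private static void pair(char unicode, char buckwalter) {
        U2B.put(unicode, buckwalter);
        B2U.put(buckwalter, unicode);
    }

    static {
        pair('\u0627', 'A'); // ALEF
        pair('\u0628', 'b'); // BEH
        pair('\u062A', 't'); // TEH
        pair('\u0643', 'k'); // KAF
        pair('\u0644', 'l'); // LAM
        pair('\u0645', 'm'); // MEEM
        pair('\u0623', '>'); // ALEF WITH HAMZA ABOVE
        pair('\u0625', '<'); // ALEF WITH HAMZA BELOW
    }

    // Map each char through the table; unmapped chars pass through unchanged.
    private static String convert(String in, Map<Character, Character> table) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            char mapped = table.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    static String unicodeToBuckwalter(String in) {
        return convert(in, U2B);
    }

    static String buckwalterToUnicode(String in) {
        return convert(in, B2U);
    }

    public static void main(String[] args) {
        String kitab = "\u0643\u062A\u0627\u0628"; // KAF TEH ALEF BEH
        String bw = unicodeToBuckwalter(kitab);
        System.out.println(bw);                                    // prints "ktAb"
        System.out.println(buckwalterToUnicode(bw).equals(kitab)); // prints "true"
    }
}
```

Because every pair is registered in both tables, the sketch round-trips any string made of mapped letters. Passing unmapped characters through unchanged is a design choice of this sketch, not documented behavior of the real class.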
main
public static void main(String[] args)
- Parameters:
- args
Stanford NLP Group