edu.stanford.nlp.international.arabic
Class Buckwalter

java.lang.Object
  extended by edu.stanford.nlp.international.arabic.Buckwalter
All Implemented Interfaces:
SerializableFunction<java.lang.String,java.lang.String>, Function<java.lang.String,java.lang.String>, java.io.Serializable

public class Buckwalter
extends java.lang.Object
implements SerializableFunction<java.lang.String,java.lang.String>

This class can convert between Unicode and Buckwalter encodings of Arabic.
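
To make the idea of a two-way transliteration table concrete, here is a minimal, self-contained sketch. It is not the Stanford implementation: the class name BuckwalterSketch and its methods are hypothetical, and it covers only six letters of the Buckwalter scheme. Characters without an entry are passed through unchanged, which is an assumption of this sketch rather than documented behavior of the class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; covers a handful of letters. */
final class BuckwalterSketch {
    private static final Map<Character, Character> B2U = new HashMap<>();
    private static final Map<Character, Character> U2B = new HashMap<>();

    /** Registers one Buckwalter/Unicode pair in both lookup tables. */
    private static void pair(char buck, char uni) {
        B2U.put(buck, uni);
        U2B.put(uni, buck);
    }

    static {
        pair('A', '\u0627'); // ARABIC LETTER ALEF
        pair('b', '\u0628'); // ARABIC LETTER BEH
        pair('t', '\u062A'); // ARABIC LETTER TEH
        pair('k', '\u0643'); // ARABIC LETTER KAF
        pair('l', '\u0644'); // ARABIC LETTER LAM
        pair('m', '\u0645'); // ARABIC LETTER MEEM
    }

    /** Maps each char through the table; unmapped chars are copied as-is. */
    private static String convert(String in, Map<Character, Character> table) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            Character mapped = table.get(c);
            out.append(mapped != null ? mapped.charValue() : c);
        }
        return out.toString();
    }

    static String toUnicode(String buckwalter) {
        return convert(buckwalter, B2U);
    }

    static String toBuckwalter(String unicode) {
        return convert(unicode, U2B);
    }

    public static void main(String[] args) {
        // Round trip: Buckwalter -> Unicode -> Buckwalter.
        System.out.println(toBuckwalter(toUnicode("ktAb"))); // prints "ktAb"
    }
}
```

Keeping both tables in sync through a single pair helper avoids the two directions drifting apart as entries are added.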

Sources

"MORPHOLOGICAL ANALYSIS & POS ANNOTATION," v3.8. LDC. 08 June 2009.
http://www.ldc.upenn.edu/myl/morph/buckwalter.html
http://www.qamus.org/transliteration.htm (Tim Buckwalter's site)
http://www.livingflowers.com/Arabic_transliteration (many but hard to use)
http://www.cis.upenn.edu/~cis639/arabic/info/romanization.html
http://www.nongnu.org/aramorph/english/index.html (Java AraMorph)
BBN's MBuckWalter2Unicode.tab
See also my GALE-NOTES.txt file for other mappings ROSETTA people do.

Normalization of decomposed characters to composed:

ARABIC LETTER ALEF (ا), ARABIC MADDAH ABOVE (ٓ) -> ARABIC LETTER ALEF WITH MADDA ABOVE
ARABIC LETTER ALEF (ا), ARABIC HAMZA ABOVE (ٔ) -> ARABIC LETTER ALEF WITH HAMZA ABOVE (أ)
ARABIC LETTER WAW, ARABIC HAMZA ABOVE -> ARABIC LETTER WAW WITH HAMZA ABOVE
ARABIC LETTER ALEF, ARABIC HAMZA BELOW (ٕ) -> ARABIC LETTER ALEF WITH HAMZA BELOW
ARABIC LETTER YEH, ARABIC HAMZA ABOVE -> ARABIC LETTER YEH WITH HAMZA ABOVE

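The five decomposed-to-composed pairs listed above are canonical compositions in Unicode, so they can be reproduced with the JDK's java.text.Normalizer in NFC form. This is only a sketch of one way to get the same result; how Buckwalter itself performs the normalization is not stated on this page, and the class name ComposeArabicSketch is hypothetical.

```java
import java.text.Normalizer;

/** Hypothetical illustration; not taken from the Buckwalter class. */
final class ComposeArabicSketch {

    /** Composes base letter + combining mark sequences via Unicode NFC. */
    static String compose(String in) {
        return Normalizer.normalize(in, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // ALEF (U+0627) followed by MADDAH ABOVE (U+0653).
        String decomposed = "\u0627\u0653";
        String composed = compose(decomposed);

        // Print the code point rather than the glyph to avoid console
        // encoding issues.
        System.out.printf("U+%04X%n", (int) composed.charAt(0)); // U+0622
        System.out.println(composed.length());                   // 1
    }
}
```

Note that NFC composes every canonical sequence in the input, not only these five, so it is broader than the list above.
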
Author:
Christopher Manning, Spence Green
See Also:
Serialized Form

Constructor Summary
Buckwalter()
           
Buckwalter(boolean unicodeToBuckwalter)
           
 
Method Summary
 java.lang.String apply(java.lang.String in)
          Converts a T1 to a different T2.
 java.lang.String buckwalterToUnicode(java.lang.String in)
           
static void main(java.lang.String[] args)
           
 void suppressBuckDigitConversion(boolean b)
           
 void suppressBuckPunctConversion(boolean b)
           
 java.lang.String unicodeToBuckwalter(java.lang.String in)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Buckwalter

public Buckwalter()

Buckwalter

public Buckwalter(boolean unicodeToBuckwalter)

Method Detail

suppressBuckDigitConversion

public void suppressBuckDigitConversion(boolean b)

suppressBuckPunctConversion

public void suppressBuckPunctConversion(boolean b)
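
Neither suppress method is described on this page. Their names suggest that each one switches off a part of the conversion (digits or punctuation) so that those characters are left untouched. The sketch below illustrates that idea for digits only, under the assumption that the unsuppressed conversion maps ASCII digits to Arabic-Indic digits (U+0660 to U+0669). The class DigitSuppressionSketch and its methods are hypothetical and do not reproduce the library's code.

```java
/** Hypothetical illustration of a "suppress digit conversion" switch. */
final class DigitSuppressionSketch {
    // When true, ASCII digits are copied through unchanged.
    private boolean suppressDigits = false;

    void suppressDigitConversion(boolean b) {
        suppressDigits = b;
    }

    /** Converts ASCII digits to Arabic-Indic digits unless suppressed. */
    String convert(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (!suppressDigits && c >= '0' && c <= '9') {
                // U+0660 is ARABIC-INDIC DIGIT ZERO; the ten digits are contiguous.
                out.append((char) ('\u0660' + (c - '0')));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        DigitSuppressionSketch s = new DigitSuppressionSketch();
        s.suppressDigitConversion(true);
        System.out.println(s.convert("2009")); // prints "2009"
    }
}
```

A switch like this is useful when numerals in the input (years, identifiers) should stay in ASCII for downstream tools.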

apply

public java.lang.String apply(java.lang.String in)
Description copied from interface: Function
Converts a T1 to a different T2. For example, a Parser will convert a Sentence to a Tree. A Tagger will convert a Sentence to a TaggedSentence.

Specified by:
apply in interface Function<java.lang.String,java.lang.String>
Parameters:
in - The function's argument
Returns:
The function's evaluated value
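
Because the class implements a String-to-String function, a converter object can be passed wherever such a function is expected. The sketch below shows that pattern with a hypothetical class, DirectionalConverterSketch, which uses java.util.function.Function from the JDK rather than the Stanford Function interface. It assumes the boolean constructor argument selects the direction that apply uses, which this page does not confirm, and it maps a single letter only to stay short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical illustration; direction semantics are an assumption. */
final class DirectionalConverterSketch implements Function<String, String> {
    private final boolean toBuckwalter;

    DirectionalConverterSketch(boolean unicodeToBuckwalter) {
        this.toBuckwalter = unicodeToBuckwalter;
    }

    // Toy one-letter mappings: 'b' <-> ARABIC LETTER BEH (U+0628).
    String buckwalterToUnicode(String in) {
        return in.replace('b', '\u0628');
    }

    String unicodeToBuckwalter(String in) {
        return in.replace('\u0628', 'b');
    }

    /** Dispatches to one direction, fixed at construction time. */
    @Override
    public String apply(String in) {
        return toBuckwalter ? unicodeToBuckwalter(in) : buckwalterToUnicode(in);
    }

    public static void main(String[] args) {
        Function<String, String> toBuck = new DirectionalConverterSketch(true);
        List<String> out = Arrays.asList("\u0628", "\u0628\u0628").stream()
                .map(toBuck)
                .collect(Collectors.toList());
        System.out.println(out); // prints "[b, bb]"
    }
}
```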

buckwalterToUnicode

public java.lang.String buckwalterToUnicode(java.lang.String in)

unicodeToBuckwalter

public java.lang.String unicodeToBuckwalter(java.lang.String in)

main

public static void main(java.lang.String[] args)
Parameters:
args -


Stanford NLP Group