IBMArabicEscaper (Stanford JavaNLP API)

java.lang.Object
- edu.stanford.nlp.international.arabic.IBMArabicEscaper

All Implemented Interfaces:

Function<List<HasWord>,List<HasWord>>
```
public class IBMArabicEscaper
extends Object
implements Function<List<HasWord>,List<HasWord>>
```
This escaper is intended for use on flat input to be parsed by LexicalizedParser. It performs these functions:
- Deletes the clitic markers inserted by the IBM segmenter ('#' and '+')
- Deletes IBM classing for numbers
- Replaces tokens that must be escaped with the appropriate LDC escape sequences
- Applies the same orthographic normalization performed by ArabicTreeNormalizer
- Interns the resulting strings (via `String.intern()`)
This class supports both Buckwalter and UTF-8 encoding. IMPORTANT: This class must implement Function<List<HasWord>, List<HasWord>> in order to run with the parser.
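
The first bullet above can be pictured with a small standalone sketch. It does not use the Stanford classes and is not the library's implementation: `CliticMarkerSketch` and `stripCliticMarker` are hypothetical names, and the rule shown (drop one trailing '#' from a prefix token or one leading '+' from a suffix token, leaving single-character tokens alone) is an assumption about how the IBM segmenter marks clitics.

```java
// Illustrative sketch only; names and the exact stripping rule are assumptions.
final class CliticMarkerSketch {
    private CliticMarkerSketch() {}

    /** Hypothetical helper: removes one trailing '#' or one leading '+' from a token. */
    static String stripCliticMarker(String token) {
        // Assumption: a bare "#" or "+" is a real token, not a marker, so keep it.
        if (token.length() <= 1) {
            return token;
        }
        // Prefix clitics are assumed to end in '#', e.g. "w#".
        if (token.charAt(token.length() - 1) == '#') {
            return token.substring(0, token.length() - 1);
        }
        // Suffix clitics are assumed to start with '+', e.g. "+hm".
        if (token.charAt(0) == '+') {
            return token.substring(1);
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(stripCliticMarker("w#"));   // prints "w"
        System.out.println(stripCliticMarker("+hm"));  // prints "hm"
        System.out.println(stripCliticMarker("ktb"));  // prints "ktb"
    }
}
```

The real class also handles number classing, LDC escape sequences, and orthographic normalization, none of which this sketch attempts.
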
Author:

Christopher Manning, Spence Green

Constructor Summary

Constructors
Constructor and Description
`IBMArabicEscaper()`
`IBMArabicEscaper(boolean annoteAndClassOnly)`

Method Summary

Modifier and Type	Method and Description
`List<HasWord>`	`apply(List<HasWord> sentence)` Converts an input list of `HasWord` in IBM Arabic to LDC ATBv3 representation.
`String`	`apply(String w)` Applies escaping to a single word.
`void`	`disableWarnings()` Disable warnings generated when tokens are escaped.
`static void`	`main(String[] args)` This main method preprocesses one-sentence-per-line input, making the same changes as the Function.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface java.util.function.Function
andThen, compose, identity

- Constructor Detail
  - IBMArabicEscaper
```
public IBMArabicEscaper()
```
  - IBMArabicEscaper
```
public IBMArabicEscaper(boolean annoteAndClassOnly)
```
- Method Detail
  - disableWarnings
```
public void disableWarnings()
```
    Disable warnings generated when tokens are escaped.
  - apply
```
public List<HasWord> apply(List<HasWord> sentence)
```
    Converts an input list of HasWord in IBM Arabic to LDC ATBv3 representation. The method safely copies the input object prior to escaping.
    
    Specified by:
    
    apply in interface Function<List<HasWord>,List<HasWord>>
    
    Parameters:
    
    sentence - A collection of type Word
    
    Returns:
    
    A copy of the input with each word escaped.
    
    Throws:
    
    RuntimeException - If a word is mapped to null
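
    The copy-then-escape contract described above (new list returned, input left untouched, RuntimeException when a word maps to null) can be sketched without the Stanford jar. This is a minimal illustration under stated assumptions, not the library's code: it uses plain `String` in place of `HasWord`, and `CopyingEscaperSketch` and its `wordEscaper` parameter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch only; String stands in for HasWord (assumption).
final class CopyingEscaperSketch implements Function<List<String>, List<String>> {
    // Hypothetical per-word escaping step supplied by the caller.
    private final Function<String, String> wordEscaper;

    CopyingEscaperSketch(Function<String, String> wordEscaper) {
        this.wordEscaper = wordEscaper;
    }

    @Override
    public List<String> apply(List<String> sentence) {
        // Build a fresh list so the caller's input is never modified.
        List<String> escaped = new ArrayList<>(sentence.size());
        for (String word : sentence) {
            String result = wordEscaper.apply(word);
            // Mirror the documented contract: a null result is an error.
            if (result == null) {
                throw new RuntimeException("Word mapped to null: " + word);
            }
            escaped.add(result);
        }
        return escaped;
    }
}
```

    Because the sketch implements `Function<List<String>, List<String>>`, it can be passed anywhere a list-to-list function is expected, which is the same reason the real class implements `Function<List<HasWord>,List<HasWord>>`.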
  - apply
```
public String apply(String w)
```
    Applies escaping to a single word. Interns the escaped string.
    
    Parameters:
    
    w - The word
    
    Returns:
    
    The escaped word
    
    Throws:
    
    RuntimeException - If a word is nullified (which is really bad for the parser and for MT)
  - main
```
public static void main(String[] args)
                 throws IOException
```
    This main method preprocesses one-sentence-per-line input, making the same changes as the Function. By default it writes the output to files with the same names as the files passed in on the command line, but with .sent appended. If you give the flag -f, output is instead sent to stdout. Input and output are always in UTF-8.
    
    Parameters:
    
    args - A list of filenames. The files must be UTF-8 encoded.
    
    Throws:
    
    IOException - If there are any issues
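
    The default file-naming behaviour described above (read UTF-8 lines, write them to the input name with .sent appended) can be sketched as follows. This is an assumption-laden illustration, not the library's `main`: `SentFileSketch`, `outputPathFor`, `process`, and the `lineEscaper` parameter are hypothetical, and the -f flag is not modelled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative sketch only; all names here are hypothetical.
final class SentFileSketch {
    private SentFileSketch() {}

    /** Derives the output path by appending ".sent" to the input path. */
    static Path outputPathFor(Path input) {
        return Paths.get(input.toString() + ".sent");
    }

    /** Reads the input as UTF-8, transforms each line, and writes the result as UTF-8. */
    static Path process(Path input, UnaryOperator<String> lineEscaper) throws IOException {
        List<String> escapedLines = new ArrayList<>();
        // One sentence per line, as the documentation describes.
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            escapedLines.add(lineEscaper.apply(line));
        }
        Path output = outputPathFor(input);
        Files.write(output, escapedLines, StandardCharsets.UTF_8);
        return output;
    }
}
```

    For example, `SentFileSketch.outputPathFor(Paths.get("corpus.txt"))` yields a path whose string form is `corpus.txt.sent`.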
