edu.stanford.nlp.trees.international
Class PunctEquivalenceClasser
java.lang.Object
edu.stanford.nlp.trees.international.PunctEquivalenceClasser
public class PunctEquivalenceClasser extends Object
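Performs equivalence classing of punctuation according to the Penn Treebank (PTB) guidelines. Many multilingual
treebanks mark all punctuation with a single POS tag. This discards distinctions that help a parser, such as the difference between sentence-final punctuation and a comma.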
PTB punctuation POS tag set (12 tags, numbered as in the full 48-tag PTB tag set):
37. # Pound sign
38. $ Dollar sign
39. . Sentence-final punctuation
40. , Comma
41. : Colon, semi-colon
42. ( Left bracket character
43. ) Right bracket character
44. " Straight double quote
45. ` Left open single quote
46. `` Left open double quote
47. ' Right close single quote
48. '' Right close double quote
See http://www.ldc.upenn.edu/Catalog/docs/LDC95T7/cl93.html
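To make the motivation concrete, here is a minimal Java sketch, not the CoreNLP implementation, of splitting a single coarse punctuation tag back into the PTB punctuation tags listed above. The class name `PunctRetagSketch`, the coarse tag `PUNC`, and the sample tokens assigned to each tag are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not CoreNLP code): recover PTB punctuation tags from
// a treebank that tags every punctuation token with one coarse tag.
class PunctRetagSketch {
    // Token -> PTB punctuation tag. The token lists are illustrative assumptions.
    private static final Map<String, String> TOKEN_TO_PTB_TAG = new HashMap<>();

    static {
        put("#", "#");
        put("$", "$");
        put(".", ".", "?", "!");      // sentence-final punctuation
        put(",", ",");
        put(":", ":", ";");           // colon, semicolon
        put("(", "(", "[", "{");      // left brackets
        put(")", ")", "]", "}");      // right brackets
        put("\"", "\"");              // straight double quote
        put("`", "`");
        put("``", "``");
        put("'", "'");
        put("''", "''");
    }

    // Registers every token in `tokens` under the PTB tag `tag`.
    private static void put(String tag, String... tokens) {
        for (String t : tokens) {
            TOKEN_TO_PTB_TAG.put(t, tag);
        }
    }

    /** Returns the PTB punctuation tag for a token, or null if it is not mapped. */
    static String ptbTag(String token) {
        return TOKEN_TO_PTB_TAG.get(token);
    }

    /** Replaces coarsePunctTag with a PTB tag wherever the token is mapped. */
    static List<String> retag(List<String> words, List<String> tags, String coarsePunctTag) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags differ in length");
        }
        List<String> out = new ArrayList<>(tags.size());
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            String ptb = tag.equals(coarsePunctTag) ? ptbTag(words.get(i)) : null;
            // Unmapped tokens keep their original tag.
            out.add(ptb != null ? ptb : tag);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("Hello", ",", "world", "!");
        List<String> tags = List.of("NN", "PUNC", "NN", "PUNC");
        System.out.println(retag(words, tags, "PUNC")); // prints [NN, ,, NN, .]
    }
}
```

A lookup table keeps the mapping declarative. Tokens missing from the table keep their original tag, so unfamiliar symbols are not silently forced into a PTB class.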
- Author:
- Spence Green
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
PunctEquivalenceClasser
public PunctEquivalenceClasser()
getPunctClass
public static String getPunctClass(String punc)
- Returns the equivalence class of the argument. If the argument is not contained in
an equivalence class, an empty string is returned.
- Parameters:
punc - the punctuation string to classify
- Returns:
- The class name if found. Otherwise, an empty string.
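The contract of `getPunctClass` (a class name on success, an empty string otherwise) can be illustrated with the hypothetical sketch below. The class names `eol` and `comma`, their member tokens (including the Chinese full stop U+3002 and the Arabic comma U+060C), and the helper `punctClassOf` are assumptions for illustration. They are not taken from the CoreNLP source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the getPunctClass contract. The class names and
// member tokens are assumptions, not CoreNLP's actual tables.
class PunctClassLookupSketch {
    private static final Map<String, String> CLASS_OF = new HashMap<>();

    static {
        // Sentence-final punctuation, including the Chinese full stop (U+3002).
        for (String s : new String[] {".", "?", "!", "\u3002"}) {
            CLASS_OF.put(s, "eol");
        }
        // Commas, including the Arabic comma (U+060C).
        for (String s : new String[] {",", "\u060C"}) {
            CLASS_OF.put(s, "comma");
        }
    }

    /** Returns the equivalence-class name for punc, or "" if it belongs to no class. */
    static String punctClassOf(String punc) {
        return CLASS_OF.getOrDefault(punc, "");
    }

    public static void main(String[] args) {
        String[] tokens = {"\u3002", "\u060C", "foo"};
        for (String tok : tokens) {
            String cls = punctClassOf(tok);
            // An empty result means the token is in no known equivalence class.
            System.out.println(cls.isEmpty() ? "(no class)" : cls);
        }
        // prints eol, comma, (no class) on separate lines
    }
}
```

Because a miss returns `""` rather than `null`, callers can test `isEmpty()` without a separate null check.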
Stanford NLP Group