public class QuantifiableEntityNormalizer extends Object
This class has a counterpart in the annotation pipeline, QuantifiableEntityNormalizingAnnotator.
Please keep the substantive content here, however, so as to lessen code
duplication.
Implementation note: The extensive test code for this class is now in a separate JUnit Test class. This class depends on the background symbol for NER being the default background symbol. This should be fixed at some point.
Modifier and Type | Field and Description |
---|---|
static String | BACKGROUND_SYMBOL |
static Pattern | numberPattern |
static ClassicCounter<String> | ordinalsToValues |
static ClassicCounter<String> | wordsToValues |
Modifier and Type | Method and Description |
---|---|
static <E extends CoreMap> void | addNormalizedQuantitiesToEntities(List<E> l) Identifies contiguous MONEY, TIME, DATE, or PERCENT entities and tags each of their constituents with a "normalizedQuantity" label which contains the appropriate normalized string corresponding to the full quantity. |
static <E extends CoreMap> void | addNormalizedQuantitiesToEntities(List<E> l, boolean concatenate) |
static <E extends CoreMap> void | addNormalizedQuantitiesToEntities(List<E> list, boolean concatenate, boolean usesSUTime) Identifies contiguous MONEY, TIME, DATE, or PERCENT entities and tags each of their constituents with a "normalizedQuantity" label which contains the appropriate normalized string corresponding to the full quantity. |
static <E extends CoreLabel> List<E> | applySpecializedNER(List<E> l) Runs a deterministic named entity classifier that is good at recognizing numbers, money, and date expressions not recognized by our statistical NER. |
static List<CoreLabel> | collapseNERLabels(List<CoreLabel> l) Currently this populates a List<CoreLabel> with words from the passed List, but NER entities are collapsed and CoreLabel constituents of entities have NER information in their "quantity" fields. |
static <E extends CoreMap> void | fixupNerBeforeNormalization(List<E> list) |
static <E extends CoreMap> | isCompatible(String tag, E prev, E cur) |
static List<List<CoreLabel>> | normalizeClassifierOutput(List<List<CoreLabel>> l) Takes the output of an AbstractSequenceClassifier and marks up each document by normalizing quantities. |
static String | normalizedNumberString(String s, String nextWord, Number numberFromSUTime) |
static String | normalizedNumberStringQuiet(String s, double multiplier, String nextWord, Number numberFromSUTime) |
static String | normalizedOrdinalString(String s, Number numberFromSUTime) |
static String | normalizedOrdinalStringQuiet(String s, Number numberFromSUTime) |
static String | normalizedPercentString(String s, Number numberFromSUTime) |
static String | normalizedTimeString(String s, String ampm, Timex timexFromSUTime) |
static String | normalizedTimeString(String s, Timex timexFromSUTime) |
static <E extends CoreMap> String | singleEntityToString(List<E> l) Converts the content of a List of CoreMaps to a single space-separated String. |
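The contiguous-entity tagging described for addNormalizedQuantitiesToEntities, together with the space-joining that singleEntityToString is documented to do, can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the CoreNLP implementation: Tok is a hypothetical stand-in for a CoreMap token, the normalizer is a caller-supplied function, every non-background tag is treated as an entity (the real method targets quantity types and also has an isCompatible check), and "O" is assumed to be the background symbol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

class EntityNormalizationSketch {
  // Assumption: "O" is the background (non-entity) NER symbol.
  static final String BACKGROUND = "O";

  /** Hypothetical stand-in for a CoreMap token: word, NER tag, and a slot for the normalized value. */
  static final class Tok {
    final String word;
    final String ner;
    String normalized; // set by tagContiguousEntities; stays null for background tokens

    Tok(String word, String ner) {
      this.word = word;
      this.ner = ner;
    }
  }

  /** Joins the words of one entity with single spaces. */
  static String joinWords(List<Tok> entity) {
    StringBuilder sb = new StringBuilder();
    for (Tok t : entity) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(t.word);
    }
    return sb.toString();
  }

  /**
   * Finds maximal runs of tokens sharing one non-background tag, normalizes each run
   * once as a whole, and writes that value onto every token of the run (in place).
   */
  static void tagContiguousEntities(List<Tok> doc, BiFunction<String, String, String> normalizer) {
    int i = 0;
    while (i < doc.size()) {
      String tag = doc.get(i).ner;
      if (tag == null || BACKGROUND.equals(tag)) {
        i++;
        continue;
      }
      int j = i + 1;
      while (j < doc.size() && tag.equals(doc.get(j).ner)) j++;
      List<Tok> entity = doc.subList(i, j);
      String value = normalizer.apply(tag, joinWords(entity));
      for (Tok t : entity) t.normalized = value;
      i = j;
    }
  }

  public static void main(String[] args) {
    List<Tok> doc = new ArrayList<>();
    doc.add(new Tok("costs", "O"));
    doc.add(new Tok("$", "MONEY"));
    doc.add(new Tok("5", "MONEY"));
    doc.add(new Tok("today", "DATE"));
    // Toy normalizer (hypothetical): echoes the tag and the joined entity text.
    tagContiguousEntities(doc, (tag, text) -> tag + ":" + text);
    for (Tok t : doc) System.out.println(t.word + " -> " + t.normalized);
  }
}
```

Keeping the background symbol in one named constant (or passing it in as a parameter) is one way to address the implementation note about depending on the default background symbol.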
public static String BACKGROUND_SYMBOL
public static final ClassicCounter<String> wordsToValues
public static final ClassicCounter<String> ordinalsToValues
public static final Pattern numberPattern
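wordsToValues and ordinalsToValues are typed as ClassicCounter<String>, so they map strings to numeric values, and numberPattern is a compiled regular expression. The sketch below shows how fields of this kind might be combined to value a single token. It is hypothetical: a HashMap stands in for ClassicCounter, and the regex is illustrative because the real numberPattern is not shown on this page.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

class NumberFieldsSketch {
  // Hypothetical regex (the real numberPattern is not documented here).
  // Accepts "7", "-3.5", "1,500", "1,234,567.89"; rejects "1,50".
  static final Pattern NUMBER =
      Pattern.compile("[-+]?(?:\\d{1,3}(?:,\\d{3})+|\\d+)(?:\\.\\d+)?");

  // A plain Map stands in for ClassicCounter<String>: word -> numeric value.
  static final Map<String, Double> WORDS_TO_VALUES = new HashMap<>();

  static {
    String[] words = {"zero", "one", "two", "three", "four", "five", "six",
                      "seven", "eight", "nine", "ten", "eleven", "twelve"};
    for (int i = 0; i < words.length; i++) WORDS_TO_VALUES.put(words[i], (double) i);
  }

  /** Value of a single token, or null if it is neither a numeral nor a known number word. */
  static Double valueOf(String token) {
    if (NUMBER.matcher(token).matches()) {
      return Double.parseDouble(token.replace(",", "")); // drop thousands separators
    }
    return WORDS_TO_VALUES.get(token.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    for (String tok : new String[] {"1,500", "Twelve", "3.14", "1,50", "apple"}) {
      System.out.println(tok + " -> " + valueOf(tok));
    }
  }
}
```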
public static <E extends CoreMap> String singleEntityToString(List<E> l)
Converts the content of a List of CoreMaps to a single space-separated String.
Parameters:
l - The List
public static List<CoreLabel> collapseNERLabels(List<CoreLabel> l)
Currently this populates a List<CoreLabel> with words from the passed List, but NER entities are collapsed and CoreLabel constituents of entities have NER information in their "quantity" fields.
NOTE: This now seems to be used nowhere; the collapsing is done elsewhere. That is probably appropriate: it doesn't seem like this should be part of QuantifiableEntityNormalizer, since it is set to collapse non-quantifiable entities.
Parameters:
l - a list of CoreLabels with NER labels
public static String normalizedTimeString(String s, String ampm, Timex timexFromSUTime)
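normalizedTimeString accepts a separate ampm argument. A minimal sketch of what combining a clock time with an am/pm marker involves, assuming a 24-hour "HH:MM" output; the real method's output format and its use of timexFromSUTime are not described on this page, and normalizeTime is a hypothetical name.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimeSketch {
  // Hypothetical accepted input shapes: "H" or "H:MM".
  private static final Pattern CLOCK = Pattern.compile("(\\d{1,2})(?::(\\d{2}))?");

  /**
   * Converts a clock time plus an optional am/pm marker into 24-hour "HH:MM".
   * Returns null when the input is not a valid clock time.
   */
  static String normalizeTime(String s, String ampm) {
    Matcher m = CLOCK.matcher(s.trim());
    if (!m.matches()) return null;
    int hour = Integer.parseInt(m.group(1));
    int minute = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
    if (hour > 23 || minute > 59) return null;
    if (ampm != null) {
      String marker = ampm.toLowerCase(Locale.ROOT).replace(".", ""); // "p.m." -> "pm"
      boolean pm = marker.equals("pm");
      boolean am = marker.equals("am");
      if ((am || pm) && (hour < 1 || hour > 12)) return null; // e.g. "15 pm" is invalid
      if (pm && hour != 12) hour += 12; // 3 pm -> 15
      if (am && hour == 12) hour = 0;   // 12 am -> 00
    }
    return String.format(Locale.ROOT, "%02d:%02d", hour, minute);
  }

  public static void main(String[] args) {
    System.out.println(normalizeTime("3:30", "pm"));    // 15:30
    System.out.println(normalizeTime("12:05", "a.m.")); // 00:05
  }
}
```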
public static String normalizedNumberString(String s, String nextWord, Number numberFromSUTime)
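Spelled-out numbers such as "two hundred fifty-three" have to be composed from several word values. The sketch below shows one common accumulation approach under assumptions (English words only, a hypothetical parse method that returns a Long); it is not the algorithm used by normalizedNumberString, and it ignores the nextWord and numberFromSUTime parameters.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class NumberWordsSketch {
  private static final Map<String, Long> SMALL = new HashMap<>();  // 0..19 and tens
  private static final Map<String, Long> SCALES = new HashMap<>(); // thousand and up

  static {
    String[] units = {"zero", "one", "two", "three", "four", "five", "six", "seven",
                      "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
                      "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
    for (int i = 0; i < units.length; i++) SMALL.put(units[i], (long) i);
    String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
    for (int i = 0; i < tens.length; i++) SMALL.put(tens[i], 20L + 10L * i);
    SCALES.put("thousand", 1_000L);
    SCALES.put("million", 1_000_000L);
    SCALES.put("billion", 1_000_000_000L);
  }

  /** Parses e.g. "two hundred fifty-three" to 253; returns null if any word is unknown. */
  static Long parse(String s) {
    long total = 0;   // completed groups (thousands, millions, ...)
    long current = 0; // group being built (below one thousand)
    boolean sawNumberWord = false;
    for (String w : s.toLowerCase(Locale.ROOT).replace('-', ' ').trim().split("\\s+")) {
      if (w.isEmpty() || w.equals("and")) continue;
      Long small = SMALL.get(w);
      Long scale = SCALES.get(w);
      if (small != null) {
        current += small;
      } else if (w.equals("hundred")) {
        current = (current == 0 ? 1 : current) * 100;
      } else if (scale != null) {
        total += (current == 0 ? 1 : current) * scale;
        current = 0;
      } else {
        return null; // not a pure number phrase
      }
      sawNumberWord = true;
    }
    return sawNumberWord ? total + current : null;
  }

  public static void main(String[] args) {
    System.out.println(parse("two hundred fifty-three"));  // 253
    System.out.println(parse("one thousand two hundred")); // 1200
  }
}
```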
public static String normalizedNumberStringQuiet(String s, double multiplier, String nextWord, Number numberFromSUTime)
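This page does not document the multiplier parameter or what "Quiet" means. Assuming the multiplier scales the parsed value and that the quiet variant signals failure by returning null instead of throwing or logging, a sketch for plain numerals could look like this (normalizeQuiet is a hypothetical name).

```java
import java.util.regex.Pattern;

class QuietNumberSketch {
  // Optional sign, digits with optional thousands commas, optional decimal part.
  private static final Pattern NUMERAL = Pattern.compile("[-+]?\\d[\\d,]*(?:\\.\\d+)?");

  /**
   * Hypothetical quiet normalizer: parses a numeral, scales it by multiplier and
   * returns the result as a string, or null (no exception) if s is not a numeral.
   */
  static String normalizeQuiet(String s, double multiplier) {
    if (s == null) return null;
    String t = s.trim();
    // Pre-check with a regex so inputs like "5d" or "NaN", which Double.parseDouble
    // would otherwise accept, are rejected.
    if (!NUMERAL.matcher(t).matches()) return null;
    double value = Double.parseDouble(t.replace(",", ""));
    return Double.toString(value * multiplier);
  }

  public static void main(String[] args) {
    System.out.println(normalizeQuiet("1,500", 1.0));  // 1500.0
    System.out.println(normalizeQuiet("2.5", 1000.0)); // 2500.0
    System.out.println(normalizeQuiet("lots", 1.0));   // null
  }
}
```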
public static String normalizedOrdinalString(String s, Number numberFromSUTime)
public static String normalizedOrdinalStringQuiet(String s, Number numberFromSUTime)
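A minimal sketch of ordinal normalization, assuming that inputs such as "21st" or "third" should map to their numeric value as a string. The word table is deliberately small, normalizeOrdinal is a hypothetical name, and the sketch returns null for unrecognized input.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrdinalSketch {
  // Numeric ordinals such as "1st", "22nd", "103rd"; the suffix is not checked against the digits.
  private static final Pattern NUMERIC_ORDINAL = Pattern.compile("(\\d{1,9})(?:st|nd|rd|th)");
  private static final Map<String, Integer> ORDINAL_WORDS = new HashMap<>();

  static {
    String[] words = {"first", "second", "third", "fourth", "fifth", "sixth",
                      "seventh", "eighth", "ninth", "tenth", "eleventh", "twelfth"};
    for (int i = 0; i < words.length; i++) ORDINAL_WORDS.put(words[i], i + 1);
  }

  /** Returns the ordinal's value as a string ("21st" -> "21"), or null if unrecognized. */
  static String normalizeOrdinal(String s) {
    String t = s.trim().toLowerCase(Locale.ROOT);
    Matcher m = NUMERIC_ORDINAL.matcher(t);
    if (m.matches()) return Integer.toString(Integer.parseInt(m.group(1)));
    Integer v = ORDINAL_WORDS.get(t);
    return v == null ? null : v.toString();
  }

  public static void main(String[] args) {
    for (String s : new String[] {"21st", "Third", "12th", "apple"}) {
      System.out.println(s + " -> " + normalizeOrdinal(s));
    }
  }
}
```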
public static String normalizedPercentString(String s, Number numberFromSUTime)
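For percentages, a sketch that accepts a numeral followed by "%", "percent", or "per cent". The leading "%" in the output is an assumed format chosen for illustration, normalizePercent is a hypothetical name, and spelled-out amounts such as "fifty percent" are not handled here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PercentSketch {
  // Hypothetical accepted forms: "50%", "12.5 percent", "7 per cent" (case-insensitive).
  private static final Pattern PERCENT = Pattern.compile(
      "([-+]?\\d+(?:\\.\\d+)?)\\s*(?:%|percent|per\\s+cent)", Pattern.CASE_INSENSITIVE);

  /** Returns the percentage with a leading "%" (assumed format), or null if unrecognized. */
  static String normalizePercent(String s) {
    Matcher m = PERCENT.matcher(s.trim());
    if (!m.matches()) return null;
    return "%" + Double.parseDouble(m.group(1));
  }

  public static void main(String[] args) {
    System.out.println(normalizePercent("50%")); // %50.0
  }
}
```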
public static List<List<CoreLabel>> normalizeClassifierOutput(List<List<CoreLabel>> l)
Takes the output of an AbstractSequenceClassifier and marks up each document by normalizing quantities. Each CoreLabel in any of the documents which is normalizable will receive a "normalizedQuantity" attribute.
public static <E extends CoreMap> void addNormalizedQuantitiesToEntities(List<E> l)
Parameters:
l - A list of CoreMaps representing a single document. Note: the Labels are updated in place.
public static <E extends CoreMap> void addNormalizedQuantitiesToEntities(List<E> l, boolean concatenate)
public static <E extends CoreMap> void addNormalizedQuantitiesToEntities(List<E> list, boolean concatenate, boolean usesSUTime)
Parameters:
list - A list of CoreMaps representing a single document. Note: the Labels are updated in place.
concatenate - true if quantities should be concatenated into one label, false otherwise
public static <E extends CoreMap> void fixupNerBeforeNormalization(List<E> list)
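public static <E extends CoreLabel> List<E> applySpecializedNER(List<E> l)
Runs a deterministic named entity classifier that is good at recognizing numbers, money, and date expressions not recognized by our statistical NER.
Parameters:
l - A document to label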
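A deterministic, rule-based pass of the kind described for applySpecializedNER can be sketched as follows. The rules are hypothetical (a "$" followed by a numeral becomes MONEY, any other numeral becomes NUMBER), they only relabel tokens that still carry the assumed background symbol "O", and plain string lists stand in for CoreLabels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SpecializedNerSketch {
  static final String BACKGROUND = "O"; // assumed default background symbol
  private static final Pattern NUMERAL = Pattern.compile("\\d+(?:[.,]\\d+)*");

  /**
   * Returns a relabeled copy of tags (same length as words). Only tokens still tagged
   * BACKGROUND are changed, so labels from the statistical NER are kept.
   */
  static List<String> relabel(List<String> words, List<String> tags) {
    List<String> out = new ArrayList<>(tags);
    for (int i = 0; i < words.size(); i++) {
      if (!BACKGROUND.equals(out.get(i))) continue;
      String w = words.get(i);
      boolean amountFollows = i + 1 < words.size()
          && BACKGROUND.equals(out.get(i + 1))
          && NUMERAL.matcher(words.get(i + 1)).matches();
      if (w.equals("$") && amountFollows) {
        out.set(i, "MONEY");     // currency sign
        out.set(i + 1, "MONEY"); // the amount after it
        i++;                     // skip the amount we just labeled
      } else if (NUMERAL.matcher(w).matches()) {
        out.set(i, "NUMBER");
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> words = Arrays.asList("paid", "$", "20", "for", "3", "books");
    List<String> tags = Arrays.asList("O", "O", "O", "O", "O", "O");
    System.out.println(relabel(words, tags)); // [O, MONEY, MONEY, O, NUMBER, O]
  }
}
```

Because only background tokens are touched, labels already assigned by the statistical NER take precedence, which matches the description of this classifier as catching expressions the statistical NER missed.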