public class QuantifiableEntityNormalizer
extends java.lang.Object
Related class: QuantifiableEntityNormalizingAnnotator.
Please keep the substantive content here, however, so as to lessen code
duplication.
Implementation note: The extensive test code for this class is
now in a separate JUnit Test class. This class depends on the background
symbol for NER being the default background symbol. This should be fixed
at some point.

Modifier and Type | Field and Description |
---|---|
static java.lang.String | BACKGROUND_SYMBOL |
static ClassicCounter<java.lang.String> | ordinalsToValues |
static ClassicCounter<java.lang.String> | wordsToValues |

Modifier and Type | Method and Description |
---|---|
static <E extends CoreMap> void | addNormalizedQuantitiesToEntities(java.util.List<E> l) Identifies contiguous MONEY, TIME, DATE, or PERCENT entities and tags each of their constituents with a "normalizedQuantity" label which contains the appropriate normalized string corresponding to the full quantity. |
static <E extends CoreMap> void | addNormalizedQuantitiesToEntities(java.util.List<E> l, boolean concatenate) |
static <E extends CoreMap> void | addNormalizedQuantitiesToEntities(java.util.List<E> list, boolean concatenate, boolean usesSUTime) Identifies contiguous MONEY, TIME, DATE, or PERCENT entities and tags each of their constituents with a "normalizedQuantity" label which contains the appropriate normalized string corresponding to the full quantity. |
static <E extends CoreLabel> java.util.List<E> | applySpecializedNER(java.util.List<E> l) Runs a deterministic named entity classifier which is good at recognizing numbers and money and date expressions not recognized by our statistical NER. |
static java.util.List<CoreLabel> | collapseNERLabels(java.util.List<CoreLabel> l) Currently this populates a List<CoreLabel> with words from the passed List, but NER entities are collapsed and CoreLabel constituents of entities have NER information in their "quantity" fields. |
static <E extends CoreMap> boolean | isCompatible(java.lang.String tag, E prev, E cur) |
static java.util.List<java.util.List<CoreLabel>> | normalizeClassifierOutput(java.util.List<java.util.List<CoreLabel>> l) Takes the output of an AbstractSequenceClassifier and marks up each document by normalizing quantities. |
static java.lang.String | normalizedNumberString(java.lang.String s, java.lang.String nextWord, java.lang.Number numberFromSUTime) |
static java.lang.String | normalizedNumberStringQuiet(java.lang.String s, double multiplier, java.lang.String nextWord, java.lang.Number numberFromSUTime) |
static java.lang.String | normalizedOrdinalString(java.lang.String s, java.lang.Number numberFromSUTime) |
static java.lang.String | normalizedPercentString(java.lang.String s, java.lang.Number numberFromSUTime) |
static java.lang.String | normalizedTimeString(java.lang.String s, Timex timexFromSUTime) |

public static java.lang.String BACKGROUND_SYMBOL
public static final ClassicCounter<java.lang.String> wordsToValues
public static final ClassicCounter<java.lang.String> ordinalsToValues
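The wordsToValues and ordinalsToValues fields are counters keyed by string, which suggests a lookup from number words to numeric values. The sketch below only illustrates that idea and is not the library's code: a plain Map stands in for ClassicCounter, the entries are hypothetical (the real contents are not listed on this page), and the names NumberWordSketch and parse are invented for this example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: a plain Map stands in for ClassicCounter<String>. */
final class NumberWordSketch {
    // Hypothetical entries; the real wordsToValues contents are not shown in this page.
    static final Map<String, Double> WORDS_TO_VALUES = new HashMap<>();
    static {
        WORDS_TO_VALUES.put("one", 1.0);
        WORDS_TO_VALUES.put("two", 2.0);
        WORDS_TO_VALUES.put("three", 3.0);
        WORDS_TO_VALUES.put("five", 5.0);
        WORDS_TO_VALUES.put("twenty", 20.0);
        WORDS_TO_VALUES.put("hundred", 100.0);
        WORDS_TO_VALUES.put("thousand", 1000.0);
    }

    /** Converts a simple English number phrase to a value using the lookup table. */
    static double parse(String phrase) {
        double total = 0.0;   // completed groups, e.g. the "two thousand" part
        double current = 0.0; // group still being built, e.g. "one hundred twenty"
        for (String word : phrase.trim().toLowerCase(Locale.ROOT).split("[\\s-]+")) {
            Double value = WORDS_TO_VALUES.get(word);
            if (value == null) {
                throw new IllegalArgumentException("Unknown number word: " + word);
            }
            if (value == 100.0) {
                // "hundred" scales the group built so far (a bare "hundred" counts as 100)
                current = (current == 0.0 ? 1.0 : current) * value;
            } else if (value >= 1000.0) {
                // larger multipliers close the current group and add it to the total
                total += (current == 0.0 ? 1.0 : current) * value;
                current = 0.0;
            } else {
                current += value;
            }
        }
        return total + current;
    }

    public static void main(String[] args) {
        System.out.println(parse("one hundred twenty-three"));
        System.out.println(parse("two thousand five"));
    }
}
```

A table-driven lookup like this keeps the vocabulary in data rather than in branching code, which is presumably why the class exposes the tables as fields.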
public static java.util.List<CoreLabel> collapseNERLabels(java.util.List<CoreLabel> l)
Currently this populates a List<CoreLabel> with words from the passed List,
but NER entities are collapsed and CoreLabel constituents of entities have
NER information in their "quantity" fields.
NOTE: This now seems to be used nowhere. The collapsing is done elsewhere.
That's probably appropriate; it doesn't seem like this should be part of
QuantifiableEntityNormalizer, since it's set to collapse non-quantifiable
entities....
l - a list of CoreLabels with NER labels,
public static java.lang.String normalizedTimeString(java.lang.String s, Timex timexFromSUTime)
public static java.lang.String normalizedNumberString(java.lang.String s, java.lang.String nextWord, java.lang.Number numberFromSUTime)
public static java.lang.String normalizedNumberStringQuiet(java.lang.String s, double multiplier, java.lang.String nextWord, java.lang.Number numberFromSUTime)
public static java.lang.String normalizedOrdinalString(java.lang.String s, java.lang.Number numberFromSUTime)
public static java.lang.String normalizedPercentString(java.lang.String s, java.lang.Number numberFromSUTime)
public static java.util.List<java.util.List<CoreLabel>> normalizeClassifierOutput(java.util.List<java.util.List<CoreLabel>> l)
Takes the output of an AbstractSequenceClassifier and marks up
each document by normalizing quantities. Each CoreLabel in any
of the documents which is normalizable will receive a "normalizedQuantity"
attribute.
l - a List of Lists of CoreLabels
public static <E extends CoreMap> void addNormalizedQuantitiesToEntities(java.util.List<E> l)
l - A list of CoreMaps representing a single document. Note: the Labels are updated in place.
public static <E extends CoreMap> void addNormalizedQuantitiesToEntities(java.util.List<E> l, boolean concatenate)
public static <E extends CoreMap> boolean isCompatible(java.lang.String tag, E prev, E cur)
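The summary for addNormalizedQuantitiesToEntities describes finding contiguous MONEY, TIME, DATE, or PERCENT entities and tagging every constituent with one value for the full quantity, in place. A minimal sketch of that grouping pattern follows, under stated assumptions: Token is a hypothetical stand-in for a CoreMap, sameEntity is an invented helper that makes no claim about what isCompatible actually checks, and the "normalized" value is just the joined words, since the real normalization rules are not given here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: groups contiguous same-tag tokens and tags each constituent. */
final class ContiguousEntitySketch {
    /** Hypothetical stand-in for a CoreMap token. */
    static final class Token {
        final String word;
        final String ner;
        String normalizedQuantity; // stays null unless the token is part of a tagged entity

        Token(String word, String ner) {
            this.word = word;
            this.ner = ner;
        }
    }

    // Entity types named in the method summary.
    static final Set<String> QUANTIFIABLE =
            new HashSet<>(Arrays.asList("MONEY", "TIME", "DATE", "PERCENT"));

    /** Hypothetical continuation test: the current token extends the run if both carry the tag. */
    static boolean sameEntity(String tag, Token prev, Token cur) {
        return tag.equals(prev.ner) && tag.equals(cur.ner);
    }

    /** Updates the tokens in place, as the parameter notes on this page describe. */
    static void tagRuns(List<Token> tokens) {
        int start = 0;
        while (start < tokens.size()) {
            String tag = tokens.get(start).ner;
            int end = start + 1;
            while (end < tokens.size()
                    && sameEntity(tag, tokens.get(end - 1), tokens.get(end))) {
                end++;
            }
            if (QUANTIFIABLE.contains(tag)) {
                List<String> words = new ArrayList<>();
                for (Token t : tokens.subList(start, end)) {
                    words.add(t.word);
                }
                // Placeholder value: a real normalizer would compute a canonical string here.
                String placeholder = String.join(" ", words);
                for (Token t : tokens.subList(start, end)) {
                    t.normalizedQuantity = placeholder;
                }
            }
            start = end;
        }
    }

    public static void main(String[] args) {
        List<Token> doc = Arrays.asList(
                new Token("It", "O"), new Token("cost", "O"),
                new Token("five", "MONEY"), new Token("dollars", "MONEY"),
                new Token("on", "O"), new Token("Monday", "DATE"));
        tagRuns(doc);
        for (Token t : doc) {
            System.out.println(t.word + "\t" + t.ner + "\t" + t.normalizedQuantity);
        }
    }
}
```

Every token in a run receives the same value, so a consumer can read the full quantity from any constituent without re-scanning its neighbours.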
public static <E extends CoreMap> void addNormalizedQuantitiesToEntities(java.util.List<E> list, boolean concatenate, boolean usesSUTime)
list - A list of CoreMaps representing a single document. Note: the Labels are updated in place.
concatenate - true if quantities should be concatenated into one label, false otherwise
public static <E extends CoreLabel> java.util.List<E> applySpecializedNER(java.util.List<E> l)
l - A document to label
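applySpecializedNER is described as a deterministic classifier that catches numbers, money, and date expressions the statistical NER missed. One way to picture that kind of pass is a rule-based step that relabels only tokens still carrying the background label. The sketch below assumes "O" as the background symbol and uses three made-up regular expressions; it does not reproduce the library's actual rules, and RuleBasedTaggerSketch and applyRules are names invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: fills in labels for tokens a statistical tagger left as background. */
final class RuleBasedTaggerSketch {
    static final String BACKGROUND = "O"; // assumed background symbol

    // Hypothetical patterns chosen for the example, not taken from the library.
    private static final Pattern DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");
    private static final Pattern PERCENT = Pattern.compile("\\d+(\\.\\d+)?%");
    private static final Pattern NUMBER = Pattern.compile("[+-]?\\d+(\\.\\d+)?");

    /** Returns a new label list; labels that are not background are kept unchanged. */
    static List<String> applyRules(List<String> words, List<String> labels) {
        if (words.size() != labels.size()) {
            throw new IllegalArgumentException("words and labels must have the same length");
        }
        List<String> out = new ArrayList<>(labels);
        for (int i = 0; i < words.size(); i++) {
            if (!BACKGROUND.equals(out.get(i))) {
                continue; // trust the earlier tagger's decision
            }
            String w = words.get(i);
            if (DATE.matcher(w).matches()) {
                out.set(i, "DATE");
            } else if (PERCENT.matcher(w).matches()) {
                out.set(i, "PERCENT");
            } else if (NUMBER.matcher(w).matches()) {
                out.set(i, "NUMBER");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Paid", "42", "on", "2020-01-31", "to", "IBM", "12.5%");
        List<String> labels = Arrays.asList("O", "O", "O", "O", "O", "ORGANIZATION", "O");
        System.out.println(applyRules(words, labels));
    }
}
```

Checking the more specific patterns first (date, then percent, then bare number) keeps a plain-number rule from shadowing them if the patterns are later loosened.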