where the tag is a prefix indicating where in the chunk it is.
Supports various encodings: IO, IOB, IOE, BILOU, SBEIO, []
The type is
Example:  Bill   works  for  Bank   of     America
IO:       I-PER  O      O    I-ORG  I-ORG  I-ORG
IOB1:     I-PER  O      O    I-ORG  I-ORG  I-ORG
IOB2:     B-PER  O      O    B-ORG  I-ORG  I-ORG
IOE1:     I-PER  O      O    I-ORG  I-ORG  I-ORG
IOE2:     E-PER  O      O    I-ORG  I-ORG  E-ORG
BILOU:    U-PER  O      O    B-ORG  I-ORG  L-ORG
SBEIO:    S-PER  O      O    B-ORG  I-ORG  E-ORG
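To make the difference between two of these encodings concrete, here is a minimal sketch that tags typed spans under BILOU and SBEIO. The class `EncodingSketch`, its `Span` and `Scheme` types, and the `encode` method are hypothetical names introduced for illustration; they are not part of LabeledChunkIdentifier.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration, not part of CoreNLP. */
final class EncodingSketch {

    /** A typed chunk covering tokens [start, end). */
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** The two schemes differ only in the prefixes for single and last tokens. */
    enum Scheme {
        BILOU("U", "L"),
        SBEIO("S", "E");

        final String single;
        final String last;

        Scheme(String single, String last) {
            this.single = single;
            this.last = last;
        }
    }

    /** Produces one tag per token; tokens outside every span get "O". */
    static List<String> encode(int numTokens, List<Span> spans, Scheme scheme) {
        List<String> tags = new ArrayList<>(Collections.nCopies(numTokens, "O"));
        for (Span s : spans) {
            for (int i = s.start; i < s.end; i++) {
                boolean first = (i == s.start);
                boolean last = (i == s.end - 1);
                String prefix;
                if (first && last) {
                    prefix = scheme.single;   // one-token chunk
                } else if (first) {
                    prefix = "B";             // chunk beginning
                } else if (last) {
                    prefix = scheme.last;     // chunk end
                } else {
                    prefix = "I";             // chunk interior
                }
                tags.set(i, prefix + "-" + s.type);
            }
        }
        return tags;
    }
}
```

With a PER span over token 0 and an ORG span over tokens 3 to 5 of the example sentence, `encode` yields the BILOU and SBEIO rows shown above.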
- Author:
- Angel Chang
Method Summary
List<CoreMap> getAnnotatedChunks(List<CoreLabel> tokens, int totalTokensOffset, Class textKey, Class labelKey)
    Find and annotate chunks.
List<CoreMap> getAnnotatedChunks(List<CoreLabel> tokens, int totalTokensOffset, Class textKey, Class labelKey, Class tokenChunkKey, Class tokenLabelKey)
    Find and annotate chunks.
String getDefaultNegTag()
String getDefaultPosTag()
String getNegLabel()
LabeledChunkIdentifier.LabelTagType getTagType(String label)
static boolean isEndOfChunk(LabeledChunkIdentifier.LabelTagType prev, LabeledChunkIdentifier.LabelTagType cur)
    Returns whether a chunk ended between the previous and current token
static boolean isEndOfChunk(String prevTag, String prevType, String curTag, String curType)
    Returns whether a chunk ended between the previous and current token
boolean isIgnoreProvidedTag()
static boolean isStartOfChunk(LabeledChunkIdentifier.LabelTagType prev, LabeledChunkIdentifier.LabelTagType cur)
    Returns whether a chunk started between the previous and current token
static boolean isStartOfChunk(String prevTag, String prevType, String curTag, String curType)
    Returns whether a chunk started between the previous and current token
void setDefaultNegTag(String defaultNegTag)
void setDefaultPosTag(String defaultPosTag)
void setIgnoreProvidedTag(boolean ignoreProvidedTag)
void setNegLabel(String negLabel)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LabeledChunkIdentifier
public LabeledChunkIdentifier()
getAnnotatedChunks
public List<CoreMap> getAnnotatedChunks(List<CoreLabel> tokens,
int totalTokensOffset,
Class textKey,
Class labelKey)
- Find and annotate chunks. Returns list of CoreMap (Annotation) objects.
- Parameters:
tokens - List of tokens to look for chunks
totalTokensOffset - Index of tokens to offset by
textKey - Key to use to find the token text
labelKey - Key to use to find the token label (to determine if inside chunk or not)
- Returns:
- List of annotations (each as a CoreMap) representing the chunks of tokens
getAnnotatedChunks
public List<CoreMap> getAnnotatedChunks(List<CoreLabel> tokens,
int totalTokensOffset,
Class textKey,
Class labelKey,
Class tokenChunkKey,
Class tokenLabelKey)
- Find and annotate chunks. Returns a list of CoreMap (Annotation) objects,
each representing a chunk with the following annotations set:
CharacterOffsetBeginAnnotation - set to CharacterOffsetBeginAnnotation of first token in chunk
CharacterOffsetEndAnnotation - set to CharacterOffsetEndAnnotation of last token in chunk
TokensAnnotation - List of tokens in this chunk
TokenBeginAnnotation - Index of first token in chunk (index in original list of tokens)
TokenEndAnnotation - Index of last token in chunk (index in original list of tokens)
TextAnnotation - String representing tokens in this chunk (token text separated by space)
- Parameters:
tokens - List of tokens to look for chunks
totalTokensOffset - Index of tokens to offset by
textKey - Key to use to find the token text
labelKey - Key to use to find the token label (to determine if inside chunk or not)
tokenChunkKey - If not null, each token is annotated with the chunk using this key
tokenLabelKey - If not null, each token is annotated with the text associated with the chunk using this key
- Returns:
- List of annotations (each as a CoreMap) representing the chunks of tokens
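The grouping step can be pictured with a small, library-free sketch. It assumes labels where every chunk opens with a `B-` label and continues with `I-` labels of the same type, and it works on plain strings rather than CoreLabel and CoreMap objects. The class `ChunkSketch`, its `Chunk` type, and `findChunks` are hypothetical names, and the end index is exclusive in this sketch only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration, not the CoreNLP implementation. */
final class ChunkSketch {

    /** A chunk over tokens [tokenBegin, tokenEnd); tokenEnd is exclusive in this sketch. */
    static final class Chunk {
        final int tokenBegin;
        final int tokenEnd;
        final String type;
        final String text;

        Chunk(int tokenBegin, int tokenEnd, String type, String text) {
            this.tokenBegin = tokenBegin;
            this.tokenEnd = tokenEnd;
            this.type = type;
            this.text = text;
        }
    }

    /** Groups B-/I-/O labelled tokens into chunks; stray I- labels without a B- are ignored. */
    static List<Chunk> findChunks(List<String> words, List<String> labels, int totalTokensOffset) {
        List<Chunk> chunks = new ArrayList<>();
        int start = -1;      // index of the first token of the open chunk, or -1
        String type = null;  // type of the open chunk
        // Iterate one step past the end so a trailing chunk is closed too.
        for (int i = 0; i <= labels.size(); i++) {
            String label = i < labels.size() ? labels.get(i) : "O";
            boolean continues = start >= 0 && label.equals("I-" + type);
            if (start >= 0 && !continues) {
                // Token text is joined with single spaces, as TextAnnotation is described above.
                String text = String.join(" ", words.subList(start, i));
                chunks.add(new Chunk(start + totalTokensOffset, i + totalTokensOffset, type, text));
                start = -1;
                type = null;
            }
            if (label.startsWith("B-")) {
                start = i;
                type = label.substring(2);
            }
        }
        return chunks;
    }
}
```

For the words `Bill works for Bank of America` with labels `B-PER O O B-ORG I-ORG I-ORG` and an offset of 0, this returns two chunks: `Bill` over [0, 1) and `Bank of America` over [3, 6). A non-zero `totalTokensOffset` shifts both indices by that amount.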
isEndOfChunk
public static boolean isEndOfChunk(String prevTag,
String prevType,
String curTag,
String curType)
- Returns whether a chunk ended between the previous and current token
- Parameters:
prevTag - the tag of the previous token
prevType - the type of the previous token
curTag - the tag of the current token
curType - the type of the current token
- Returns:
- true if the previous token was the last token of a chunk
isEndOfChunk
public static boolean isEndOfChunk(LabeledChunkIdentifier.LabelTagType prev,
LabeledChunkIdentifier.LabelTagType cur)
- Returns whether a chunk ended between the previous and current token
- Parameters:
prev - the label/tag/type of the previous token
cur - the label/tag/type of the current token
- Returns:
- true if the previous token was the last token of a chunk
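As a rough illustration of what an end-of-chunk test decides, the sketch below covers only the tags B, I and O. It follows the familiar conlleval-style rules and is a simplification under that assumption, not the library's code; tags such as E, L, U, S and the bracket tags are deliberately left out. The class name `EndOfChunkSketch` is hypothetical.

```java
import java.util.Objects;

/** Hypothetical illustration restricted to B/I/O tags. */
final class EndOfChunkSketch {

    /** Returns true if the previous token was the last token of a chunk. */
    static boolean isEndOfChunk(String prevTag, String prevType, String curTag, String curType) {
        // Nothing can end if the previous token was outside every chunk.
        if (prevTag.equals("O")) {
            return false;
        }
        // Moving outside, or meeting an explicit beginning, closes the previous chunk.
        if (curTag.equals("O") || curTag.equals("B")) {
            return true;
        }
        // Remaining case: the current tag is I, so the chunk ends only on a type change.
        return !Objects.equals(prevType, curType);
    }
}
```

For example, the transition from `I-PER` to `O` ends a chunk, while `B-ORG` followed by `I-ORG` does not.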
isStartOfChunk
public static boolean isStartOfChunk(String prevTag,
String prevType,
String curTag,
String curType)
- Returns whether a chunk started between the previous and current token
- Parameters:
prevTag - the tag of the previous token
prevType - the type of the previous token
curTag - the tag of the current token
curType - the type of the current token
- Returns:
- true if the current token was the first token of a chunk
isStartOfChunk
public static boolean isStartOfChunk(LabeledChunkIdentifier.LabelTagType prev,
LabeledChunkIdentifier.LabelTagType cur)
- Returns whether a chunk started between the previous and current token
- Parameters:
prev - the label/tag/type of the previous token
cur - the label/tag/type of the current token
- Returns:
- true if the current token was the first token of a chunk
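The start-of-chunk decision can be sketched the same way for the tags B, I and O. This is a simplified, conlleval-style rule set written for illustration under that assumption, not the library's code, and it does not handle E, L, U, S or bracket tags. The class name `StartOfChunkSketch` is hypothetical.

```java
import java.util.Objects;

/** Hypothetical illustration restricted to B/I/O tags. */
final class StartOfChunkSketch {

    /** Returns true if the current token is the first token of a chunk. */
    static boolean isStartOfChunk(String prevTag, String prevType, String curTag, String curType) {
        // A token outside every chunk cannot start one.
        if (curTag.equals("O")) {
            return false;
        }
        // An explicit beginning always starts a chunk.
        if (curTag.equals("B")) {
            return true;
        }
        // Remaining case: the current tag is I. It starts a chunk when it follows
        // an outside token or when the type differs from the previous token's type.
        return prevTag.equals("O") || !Objects.equals(prevType, curType);
    }
}
```

Under these rules `O` followed by `I-PER` starts a chunk, which is what lets IO-style labels work without any B tags.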
getTagType
public LabeledChunkIdentifier.LabelTagType getTagType(String label)
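This page does not describe how a label is broken into its tag and type, so the following is only a sketch of one plausible reading. It assumes the label is split at the first hyphen, and that a label without a prefix falls back to a default tag chosen by comparing the label with the negative label. The class `LabelSplitSketch` and its `split` method are hypothetical, and the parameters merely mirror the names of the properties documented below.

```java
/** Hypothetical illustration; the real fallback behaviour is an assumption here. */
final class LabelSplitSketch {

    /** Returns a two-element array {tag, type} for a label such as "B-ORG". */
    static String[] split(String label, String negLabel, String defaultPosTag, String defaultNegTag) {
        int dash = label.indexOf('-');
        if (dash > 0) {
            // Prefix present: everything before the first hyphen is the tag.
            return new String[] { label.substring(0, dash), label.substring(dash + 1) };
        }
        // No prefix: assume the whole label is the type and pick a default tag.
        String tag = label.equals(negLabel) ? defaultNegTag : defaultPosTag;
        return new String[] { tag, label };
    }
}
```

Called as `split("B-ORG", "O", "I", "O")` this gives the tag `B` and the type `ORG`; a bare `PER` would get the assumed default tag `I`.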
getDefaultPosTag
public String getDefaultPosTag()
setDefaultPosTag
public void setDefaultPosTag(String defaultPosTag)
getDefaultNegTag
public String getDefaultNegTag()
setDefaultNegTag
public void setDefaultNegTag(String defaultNegTag)
getNegLabel
public String getNegLabel()
setNegLabel
public void setNegLabel(String negLabel)
isIgnoreProvidedTag
public boolean isIgnoreProvidedTag()
setIgnoreProvidedTag
public void setIgnoreProvidedTag(boolean ignoreProvidedTag)
Stanford NLP Group