edu.stanford.nlp.pipeline
Class LabeledChunkIdentifier

java.lang.Object
  extended by edu.stanford.nlp.pipeline.LabeledChunkIdentifier

public class LabeledChunkIdentifier
extends Object

Identifies chunks based on labels that use an IOB-like encoding. Assumes labels have the form <tag>-<type>, where the tag is a prefix indicating where in the chunk the token is and the type names the kind of chunk (PER and ORG in the example below).

Supports various encodings: IO, IOB, IOE, BILOU, SBEIO, []

Example: Bill   works  for  Bank   of     America
IO:      I-PER  O      O    I-ORG  I-ORG  I-ORG
IOB1:    I-PER  O      O    I-ORG  I-ORG  I-ORG
IOB2:    B-PER  O      O    B-ORG  I-ORG  I-ORG
IOE1:    I-PER  O      O    I-ORG  I-ORG  I-ORG
IOE2:    E-PER  O      O    I-ORG  I-ORG  E-ORG
BILOU:   U-PER  O      O    B-ORG  I-ORG  L-ORG
SBEIO:   S-PER  O      O    B-ORG  I-ORG  E-ORG
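To make the tag prefixes concrete, the following standalone sketch produces BILOU labels from per-token chunk types. It does not use CoreNLP; the class name BilouEncoderSketch and its encode method are hypothetical, and the sketch assumes that adjacent tokens with the same type belong to one chunk.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone illustration; not part of CoreNLP. */
class BilouEncoderSketch {

    /**
     * Encodes per-token chunk types as BILOU labels.
     * A null entry means the token is outside any chunk. Adjacent tokens
     * with the same type are treated as one chunk (assumption of this sketch).
     */
    static List<String> encode(List<String> types) {
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < types.size(); i++) {
            String type = types.get(i);
            if (type == null) {
                labels.add("O");
                continue;
            }
            // A token is first/last in its chunk when its neighbour has a different type.
            boolean first = i == 0 || !type.equals(types.get(i - 1));
            boolean last = i == types.size() - 1 || !type.equals(types.get(i + 1));
            String tag;
            if (first && last) {
                tag = "U";   // unit-length chunk
            } else if (first) {
                tag = "B";   // beginning
            } else if (last) {
                tag = "L";   // last
            } else {
                tag = "I";   // inside
            }
            labels.add(tag + "-" + type);
        }
        return labels;
    }
}
```

For the types PER, null, null, ORG, ORG, ORG this returns U-PER, O, O, B-ORG, I-ORG, L-ORG, which matches the BILOU row of the example.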

Author:
Angel Chang

Nested Class Summary
static class LabeledChunkIdentifier.LabelTagType
          Class representing a label, tag and type
 
Constructor Summary
LabeledChunkIdentifier()
           
 
Method Summary
 List<CoreMap> getAnnotatedChunks(List<CoreLabel> tokens, int totalTokensOffset, Class textKey, Class labelKey)
          Find and annotate chunks.
 List<CoreMap> getAnnotatedChunks(List<CoreLabel> tokens, int totalTokensOffset, Class textKey, Class labelKey, Class tokenChunkKey, Class tokenLabelKey)
          Find and annotate chunks.
 String getDefaultNegTag()
           
 String getDefaultPosTag()
           
 String getNegLabel()
           
 LabeledChunkIdentifier.LabelTagType getTagType(String label)
           
static boolean isEndOfChunk(LabeledChunkIdentifier.LabelTagType prev, LabeledChunkIdentifier.LabelTagType cur)
          Returns whether a chunk ended between the previous and current token
static boolean isEndOfChunk(String prevTag, String prevType, String curTag, String curType)
          Returns whether a chunk ended between the previous and current token
 boolean isIgnoreProvidedTag()
           
static boolean isStartOfChunk(LabeledChunkIdentifier.LabelTagType prev, LabeledChunkIdentifier.LabelTagType cur)
          Returns whether a chunk started between the previous and current token
static boolean isStartOfChunk(String prevTag, String prevType, String curTag, String curType)
          Returns whether a chunk started between the previous and current token
 void setDefaultNegTag(String defaultNegTag)
           
 void setDefaultPosTag(String defaultPosTag)
           
 void setIgnoreProvidedTag(boolean ignoreProvidedTag)
           
 void setNegLabel(String negLabel)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LabeledChunkIdentifier

public LabeledChunkIdentifier()
Method Detail

getAnnotatedChunks

public List<CoreMap> getAnnotatedChunks(List<CoreLabel> tokens,
                                        int totalTokensOffset,
                                        Class textKey,
                                        Class labelKey)
Find and annotate chunks. Returns list of CoreMap (Annotation) objects.

Parameters:
tokens - List of tokens in which to look for chunks
totalTokensOffset - Offset to apply to token indices
textKey - Key to use to find the token text
labelKey - Key to use to find the token label (to determine whether the token is inside a chunk)
Returns:
List of annotations (each as a CoreMap) representing the chunks of tokens

getAnnotatedChunks

public List<CoreMap> getAnnotatedChunks(List<CoreLabel> tokens,
                                        int totalTokensOffset,
                                        Class textKey,
                                        Class labelKey,
                                        Class tokenChunkKey,
                                        Class tokenLabelKey)
Find and annotate chunks. Returns a list of CoreMap (Annotation) objects, each representing a chunk with the following annotations set:
CharacterOffsetBeginAnnotation - set to the CharacterOffsetBeginAnnotation of the first token in the chunk
CharacterOffsetEndAnnotation - set to the CharacterOffsetEndAnnotation of the last token in the chunk
TokensAnnotation - list of tokens in this chunk
TokenBeginAnnotation - index of the first token in the chunk (index in the original list of tokens)
TokenEndAnnotation - index of the last token in the chunk (index in the original list of tokens)
TextAnnotation - string representing the tokens in this chunk (token text separated by spaces)

Parameters:
tokens - List of tokens in which to look for chunks
totalTokensOffset - Offset to apply to token indices
textKey - Key to use to find the token text
labelKey - Key to use to find the token label (to determine whether the token is inside a chunk)
tokenChunkKey - If not null, each token is annotated with the chunk using this key
tokenLabelKey - If not null, each token is annotated with the text associated with the chunk using this key
Returns:
List of annotations (each as a CoreMap) representing the chunks of tokens
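The shape of the returned chunks can be illustrated without CoreNLP. The sketch below is a simplified stand-in, not the library's implementation: the Chunk class is a hypothetical substitute for the returned CoreMap, labels are assumed to use B- to open every chunk and I- to continue it, and the inclusive lastToken index is a convention of the sketch that should be checked against the library before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone illustration; it does not use CoreNLP classes. */
class ChunkSpanSketch {

    /** Hypothetical stand-in for the CoreMap that describes one chunk. */
    static final class Chunk {
        final String type;
        final int firstToken; // index of the first token, shifted by the offset
        final int lastToken;  // index of the last token (inclusive), shifted by the offset
        final String text;    // token texts joined by a single space

        Chunk(String type, int firstToken, int lastToken, String text) {
            this.type = type;
            this.firstToken = firstToken;
            this.lastToken = lastToken;
            this.text = text;
        }
    }

    /** Collects chunks from labels where B- opens a chunk and I- continues it. */
    static List<Chunk> chunks(List<String> words, List<String> labels, int totalTokensOffset) {
        List<Chunk> result = new ArrayList<>();
        int start = -1;      // index of the first token of the open chunk, or -1
        String type = null;  // type of the open chunk
        // One extra iteration with a virtual "O" label closes a chunk at the end of the list.
        for (int i = 0; i <= labels.size(); i++) {
            String label = i < labels.size() ? labels.get(i) : "O";
            boolean continues = start >= 0 && label.equals("I-" + type);
            if (start >= 0 && !continues) {
                String text = String.join(" ", words.subList(start, i));
                result.add(new Chunk(type, start + totalTokensOffset,
                        i - 1 + totalTokensOffset, text));
                start = -1;
                type = null;
            }
            // An I- label with no open chunk is ignored in this simplified sketch.
            if (label.startsWith("B-")) {
                start = i;
                type = label.substring(2);
            }
        }
        return result;
    }
}
```

With the words Bill works for Bank of America, the labels B-PER O O B-ORG I-ORG I-ORG and an offset of 10, the sketch yields a PER chunk with text "Bill" covering token 10 and an ORG chunk with text "Bank of America" covering tokens 13 to 15.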

isEndOfChunk

public static boolean isEndOfChunk(String prevTag,
                                   String prevType,
                                   String curTag,
                                   String curType)
Returns whether a chunk ended between the previous and current token

Parameters:
prevTag - the tag of the previous token
prevType - the type of the previous token
curTag - the tag of the current token
curType - the type of the current token
Returns:
true if the previous token was the last token of a chunk
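As a rough illustration of what such a boundary test involves, here is a simplified standalone sketch restricted to the tags B, I and O. It is not the CoreNLP implementation, which also has to cover the other supported encodings; the class name EndOfChunkSketch is hypothetical.

```java
/** Standalone illustration for B/I/O tags only; not the CoreNLP implementation. */
class EndOfChunkSketch {

    /** True when a chunk containing the previous token does not continue into the current token. */
    static boolean isEndOfChunk(String prevTag, String prevType, String curTag, String curType) {
        if (prevTag.equals("O")) {
            return false; // the previous token was outside any chunk, so nothing can end
        }
        if (curTag.equals("O") || curTag.equals("B")) {
            return true;  // the current token is outside, or opens a new chunk
        }
        // Both tokens are inside chunks: a change of type closes the previous chunk.
        return !prevType.equals(curType);
    }
}
```

For example, the sketch returns true for (B, PER) followed by (O, O) and false for (I, ORG) followed by (I, ORG).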

isEndOfChunk

public static boolean isEndOfChunk(LabeledChunkIdentifier.LabelTagType prev,
                                   LabeledChunkIdentifier.LabelTagType cur)
Returns whether a chunk ended between the previous and current token

Parameters:
prev - the label/tag/type of the previous token
cur - the label/tag/type of the current token
Returns:
true if the previous token was the last token of a chunk

isStartOfChunk

public static boolean isStartOfChunk(String prevTag,
                                     String prevType,
                                     String curTag,
                                     String curType)
Returns whether a chunk started between the previous and current token

Parameters:
prevTag - the tag of the previous token
prevType - the type of the previous token
curTag - the tag of the current token
curType - the type of the current token
Returns:
true if the current token was the first token of a chunk
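The mirror-image test can be sketched the same way. The code below is a simplified standalone illustration for the tags B, I and O only, not the CoreNLP implementation; the class name StartOfChunkSketch is hypothetical. It treats an I tag as a chunk start when no chunk of the same type is open, which is how IO-style labels mark the first token of a chunk.

```java
/** Standalone illustration for B/I/O tags only; not the CoreNLP implementation. */
class StartOfChunkSketch {

    /** True when the current token is the first token of a chunk. */
    static boolean isStartOfChunk(String prevTag, String prevType, String curTag, String curType) {
        if (curTag.equals("O")) {
            return false; // the current token is outside any chunk
        }
        if (curTag.equals("B")) {
            return true;  // B always opens a chunk
        }
        // curTag is I: it opens a chunk only if no chunk of the same type is already open.
        return prevTag.equals("O") || !prevType.equals(curType);
    }
}
```

For example, the sketch returns true for (O, O) followed by (I, PER) and false for (B, ORG) followed by (I, ORG).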

isStartOfChunk

public static boolean isStartOfChunk(LabeledChunkIdentifier.LabelTagType prev,
                                     LabeledChunkIdentifier.LabelTagType cur)
Returns whether a chunk started between the previous and current token

Parameters:
prev - the label/tag/type of the previous token
cur - the label/tag/type of the current token
Returns:
true if the current token was the first token of a chunk

getTagType

public LabeledChunkIdentifier.LabelTagType getTagType(String label)
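This page gives no description for getTagType. Going by the class description, a label of the form <tag>-<type> splits into its tag and type, which the standalone sketch below illustrates. The Parsed class is a hypothetical stand-in for LabelTagType, and the fallback for labels without a hyphen (tag O for the label "O", tag I otherwise) is an assumption of the sketch, not documented behavior.

```java
/** Standalone illustration; not the CoreNLP implementation. */
class LabelSplitSketch {

    /** Hypothetical stand-in for LabeledChunkIdentifier.LabelTagType. */
    static final class Parsed {
        final String label;
        final String tag;
        final String type;

        Parsed(String label, String tag, String type) {
            this.label = label;
            this.tag = tag;
            this.type = type;
        }
    }

    /** Splits a label at its first hyphen into tag and type. */
    static Parsed split(String label) {
        int dash = label.indexOf('-');
        if (dash < 0) {
            // Assumption: a label without a prefix is a bare type.
            // "O" gets the tag O, any other bare type gets the tag I.
            String tag = label.equals("O") ? "O" : "I";
            return new Parsed(label, tag, label);
        }
        return new Parsed(label, label.substring(0, dash), label.substring(dash + 1));
    }
}
```

Under these assumptions, "B-ORG" splits into tag B and type ORG, "O" into tag O and type O, and a bare "PER" into tag I and type PER.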

getDefaultPosTag

public String getDefaultPosTag()

setDefaultPosTag

public void setDefaultPosTag(String defaultPosTag)

getDefaultNegTag

public String getDefaultNegTag()

setDefaultNegTag

public void setDefaultNegTag(String defaultNegTag)

getNegLabel

public String getNegLabel()

setNegLabel

public void setNegLabel(String negLabel)

isIgnoreProvidedTag

public boolean isIgnoreProvidedTag()

setIgnoreProvidedTag

public void setIgnoreProvidedTag(boolean ignoreProvidedTag)


Stanford NLP Group