java.lang.Object
  edu.stanford.nlp.pipeline.ChunkAnnotationUtils
public class ChunkAnnotationUtils
Utility functions for annotating chunks
Constructor Summary | |
---|---|
ChunkAnnotationUtils()
|
Method Summary | |
---|---|
static void |
annotateChunk(CoreMap annotation,
Class newAnnotationKey,
Class aggrKey,
CoreMapAttributeAggregator aggregator)
|
static void |
annotateChunk(CoreMap chunk,
List<CoreLabel> tokens,
int tokenStartIndex,
int tokenEndIndex,
int totalTokenOffset)
Annotates a CoreMap representing a chunk with basic chunk information: CharacterOffsetBeginAnnotation (set to the CharacterOffsetBeginAnnotation of the first token in the chunk); CharacterOffsetEndAnnotation (set to the CharacterOffsetEndAnnotation of the last token in the chunk); TokensAnnotation (list of tokens in this chunk); TokenBeginAnnotation (index of the first token in the chunk, in the original list of tokens: tokenStartIndex + totalTokenOffset); TokenEndAnnotation (end index of the chunk, not inclusive, in the original list of tokens: tokenEndIndex + totalTokenOffset). |
static void |
annotateChunk(CoreMap chunk,
Map<String,String> attributes)
|
static void |
annotateChunks(List<? extends CoreMap> chunks,
int start,
int end,
Map<String,String> attributes)
|
static void |
annotateChunks(List<? extends CoreMap> chunks,
Map<String,String> attributes)
|
static void |
annotateChunkText(CoreMap chunk,
Class tokenTextKey)
Annotates a CoreMap representing a chunk with text information: TextAnnotation (string representing the tokens in this chunk, with token text separated by spaces). |
static void |
annotateChunkText(CoreMap chunk,
CoreMap origAnnotation)
Annotates a CoreMap representing a chunk with text information: TextAnnotation (string extracted from origAnnotation using the character offset information for this chunk). |
static void |
annotateChunkTokens(CoreMap chunk,
Class tokenChunkKey,
Class tokenLabelKey)
Annotates tokens in chunk |
static boolean |
checkOffsets(CoreMap docAnnotation)
Checks whether the offsets of the document and its sentences match |
static void |
copyUnsetAnnotations(CoreMap src,
CoreMap dest)
Copies annotations from src over to dest where they are not already set in dest |
static boolean |
fixChunkSentenceBoundaries(CoreMap docAnnotation,
List<IntPair> chunkCharOffsets)
Given a list of character offsets for chunks, fixes sentence splitting so that sentences do not break the chunks |
static boolean |
fixChunkSentenceBoundaries(CoreMap docAnnotation,
List<IntPair> chunkCharOffsets,
boolean offsetsAreNotSorted,
boolean extendedFixSentence,
boolean moreExtendedFixSentence)
Given a list of character offsets for chunks, fixes sentence splitting so that sentences do not break the chunks |
static boolean |
fixChunkTokenBoundaries(CoreMap docAnnotation,
List<IntPair> chunkCharOffsets)
Given a list of character offsets for chunks, fixes tokenization so that token boundaries fall on chunk boundaries |
static boolean |
fixTokenOffsets(CoreMap docAnnotation)
Fixes token offsets of sentences to match those in the document (assumes tokens are shared). Sentence token indices may not match the document token list if certain HTML elements are ignored. |
static Annotation |
getAnnotatedChunk(CoreMap annotation,
int tokenStartIndex,
int tokenEndIndex)
Creates a new chunk Annotation with basic chunk information: CharacterOffsetBeginAnnotation (set to the CharacterOffsetBeginAnnotation of the first token in the chunk); CharacterOffsetEndAnnotation (set to the CharacterOffsetEndAnnotation of the last token in the chunk); TokensAnnotation (list of tokens in this chunk); TokenBeginAnnotation (index of the first token in the chunk, in the original list of tokens: tokenStartIndex + the annotation's TokenBeginAnnotation); TokenEndAnnotation (end index of the chunk, not inclusive, in the original list of tokens: tokenEndIndex + the annotation's TokenBeginAnnotation); TextAnnotation (string extracted from the original annotation using the character offset information for this chunk). |
static Annotation |
getAnnotatedChunk(CoreMap annotation,
int tokenStartIndex,
int tokenEndIndex,
Class tokenChunkKey,
Class tokenLabelKey)
Creates a new chunk Annotation with basic chunk information: CharacterOffsetBeginAnnotation (set to the CharacterOffsetBeginAnnotation of the first token in the chunk); CharacterOffsetEndAnnotation (set to the CharacterOffsetEndAnnotation of the last token in the chunk); TokensAnnotation (list of tokens in this chunk); TokenBeginAnnotation (index of the first token in the chunk, in the original list of tokens: tokenStartIndex + the annotation's TokenBeginAnnotation); TokenEndAnnotation (end index of the chunk, not inclusive, in the original list of tokens: tokenEndIndex + the annotation's TokenBeginAnnotation); TextAnnotation (string extracted from the original annotation using the character offset information for this chunk). |
static Annotation |
getAnnotatedChunk(List<CoreLabel> tokens,
int tokenStartIndex,
int tokenEndIndex,
int totalTokenOffset)
Creates a new chunk Annotation with basic chunk information: CharacterOffsetBeginAnnotation (set to the CharacterOffsetBeginAnnotation of the first token in the chunk); CharacterOffsetEndAnnotation (set to the CharacterOffsetEndAnnotation of the last token in the chunk); TokensAnnotation (list of tokens in this chunk); TokenBeginAnnotation (index of the first token in the chunk, in the original list of tokens: tokenStartIndex + totalTokenOffset); TokenEndAnnotation (end index of the chunk, not inclusive, in the original list of tokens: tokenEndIndex + totalTokenOffset). |
static Annotation |
getAnnotatedChunk(List<CoreLabel> tokens,
int tokenStartIndex,
int tokenEndIndex,
int totalTokenOffset,
Class tokenChunkKey,
Class tokenTextKey,
Class tokenLabelKey)
Creates a new chunk Annotation with basic chunk information: CharacterOffsetBeginAnnotation (set to the CharacterOffsetBeginAnnotation of the first token in the chunk); CharacterOffsetEndAnnotation (set to the CharacterOffsetEndAnnotation of the last token in the chunk); TokensAnnotation (list of tokens in this chunk); TokenBeginAnnotation (index of the first token in the chunk, in the original list of tokens: tokenStartIndex + totalTokenOffset); TokenEndAnnotation (end index of the chunk, not inclusive, in the original list of tokens: tokenEndIndex + totalTokenOffset); TextAnnotation (string extracted from the origAnnotation using the character offset information for this chunk). |
static List<CoreMap> |
getAnnotatedChunksUsingSortedCharOffsets(CoreMap annotation,
List<IntPair> charOffsets)
|
static List<CoreMap> |
getAnnotatedChunksUsingSortedCharOffsets(CoreMap annotation,
List<IntPair> charOffsets,
boolean charOffsetIsRelative,
Class tokenChunkKey,
Class tokenLabelKey,
boolean allowPartialTokens)
Creates a list of new chunk Annotations, each with basic chunk information: CharacterOffsetBeginAnnotation (set to the CharacterOffsetBeginAnnotation of the first token in the chunk); CharacterOffsetEndAnnotation (set to the CharacterOffsetEndAnnotation of the last token in the chunk); TokensAnnotation (list of tokens in this chunk); TokenBeginAnnotation (index of the first token in the chunk, in the original list of tokens: tokenStartIndex + the annotation's TokenBeginAnnotation); TokenEndAnnotation (end index of the chunk, not inclusive, in the original list of tokens: tokenEndIndex + the annotation's TokenBeginAnnotation); TextAnnotation (string extracted from the original annotation using the character offset information for this chunk). |
static CoreMap |
getAnnotatedChunkUsingCharOffsets(CoreMap annotation,
int charOffsetStart,
int charOffsetEnd)
|
static Interval<Integer> |
getChunkOffsetsUsingCharOffsets(List<? extends CoreMap> chunkList,
int charStart,
int charEnd)
Return chunk offsets |
static Character |
getFirstNonWsChar(CoreMap sent)
|
static Integer |
getFirstNonWsCharOffset(CoreMap sent,
boolean relative)
|
static CoreMap |
getMergedChunk(List<? extends CoreMap> chunkList,
int chunkIndexStart,
int chunkIndexEnd,
Map<Class,CoreMapAttributeAggregator> aggregators)
Create chunk that is merged from chunkIndexStart to chunkIndexEnd (exclusive) |
static CoreMap |
getMergedChunk(List<? extends CoreMap> chunkList,
String origText,
int chunkIndexStart,
int chunkIndexEnd)
Create chunk that is merged from chunkIndexStart to chunkIndexEnd (exclusive) |
static String |
getTokenText(List<? extends CoreMap> tokens,
Class tokenTextKey)
|
static String |
getTokenText(List<? extends CoreMap> tokens,
Class tokenTextKey,
String delimiter)
|
static String |
getTrimmedText(CoreMap sent)
|
static void |
mergeChunks(List<CoreMap> chunkList,
String origText,
int chunkIndexStart,
int chunkIndexEnd)
Merge chunks from chunkIndexStart to chunkIndexEnd (exclusive) and replace them in the list |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public ChunkAnnotationUtils()
Method Detail |
---|
public static boolean checkOffsets(CoreMap docAnnotation)
docAnnotation -
public static boolean fixTokenOffsets(CoreMap docAnnotation)
docAnnotation -
public static void copyUnsetAnnotations(CoreMap src, CoreMap dest)
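The "copy if not already set" behaviour described for copyUnsetAnnotations can be illustrated with a minimal stand-alone sketch. It does not call CoreNLP: plain `Map` objects stand in for the `CoreMap` arguments, and the class and method names (`CopyUnsetSketch`, `copyUnset`) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-alone model of "copy only what dest does not already have".
// Assumption: a java.util.Map stands in for a CoreMap here.
final class CopyUnsetSketch {
    /** Copies each entry of src into dest unless dest already contains that key. */
    static <K, V> void copyUnset(Map<K, V> src, Map<K, V> dest) {
        for (Map.Entry<K, V> entry : src.entrySet()) {
            if (!dest.containsKey(entry.getKey())) {
                dest.put(entry.getKey(), entry.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> src = new LinkedHashMap<>();
        src.put("text", "New York");
        src.put("label", "LOCATION");
        Map<String, String> dest = new LinkedHashMap<>();
        dest.put("text", "NYC"); // already set, so it is kept
        copyUnset(src, dest);
        System.out.println(dest);
    }
}
```

In this sketch the existing `text` value in `dest` survives and only the missing `label` entry is copied.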
public static boolean fixChunkTokenBoundaries(CoreMap docAnnotation, List<IntPair> chunkCharOffsets)
docAnnotation -
chunkCharOffsets -
public static CoreMap getMergedChunk(List<? extends CoreMap> chunkList, String origText, int chunkIndexStart, int chunkIndexEnd)
chunkList - List of chunks
origText - Text from which to extract chunk text
chunkIndexStart - Index of first chunk to merge
chunkIndexEnd - Index of last chunk to merge (exclusive)
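The exclusive end index is easy to get wrong, so here is a minimal stand-alone sketch of merging chunks `chunkIndexStart` to `chunkIndexEnd` (exclusive). It is not the CoreNLP implementation: the `Chunk` class is a hypothetical stand-in for a chunk `CoreMap`, and the assumption that character offsets index directly into `origText` is mine. The in-place variant follows the summary description of mergeChunks (merge and replace in the list).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MergeChunkSketch {
    /** Hypothetical stand-in for a chunk CoreMap: character span, token span, text. */
    static final class Chunk {
        final int charBegin, charEnd, tokenBegin, tokenEnd;
        final String text;
        Chunk(int charBegin, int charEnd, int tokenBegin, int tokenEnd, String text) {
            this.charBegin = charBegin;
            this.charEnd = charEnd;
            this.tokenBegin = tokenBegin;
            this.tokenEnd = tokenEnd;
            this.text = text;
        }
    }

    /** Builds one chunk covering chunks [start, end); end is exclusive. */
    static Chunk getMerged(List<Chunk> chunks, String origText, int start, int end) {
        Chunk first = chunks.get(start);
        Chunk last = chunks.get(end - 1); // last chunk actually included
        String text = origText.substring(first.charBegin, last.charEnd);
        return new Chunk(first.charBegin, last.charEnd, first.tokenBegin, last.tokenEnd, text);
    }

    /** Replaces chunks [start, end) in the list with their merged chunk. */
    static void mergeInPlace(List<Chunk> chunks, String origText, int start, int end) {
        Chunk merged = getMerged(chunks, origText, start, end);
        chunks.subList(start, end).clear();
        chunks.add(start, merged);
    }

    public static void main(String[] args) {
        String text = "I love New York City";
        List<Chunk> chunks = new ArrayList<>(Arrays.asList(
                new Chunk(0, 1, 0, 1, "I"),
                new Chunk(2, 6, 1, 2, "love"),
                new Chunk(7, 10, 2, 3, "New"),
                new Chunk(11, 15, 3, 4, "York"),
                new Chunk(16, 20, 4, 5, "City")));
        mergeInPlace(chunks, text, 2, 5);
        System.out.println(chunks.size() + " chunks, last = " + chunks.get(2).text);
    }
}
```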
public static CoreMap getMergedChunk(List<? extends CoreMap> chunkList, int chunkIndexStart, int chunkIndexEnd, Map<Class,CoreMapAttributeAggregator> aggregators)
chunkList - List of chunks
chunkIndexStart - Index of first chunk to merge
chunkIndexEnd - Index of last chunk to merge (exclusive)
aggregators - Aggregators
public static Interval<Integer> getChunkOffsetsUsingCharOffsets(List<? extends CoreMap> chunkList, int charStart, int charEnd)
chunkList - List of chunks
charStart - character begin offset
charEnd - character end offset
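The page only says that this method returns chunk offsets, so the following is one plausible reading rather than the documented behaviour: map a character range to the half-open range of chunk indices that overlap it. Everything here is an assumption made for illustration, including the overlap rule, the `null` result for no match, and the names `ChunkRangeSketch` and `chunkRange`; chunks are modelled as `{charBegin, charEnd}` pairs.

```java
import java.util.Arrays;

final class ChunkRangeSketch {
    /**
     * Returns {firstChunkIndex, endChunkIndex} (end exclusive) for the chunks whose
     * character span overlaps [charStart, charEnd), or null if no chunk overlaps.
     * Assumption: chunk spans are sorted and given as {charBegin, charEnd}.
     */
    static int[] chunkRange(int[][] chunkCharSpans, int charStart, int charEnd) {
        int first = -1;
        int end = -1;
        for (int i = 0; i < chunkCharSpans.length; i++) {
            boolean overlaps = chunkCharSpans[i][0] < charEnd && chunkCharSpans[i][1] > charStart;
            if (overlaps) {
                if (first < 0) {
                    first = i;
                }
                end = i + 1;
            }
        }
        return first < 0 ? null : new int[] {first, end};
    }

    public static void main(String[] args) {
        // Character spans of the tokens in "I love New York City"
        int[][] spans = {{0, 1}, {2, 6}, {7, 10}, {11, 15}, {16, 20}};
        System.out.println(Arrays.toString(chunkRange(spans, 7, 15))); // prints [2, 4]
    }
}
```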
public static void mergeChunks(List<CoreMap> chunkList, String origText, int chunkIndexStart, int chunkIndexEnd)
chunkList - List of chunks
origText - Text from which to extract chunk text
chunkIndexStart - Index of first chunk to merge
chunkIndexEnd - Index of last chunk to merge (exclusive)
public static Character getFirstNonWsChar(CoreMap sent)
public static Integer getFirstNonWsCharOffset(CoreMap sent, boolean relative)
public static String getTrimmedText(CoreMap sent)
public static boolean fixChunkSentenceBoundaries(CoreMap docAnnotation, List<IntPair> chunkCharOffsets)
docAnnotation - Document with sentences
chunkCharOffsets - ordered pairs of different chunks that should appear in sentences
public static boolean fixChunkSentenceBoundaries(CoreMap docAnnotation, List<IntPair> chunkCharOffsets, boolean offsetsAreNotSorted, boolean extendedFixSentence, boolean moreExtendedFixSentence)
docAnnotation - Document with sentences
chunkCharOffsets - ordered pairs of different chunks that should appear in sentences
offsetsAreNotSorted - Treat each pair of offsets as independent (look through all sentences again)
extendedFixSentence - Do extended sentence fixing based on some heuristics
moreExtendedFixSentence - Do even more extended sentence fixing based on some heuristics
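The core idea of fixing sentence boundaries so that no chunk is split can be sketched independently of CoreNLP. The version below is a simplified model under stated assumptions: sentences and chunks are `{charBegin, charEnd}` pairs (end exclusive), both lists are sorted, and a sentence that ends inside a chunk is merged with the sentences that follow it until the chunk is covered. It does not model the extended heuristics controlled by `extendedFixSentence` and `moreExtendedFixSentence`, and it returns the new sentence spans instead of a boolean. The names `SentenceBoundarySketch` and `mergeSentences` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SentenceBoundarySketch {
    /** Merges adjacent sentence spans so that each chunk lies within a single sentence. */
    static List<int[]> mergeSentences(List<int[]> sentences, List<int[]> chunks) {
        // Work on copies so the caller's spans are left untouched.
        List<int[]> result = new ArrayList<>();
        for (int[] s : sentences) {
            result.add(new int[] {s[0], s[1]});
        }
        for (int[] chunk : chunks) {
            for (int i = 0; i < result.size(); i++) {
                int[] sent = result.get(i);
                boolean chunkStartsHere = chunk[0] >= sent[0] && chunk[0] < sent[1];
                if (chunkStartsHere) {
                    // Absorb following sentences until the chunk end is covered.
                    while (chunk[1] > sent[1] && i + 1 < result.size()) {
                        sent[1] = result.remove(i + 1)[1];
                    }
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> sentences = Arrays.asList(new int[] {0, 10}, new int[] {11, 20}, new int[] {21, 30});
        List<int[]> chunks = Arrays.asList(new int[] {8, 14});
        for (int[] s : mergeSentences(sentences, chunks)) {
            System.out.println(Arrays.toString(s));
        }
    }
}
```

With the chunk `[8, 14)` straddling the first two sentences, the sketch yields the spans `[0, 20)` and `[21, 30)`.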
public static void annotateChunk(CoreMap chunk, List<CoreLabel> tokens, int tokenStartIndex, int tokenEndIndex, int totalTokenOffset)
chunk - CoreMap to be annotated
tokens - List of tokens to look for chunks
tokenStartIndex - Index (relative to current list of tokens) at which this chunk starts
tokenEndIndex - Index (relative to current list of tokens) at which this chunk ends (not inclusive)
totalTokenOffset - Index of tokens to offset by
public static String getTokenText(List<? extends CoreMap> tokens, Class tokenTextKey)
public static String getTokenText(List<? extends CoreMap> tokens, Class tokenTextKey, String delimiter)
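As a rough illustration of the two getTokenText overloads, the sketch below joins the text of each token with a delimiter. It is a stand-alone model, not the CoreNLP code: each token is a `Map<String, String>`, the key is a plain string, and the single-space default for the two-argument form is an assumption based on the annotateChunkText summary ("token text separated by space").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class TokenTextSketch {
    /** Joins the value stored under tokenTextKey in each token, separated by delimiter. */
    static String getTokenText(List<Map<String, String>> tokens, String tokenTextKey, String delimiter) {
        List<String> parts = new ArrayList<>();
        for (Map<String, String> token : tokens) {
            parts.add(token.get(tokenTextKey));
        }
        return String.join(delimiter, parts);
    }

    /** Assumed default: tokens separated by a single space. */
    static String getTokenText(List<Map<String, String>> tokens, String tokenTextKey) {
        return getTokenText(tokens, tokenTextKey, " ");
    }

    public static void main(String[] args) {
        List<Map<String, String>> tokens = Arrays.asList(
                Collections.singletonMap("text", "New"),
                Collections.singletonMap("text", "York"));
        System.out.println(getTokenText(tokens, "text"));      // prints New York
        System.out.println(getTokenText(tokens, "text", "_")); // prints New_York
    }
}
```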
public static void annotateChunkText(CoreMap chunk, Class tokenTextKey)
chunk - CoreMap to be annotated
tokenTextKey - Key to use to find the token text
public static void annotateChunkText(CoreMap chunk, CoreMap origAnnotation)
chunk - CoreMap to be annotated
origAnnotation - Annotation from which to extract the text for this chunk
public static void annotateChunkTokens(CoreMap chunk, Class tokenChunkKey, Class tokenLabelKey)
chunk - CoreMap representing chunk (should have TextAnnotation and TokensAnnotation)
tokenChunkKey - If not null, each token is annotated with the chunk using this key
tokenLabelKey - If not null, each token is annotated with the text associated with the chunk using this key
public static Annotation getAnnotatedChunk(List<CoreLabel> tokens, int tokenStartIndex, int tokenEndIndex, int totalTokenOffset)
tokens - List of tokens to look for chunks
tokenStartIndex - Index (relative to current list of tokens) at which this chunk starts
tokenEndIndex - Index (relative to current list of tokens) at which this chunk ends (not inclusive)
totalTokenOffset - Index of tokens to offset by
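The index arithmetic described for the chunk annotations (relative token indices plus `totalTokenOffset`, character offsets taken from the first and last token) can be sketched as follows. This is a stand-alone model with hypothetical `Token` and `Chunk` classes in place of `CoreLabel` and `Annotation`; it only mirrors the fields listed in the method summary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChunkSpanSketch {
    /** Hypothetical stand-in for a CoreLabel with character offsets. */
    static final class Token {
        final String word;
        final int charBegin, charEnd;
        Token(String word, int charBegin, int charEnd) {
            this.word = word;
            this.charBegin = charBegin;
            this.charEnd = charEnd;
        }
    }

    /** Hypothetical stand-in for the chunk Annotation. */
    static final class Chunk {
        final List<Token> tokens;
        final int charBegin, charEnd, tokenBegin, tokenEnd;
        Chunk(List<Token> tokens, int charBegin, int charEnd, int tokenBegin, int tokenEnd) {
            this.tokens = tokens;
            this.charBegin = charBegin;
            this.charEnd = charEnd;
            this.tokenBegin = tokenBegin;
            this.tokenEnd = tokenEnd;
        }
    }

    /** tokenEndIndex is not inclusive; totalTokenOffset shifts indices into the original token list. */
    static Chunk annotatedChunk(List<Token> tokens, int tokenStartIndex, int tokenEndIndex, int totalTokenOffset) {
        List<Token> chunkTokens = new ArrayList<>(tokens.subList(tokenStartIndex, tokenEndIndex));
        int charBegin = chunkTokens.get(0).charBegin;                    // from first token
        int charEnd = chunkTokens.get(chunkTokens.size() - 1).charEnd;   // from last token
        return new Chunk(chunkTokens, charBegin, charEnd,
                tokenStartIndex + totalTokenOffset, tokenEndIndex + totalTokenOffset);
    }

    public static void main(String[] args) {
        // A sentence whose first token is token 10 of the document.
        List<Token> sentence = Arrays.asList(
                new Token("New", 7, 10), new Token("York", 11, 15), new Token("City", 16, 20));
        Chunk chunk = annotatedChunk(sentence, 0, 2, 10);
        System.out.println(chunk.tokenBegin + ".." + chunk.tokenEnd); // prints 10..12
    }
}
```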
public static Annotation getAnnotatedChunk(List<CoreLabel> tokens, int tokenStartIndex, int tokenEndIndex, int totalTokenOffset, Class tokenChunkKey, Class tokenTextKey, Class tokenLabelKey)
tokens - List of tokens to look for chunks
tokenStartIndex - Index (relative to current list of tokens) at which this chunk starts
tokenEndIndex - Index (relative to current list of tokens) at which this chunk ends (not inclusive)
totalTokenOffset - Index of tokens to offset by
tokenChunkKey - If not null, each token is annotated with the chunk using this key
tokenTextKey - Key to use to find the token text
tokenLabelKey - If not null, each token is annotated with the text associated with the chunk using this key
public static Annotation getAnnotatedChunk(CoreMap annotation, int tokenStartIndex, int tokenEndIndex)
annotation - Annotation from which to extract the text for this chunk
tokenStartIndex - Index (relative to current list of tokens) at which this chunk starts
tokenEndIndex - Index (relative to current list of tokens) at which this chunk ends (not inclusive)
public static Annotation getAnnotatedChunk(CoreMap annotation, int tokenStartIndex, int tokenEndIndex, Class tokenChunkKey, Class tokenLabelKey)
annotation - Annotation from which to extract the text for this chunk
tokenStartIndex - Index (relative to current list of tokens) at which this chunk starts
tokenEndIndex - Index (relative to current list of tokens) at which this chunk ends (not inclusive)
tokenChunkKey - If not null, each token is annotated with the chunk using this key
tokenLabelKey - If not null, each token is annotated with the text associated with the chunk using this key
public static CoreMap getAnnotatedChunkUsingCharOffsets(CoreMap annotation, int charOffsetStart, int charOffsetEnd)
public static List<CoreMap> getAnnotatedChunksUsingSortedCharOffsets(CoreMap annotation, List<IntPair> charOffsets)
public static List<CoreMap> getAnnotatedChunksUsingSortedCharOffsets(CoreMap annotation, List<IntPair> charOffsets, boolean charOffsetIsRelative, Class tokenChunkKey, Class tokenLabelKey, boolean allowPartialTokens)
annotation - Annotation from which to extract the text for this chunk
charOffsets - List of start and end (not inclusive) character offsets. Note: assume char offsets are sorted and nonoverlapping!!!
charOffsetIsRelative - Whether the character offsets are relative to the current annotation or absolute offsets
tokenChunkKey - If not null, each token is annotated with the chunk using this key
tokenLabelKey - If not null, each token is annotated with the text associated with the chunk using this key
allowPartialTokens - Whether to allow partial tokens or not
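Because the offsets are assumed to be sorted and non-overlapping, the tokens can be matched to chunks in a single forward pass. The sketch below shows that idea in isolation; it is not the CoreNLP implementation. Tokens are `{charBegin, charEnd}` pairs, the result is a list of `{tokenBegin, tokenEnd}` index pairs (end exclusive), and the treatment of `allowPartialTokens` (when false, tokens only partly covered by the character range are left out) is my assumption, since the page does not spell it out. `charOffsetIsRelative` and the key parameters are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SortedOffsetsSketch {
    /**
     * Maps sorted, non-overlapping character ranges to token index ranges in one pass.
     * Assumption: with allowPartialTokens == false, a token must lie fully inside the range.
     */
    static List<int[]> tokenSpans(int[][] tokenCharSpans, List<int[]> charOffsets, boolean allowPartialTokens) {
        List<int[]> result = new ArrayList<>();
        int n = tokenCharSpans.length;
        int i = 0; // tokens before i have already been passed
        for (int[] offset : charOffsets) {
            int start = offset[0];
            int end = offset[1];
            // Skip tokens that end at or before the start of this range.
            while (i < n && tokenCharSpans[i][1] <= start) {
                i++;
            }
            int begin = i;
            // A token that begins before the range is only partly covered.
            if (!allowPartialTokens && begin < n && tokenCharSpans[begin][0] < start) {
                begin++;
            }
            int stop = begin;
            while (stop < n && tokenCharSpans[stop][0] < end
                    && (allowPartialTokens || tokenCharSpans[stop][1] <= end)) {
                stop++;
            }
            if (stop > begin) {
                result.add(new int[] {begin, stop});
            }
            i = stop;
        }
        return result;
    }

    public static void main(String[] args) {
        // Character spans of the tokens in "I love New York City"
        int[][] tokens = {{0, 1}, {2, 6}, {7, 10}, {11, 15}, {16, 20}};
        List<int[]> offsets = Arrays.asList(new int[] {0, 1}, new int[] {7, 15});
        for (int[] span : tokenSpans(tokens, offsets, false)) {
            System.out.println(Arrays.toString(span));
        }
    }
}
```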
public static void annotateChunk(CoreMap annotation, Class newAnnotationKey, Class aggrKey, CoreMapAttributeAggregator aggregator)
public static void annotateChunk(CoreMap chunk, Map<String,String> attributes)
public static void annotateChunks(List<? extends CoreMap> chunks, int start, int end, Map<String,String> attributes)
public static void annotateChunks(List<? extends CoreMap> chunks, Map<String,String> attributes)