public class RelationTripleSegmenter
extends java.lang.Object
This class takes a SentenceFragment and converts it to a conventional OpenIE triple, as materialized in the RelationTriple class.

Modifier and Type | Field and Description |
---|---|
java.util.List<TokenSequencePattern> | NOUN_TOKEN_PATTERNS: A set of nominal patterns that don't require being in a coherent clause, but do require NER information. |
java.util.Set<java.lang.String> | VALID_ADVERB_ARCS: A set of valid arcs denoting an adverbial modifier we are interested in. |
java.util.Set<java.lang.String> | VALID_OBJECT_ARCS: A set of valid arcs denoting an object entity we are interested in. |
java.util.Set<java.lang.String> | VALID_SUBJECT_ARCS: A set of valid arcs denoting a subject entity we are interested in. |
java.util.List<SemgrexPattern> | VERB_PATTERNS: A list of patterns to match relation extractions against. |
java.util.List<SemgrexPattern> | VP_PATTERNS: A set of derivative patterns from VERB_PATTERNS that ignore the subject arc. |

Constructor and Description |
---|
RelationTripleSegmenter() |
RelationTripleSegmenter(boolean allowNominalsWithoutNER): Create a new relation triple segmenter. |

Modifier and Type | Method and Description |
---|---|
java.util.List<RelationTriple> | extract(SemanticGraph parse, java.util.List<CoreLabel> tokens): Extract the nominal patterns from this sentence. |
protected java.util.Optional<java.util.List<IndexedWord>> | getValidAdverbChunk(SemanticGraph parse, IndexedWord root, java.util.Optional<java.lang.String> noopArc): Get the yield of a given subtree, if it is an adverb chunk. |
protected java.util.Optional<java.util.List<IndexedWord>> | getValidChunk(SemanticGraph parse, IndexedWord originalRoot, java.util.Set<java.lang.String> validArcs, java.util.Optional<java.lang.String> ignoredArc) |
protected java.util.Optional<java.util.List<IndexedWord>> | getValidChunk(SemanticGraph parse, IndexedWord originalRoot, java.util.Set<java.lang.String> validArcs, java.util.Optional<java.lang.String> ignoredArc, boolean allowExtraArcs) |
protected java.util.Optional<java.util.List<IndexedWord>> | getValidObjectChunk(SemanticGraph parse, IndexedWord root, java.util.Optional<java.lang.String> noopArc): Get the yield of a given subtree, if it is a valid object. |
protected java.util.Optional<java.util.List<IndexedWord>> | getValidSubjectChunk(SemanticGraph parse, IndexedWord root, java.util.Optional<java.lang.String> noopArc): Get the yield of a given subtree, if it is a valid subject. |
java.util.Optional<RelationTriple> | segment(SemanticGraph parse, java.util.Optional<java.lang.Double> confidence): Segment the given parse tree, forcing all nodes to be consumed. |
java.util.Optional<RelationTriple> | segment(SemanticGraph parse, java.util.Optional<java.lang.Double> confidence, boolean consumeAll): This is the main entry point from the Annotator. |

public final java.util.List<SemgrexPattern> VERB_PATTERNS
public final java.util.List<SemgrexPattern> VP_PATTERNS
A set of derivative patterns from VERB_PATTERNS that ignore the subject arc. This is useful primarily for creating a training set for the clause splitter which emulates the behavior of the relation triple segmenter component.
public final java.util.List<TokenSequencePattern> NOUN_TOKEN_PATTERNS
public final java.util.Set<java.lang.String> VALID_SUBJECT_ARCS
public final java.util.Set<java.lang.String> VALID_OBJECT_ARCS
public final java.util.Set<java.lang.String> VALID_ADVERB_ARCS
public RelationTripleSegmenter(boolean allowNominalsWithoutNER)
allowNominalsWithoutNER - If true, extract all nominal relations and not just those which are warranted based on named entity tags. For most practical applications, this greatly over-produces trivial triples.
public RelationTripleSegmenter()
See also: RelationTripleSegmenter(boolean)
public java.util.List<RelationTriple> extract(SemanticGraph parse, java.util.List<CoreLabel> tokens)
parse - The parse tree of the sentence to annotate.
tokens - The tokens of the sentence to annotate.
Returns: A list of RelationTriples. Note that these do not have an associated tree with them.
See also: NOUN_TOKEN_PATTERNS, NOUN_DEPENDENCY_PATTERNS
protected java.util.Optional<java.util.List<IndexedWord>> getValidChunk(SemanticGraph parse, IndexedWord originalRoot, java.util.Set<java.lang.String> validArcs, java.util.Optional<java.lang.String> ignoredArc, boolean allowExtraArcs)
See also: getValidSubjectChunk(edu.stanford.nlp.semgraph.SemanticGraph, edu.stanford.nlp.ling.IndexedWord, Optional), getValidObjectChunk(edu.stanford.nlp.semgraph.SemanticGraph, edu.stanford.nlp.ling.IndexedWord, Optional), getValidAdverbChunk(edu.stanford.nlp.semgraph.SemanticGraph, edu.stanford.nlp.ling.IndexedWord, Optional)
protected java.util.Optional<java.util.List<IndexedWord>> getValidChunk(SemanticGraph parse, IndexedWord originalRoot, java.util.Set<java.lang.String> validArcs, java.util.Optional<java.lang.String> ignoredArc)
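The getValidChunk overloads above take a set of valid arcs and an optional arc to ignore, and return an Optional list of words. The following is a minimal sketch of how such a check could work, not the library's implementation. The Node and Arc classes, the validChunk name, and the rule that a subtree reached through the ignored arc is skipped rather than collected are all assumptions made for illustration. The allowExtraArcs flag is not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical stand-in for chunk validation on a dependency subtree; not a CoreNLP class. */
final class ChunkSketch {

  /** A word with its sentence position and its labeled outgoing arcs. */
  static final class Node {
    final int index;
    final String word;
    final List<Arc> children = new ArrayList<>();

    Node(int index, String word) {
      this.index = index;
      this.word = word;
    }

    Node add(String relation, Node child) {
      children.add(new Arc(relation, child));
      return this;
    }
  }

  /** A labeled dependency arc pointing at a child node. */
  static final class Arc {
    final String relation;
    final Node target;

    Arc(String relation, Node target) {
      this.relation = relation;
      this.target = target;
    }
  }

  /**
   * Returns the words under root in sentence order, or Optional.empty() if any
   * arc below root is neither in validArcs nor equal to ignoredArc.
   * Subtrees reached through ignoredArc are skipped (assumption).
   */
  static Optional<List<String>> validChunk(Node root, Set<String> validArcs,
                                           Optional<String> ignoredArc) {
    List<Node> collected = new ArrayList<>();
    if (!collect(root, validArcs, ignoredArc, collected)) {
      return Optional.empty();
    }
    collected.sort(Comparator.comparingInt(n -> n.index));
    List<String> words = new ArrayList<>();
    for (Node n : collected) {
      words.add(n.word);
    }
    return Optional.of(words);
  }

  /** Depth-first walk; returns false as soon as an unexpected arc is seen. */
  private static boolean collect(Node node, Set<String> validArcs,
                                 Optional<String> ignoredArc, List<Node> out) {
    out.add(node);
    for (Arc arc : node.children) {
      if (ignoredArc.isPresent() && ignoredArc.get().equals(arc.relation)) {
        continue; // do not descend into the ignored arc
      }
      if (!validArcs.contains(arc.relation)) {
        return false; // unexpected arc: the chunk is not valid
      }
      if (!collect(arc.target, validArcs, ignoredArc, out)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // "the quick fox": fox -det-> the, fox -amod-> quick
    Node fox = new Node(3, "fox")
        .add("det", new Node(1, "the"))
        .add("amod", new Node(2, "quick"));
    System.out.println(validChunk(fox, Set.of("det", "amod"), Optional.empty()));
    // prints Optional[[the, quick, fox]]
    System.out.println(validChunk(fox, Set.of("det"), Optional.empty()));
    // prints Optional.empty
    System.out.println(validChunk(fox, Set.of("det"), Optional.of("amod")));
    // prints Optional[[the, fox]]
  }
}
```

In this sketch the subject, object, and adverb variants would differ only in which arc set they pass as validArcs, mirroring the VALID_SUBJECT_ARCS, VALID_OBJECT_ARCS, and VALID_ADVERB_ARCS fields.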
protected java.util.Optional<java.util.List<IndexedWord>> getValidSubjectChunk(SemanticGraph parse, IndexedWord root, java.util.Optional<java.lang.String> noopArc)
Get the yield of a given subtree, if it is a valid subject. Otherwise, return Optional.empty().
parse - The parse tree we are extracting a subtree from.
root - The root of the subtree.
noopArc - An optional edge type to ignore in gathering the chunk.
protected java.util.Optional<java.util.List<IndexedWord>> getValidObjectChunk(SemanticGraph parse, IndexedWord root, java.util.Optional<java.lang.String> noopArc)
Get the yield of a given subtree, if it is a valid object. Otherwise, return Optional.empty().
parse - The parse tree we are extracting a subtree from.
root - The root of the subtree.
noopArc - An optional edge type to ignore in gathering the chunk.
protected java.util.Optional<java.util.List<IndexedWord>> getValidAdverbChunk(SemanticGraph parse, IndexedWord root, java.util.Optional<java.lang.String> noopArc)
Get the yield of a given subtree, if it is an adverb chunk. Otherwise, return Optional.empty().
parse - The parse tree we are extracting a subtree from.
root - The root of the subtree.
noopArc - An optional edge type to ignore in gathering the chunk.
public java.util.Optional<RelationTriple> segment(SemanticGraph parse, java.util.Optional<java.lang.Double> confidence, boolean consumeAll)
This is the main entry point from the Annotator.
Tries to segment this sentence as a relation triple. This sentence must already match one of a few strict patterns for a valid OpenIE extraction. If it does not, then no relation triple is created. That is, this is not a relation extractor; it is just a utility to segment what is already a (subject, relation, object) triple into these three parts.
Relations are verified using semgrex expressions. For example, look at VERB_PATTERNS for a list of semgrex expressions involving verbs.
Once a relation is potentially here, this method goes through some pruning steps to eliminate invalid relations. For example, if one of the get clauses contains a NOT or similar word, we eliminate that, since the system has not been written to handle negation. Other possible eliminations are for having arcs which were not expected as part of the semgrex expression used to identify the triple.
This method will attempt to use both the verb-centric patterns and the ACL-centric patterns.
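The pruning described above can be illustrated with a small sketch. It is not the CoreNLP implementation: the Triple class, the NEGATION_WORDS set, the default confidence of 1.0, and the way consumeAll is checked (by counting words) are assumptions chosen for illustration. The real method works on a SemanticGraph matched by semgrex patterns, whereas this sketch starts from chunks that have already been separated.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical illustration of the pruning steps; not a CoreNLP class. */
final class PruningSketch {

  // Illustrative negation markers (assumption; the library's own list is not given here).
  static final Set<String> NEGATION_WORDS = Set.of("not", "n't", "never", "no");

  /** Minimal stand-in for a (subject, relation, object) triple. */
  static final class Triple {
    final List<String> subject;
    final List<String> relation;
    final List<String> object;
    final double confidence;

    Triple(List<String> subject, List<String> relation, List<String> object,
           double confidence) {
      this.subject = subject;
      this.relation = relation;
      this.object = object;
      this.confidence = confidence;
    }

    @Override
    public String toString() {
      return "(" + String.join(" ", subject) + "; "
          + String.join(" ", relation) + "; "
          + String.join(" ", object) + ")";
    }
  }

  /**
   * Returns a triple only if the candidate chunks survive pruning:
   * with consumeAll set, the chunks must cover every word of the sentence,
   * and no chunk may contain a negation word.
   */
  static Optional<Triple> segment(List<String> subject, List<String> relation,
                                  List<String> object, int sentenceLength,
                                  Optional<Double> confidence, boolean consumeAll) {
    int consumed = subject.size() + relation.size() + object.size();
    if (consumeAll && consumed != sentenceLength) {
      return Optional.empty(); // some words were left outside the pattern
    }
    for (List<String> chunk : List.of(subject, relation, object)) {
      for (String word : chunk) {
        if (NEGATION_WORDS.contains(word.toLowerCase(Locale.ROOT))) {
          return Optional.empty(); // negation is not handled, so drop the candidate
        }
      }
    }
    // Default confidence of 1.0 is an assumption of this sketch.
    return Optional.of(new Triple(subject, relation, object, confidence.orElse(1.0)));
  }

  public static void main(String[] args) {
    List<String> cats = List.of("cats");
    List<String> mice = List.of("mice");
    System.out.println(segment(cats, List.of("chase"), mice, 3, Optional.empty(), true));
    // prints Optional[(cats; chase; mice)]
    System.out.println(segment(cats, List.of("do", "not", "chase"), mice, 5, Optional.of(0.9), true));
    // prints Optional.empty (negation)
    System.out.println(segment(cats, List.of("chase"), mice, 4, Optional.empty(), true));
    // prints Optional.empty (one word left unconsumed)
  }
}
```

Returning Optional.empty() for every rejected candidate matches the documented contract that no relation triple is created when the sentence does not fit.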
parse - The sentence to process, as a dependency tree.
confidence - An optional confidence to pass on to the relation triple.
consumeAll - If true, force the entire parse to be consumed by the pattern.
public java.util.Optional<RelationTriple> segment(SemanticGraph parse, java.util.Optional<java.lang.Double> confidence)