public class SentenceAlgorithms
extends java.lang.Object
A set of common utility algorithms for working with sentences (e.g., finding the head of a span). These are not intended to be perfect, or even the canonical version of these algorithms. They should only be trusted for prototyping, and more careful attention should be paid in cases where the performance of the task is important or the domain is unusual.
For developers: this class is intended to be where domain-independent and
broadly useful functions on a sentence go, rather than polluting the Sentence
class itself.
Modifier and Type | Field and Description |
---|---|
Sentence | sentence: The underlying Sentence. |
Constructor and Description |
---|
SentenceAlgorithms(Sentence impl): Create a new algorithms object, based on a sentence. |
Modifier and Type | Method and Description |
---|---|
java.lang.Iterable<java.util.List<java.lang.String>> | allSpans() |
<E> java.lang.Iterable<java.util.List<E>> | allSpans(java.util.function.Function<Sentence,java.util.List<E>> selector) |
<E> java.lang.Iterable<java.util.List<E>> | allSpans(java.util.function.Function<Sentence,java.util.List<E>> selector, int maxLength): Return all the spans of a sentence. |
java.util.List<java.lang.String> | dependencyPathBetween(int start, int end) |
java.util.List<java.lang.String> | dependencyPathBetween(int start, int end, java.util.Optional<java.util.function.Function<Sentence,java.util.List<java.lang.String>>> selector): Find the dependency path between two words in a sentence. |
int | headOfSpan(Span tokenSpan): Get the index of the head word for a given span, based on the dependency parse. |
java.util.List<java.lang.String> | keyphrases(): The keyphrases of the sentence, using the words of the sentence to convert a span into a keyphrase. |
java.util.List<java.lang.String> | keyphrases(java.util.function.Function<Sentence,java.util.List<java.lang.String>> toString): Get the keyphrases of the sentence as a list of Strings. |
java.util.List<Span> | keyphraseSpans(): Returns a list of keyphrase spans, where keyphrases are defined as relevant noun phrases and verbs in the sentence. |
protected java.util.List<java.lang.String> | loopyDependencyPathBetween(int start, int end, java.util.Optional<java.util.function.Function<Sentence,java.util.List<java.lang.String>>> selector): Run a proper breadth-first search (BFS) over a dependency graph, finding the shortest path between two vertices. |
<E> E | modeInSpan(Span span, java.util.function.Function<Sentence,java.util.List<E>> selector): Select the most common element of the given type in the given span. |
void | unescapeHTML(): A helper method that interprets each token of the sentence as an HTML string and translates it back to plain text. |
public SentenceAlgorithms(Sentence impl)
See Also: Sentence.algorithms()
public java.util.List<Span> keyphraseSpans()
public java.util.List<java.lang.String> keyphrases(java.util.function.Function<Sentence,java.util.List<java.lang.String>> toString)
Parameters:
toString - The function to use to convert a span to a string. The canonical case is Sentence::words.
See Also: keyphraseSpans()
public java.util.List<java.lang.String> keyphrases()
See Also: keyphraseSpans()
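The selector-based overload is described only as converting each keyphrase span into a string, so here is a minimal, self-contained sketch of that conversion step. It does not use CoreNLP: `TokenSpan` and `KeyphraseSketch` are hypothetical stand-ins, the sentence type is left as a type parameter, spans are assumed to be half-open `[begin, end)` token ranges, and joining tokens with a single space is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a token span: begin is inclusive, end is exclusive.
record TokenSpan(int begin, int end) {}

final class KeyphraseSketch {
    private KeyphraseSketch() {}

    /**
     * Turns each span into a string: select one value per token from the
     * sentence (e.g. its words), then join the values inside the span.
     * The single-space separator is an assumption of this sketch.
     */
    static <S> List<String> keyphrases(S sentence,
                                       List<TokenSpan> spans,
                                       Function<S, List<String>> selector) {
        List<String> perToken = selector.apply(sentence);
        List<String> phrases = new ArrayList<>();
        for (TokenSpan span : spans) {
            phrases.add(String.join(" ", perToken.subList(span.begin(), span.end())));
        }
        return phrases;
    }
}
```

With a list of words standing in for the sentence and an identity selector standing in for `Sentence::words`, the spans `[0, 2)`, `[3, 4)` and `[5, 6)` over "Barack Obama was born in Hawaii" become "Barack Obama", "born" and "Hawaii".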
public int headOfSpan(Span tokenSpan)
Parameters:
tokenSpan - The span of tokens we are finding the head of.
public <E> java.lang.Iterable<java.util.List<E>> allSpans(java.util.function.Function<Sentence,java.util.List<E>> selector, int maxLength)
Type Parameters:
E - The type of the element we are getting.
Parameters:
selector - The function to apply to each token. For example, Sentence.words(). For that example, you can use allSpans(Sentence::words).
maxLength - The maximum length of the spans to extract. To extract all spans, set this to sentence.length().
public <E> java.lang.Iterable<java.util.List<E>> allSpans(java.util.function.Function<Sentence,java.util.List<E>> selector)
See Also: allSpans(Function, int)
public java.lang.Iterable<java.util.List<java.lang.String>> allSpans()
See Also: allSpans(Function, int)
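To make the span-enumeration contract concrete, here is a hypothetical, stdlib-only sketch (not the CoreNLP implementation) that materializes every contiguous span of at most `maxLength` tokens. The class name `AllSpansSketch` and the ordering (by start index, then by length) are assumptions; the documented method returns an `Iterable` and may produce spans lazily or in a different order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class AllSpansSketch {
    private AllSpansSketch() {}

    /**
     * Returns every contiguous run of the selected per-token values whose
     * length is between 1 and maxLength (inclusive), ordered by start index
     * and then by length.
     */
    static <S, E> List<List<E>> allSpans(S sentence, Function<S, List<E>> selector, int maxLength) {
        List<E> values = selector.apply(sentence);
        List<List<E>> spans = new ArrayList<>();
        for (int start = 0; start < values.size(); start++) {
            // Never run past the end of the sentence or exceed maxLength tokens.
            int limit = Math.min(values.size(), start + maxLength);
            for (int end = start + 1; end <= limit; end++) {
                spans.add(new ArrayList<>(values.subList(start, end)));
            }
        }
        return spans;
    }
}
```

Setting `maxLength` to the sentence length yields all n(n+1)/2 spans of an n-token sentence, matching the documented way of extracting all spans; smaller values bound the quadratic growth.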
public <E> E modeInSpan(Span span, java.util.function.Function<Sentence,java.util.List<E>> selector)
Type Parameters:
E - The type of the element we are getting.
Parameters:
span - The span of the sentence to find the mode element in. This must be entirely contained in the sentence.
selector - The property of the sentence we are getting the mode of. For example, Sentence::posTags.
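A minimal sketch of the mode-in-span idea, assuming the selected per-token values (for example, POS tags) are already available as a list and the span is a half-open `[begin, end)` token range. `ModeSketch` is hypothetical, and both the tie-breaking rule and the rejection of empty spans are choices made here, not documented behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ModeSketch {
    private ModeSketch() {}

    /**
     * Returns the most frequent value in values[begin, end). Ties go to the
     * value that reached the winning count first while scanning left to right.
     */
    static <E> E modeInSpan(List<E> values, int begin, int end) {
        // Mirror the documented precondition: the span must lie inside the sentence.
        if (begin < 0 || end > values.size() || begin >= end) {
            throw new IllegalArgumentException(
                "span [" + begin + ", " + end + ") is not inside the sentence");
        }
        Map<E, Integer> counts = new HashMap<>();
        E best = null;
        int bestCount = 0;
        for (E value : values.subList(begin, end)) {
            int count = counts.merge(value, 1, Integer::sum);
            if (count > bestCount) {
                best = value;
                bestCount = count;
            }
        }
        return best;
    }
}
```

For the tags `DT NN NN VBD NN`, the span `[0, 3)` has mode `NN` and the span `[3, 4)` has mode `VBD`.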
protected java.util.List<java.lang.String> loopyDependencyPathBetween(int start, int end, java.util.Optional<java.util.function.Function<Sentence,java.util.List<java.lang.String>>> selector)
Parameters:
start - The start index.
end - The end index.
selector - The selector to use for the word nodes.
See Also: dependencyPathBetween(int, int)
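The following self-contained sketch shows a breadth-first shortest-path search over a dependency graph that may contain cycles (as enhanced dependency graphs can), which is our reading of the "loopy" name. `DepEdge`, `BfsPathSketch`, the step notation (`<-label-` for a step from a dependent up to its head, `-label->` for a step down to a dependent), the inclusion of both endpoint words when words are supplied, and returning an empty list when no path exists are all assumptions of this sketch, not CoreNLP's output format.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical labeled dependency edge: governor --label--> dependent, both 0-indexed.
record DepEdge(int governor, int dependent, String label) {}

final class BfsPathSketch {
    private BfsPathSketch() {}

    static List<String> shortestPath(int numTokens, List<DepEdge> edges, int start, int end,
                                     Optional<List<String>> words) {
        // Undirected adjacency list; each entry remembers how to render the step.
        List<List<Map.Entry<Integer, String>>> adjacent = new ArrayList<>();
        for (int i = 0; i < numTokens; i++) {
            adjacent.add(new ArrayList<>());
        }
        for (DepEdge e : edges) {
            adjacent.get(e.governor()).add(Map.entry(e.dependent(), "-" + e.label() + "->"));
            adjacent.get(e.dependent()).add(Map.entry(e.governor(), "<-" + e.label() + "-"));
        }

        // Standard BFS; the visited array stops cycles from looping forever.
        int[] previous = new int[numTokens];
        String[] stepInto = new String[numTokens];
        boolean[] visited = new boolean[numTokens];
        Arrays.fill(previous, -1);
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(start);
        visited[start] = true;
        while (!queue.isEmpty()) {
            int node = queue.poll();
            if (node == end) {
                break;
            }
            for (Map.Entry<Integer, String> step : adjacent.get(node)) {
                int next = step.getKey();
                if (!visited[next]) {
                    visited[next] = true;
                    previous[next] = node;
                    stepInto[next] = step.getValue();
                    queue.add(next);
                }
            }
        }
        if (!visited[end]) {
            return List.of(); // no path: this sketch's convention
        }

        // Walk back from end to start, then render in start-to-end order.
        List<Integer> nodes = new ArrayList<>();
        for (int node = end; node != -1; node = previous[node]) {
            nodes.add(node);
        }
        Collections.reverse(nodes);
        List<String> path = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            int node = nodes.get(i);
            if (i > 0) {
                path.add(stepInto[node]);
            }
            if (words.isPresent()) {
                path.add(words.get().get(node));
            }
        }
        return path;
    }
}
```

For "Sue bought and ate apples" with an extra `nsubj` edge from "ate" to "Sue", the search returns `Sue <-nsubj- ate -obj-> apples` (two edges), whereas a walk through the basic tree would need three edges via "bought".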
public java.util.List<java.lang.String> dependencyPathBetween(int start, int end, java.util.Optional<java.util.function.Function<Sentence,java.util.List<java.lang.String>>> selector)
Parameters:
start - The start word, 0-indexed.
end - The end word, 0-indexed.
selector - The selector for the strings of the words along the path, if any. If left empty, these will be omitted from the list.
public java.util.List<java.lang.String> dependencyPathBetween(int start, int end)
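For a tree-shaped parse, the path between two tokens can be found without a general graph search: climb from each token toward the root and meet at their lowest common ancestor. The sketch below illustrates that technique under stated assumptions: heads are given as a hypothetical `governors` array (with -1 for the root), `labels[i]` is the relation from token i's head to token i, steps are rendered as `<-label-` going up and `-label->` going down, and word strings are included only when supplied. `TreePathSketch` is hypothetical; this is not necessarily how `dependencyPathBetween` is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class TreePathSketch {
    private TreePathSketch() {}

    static List<String> pathBetween(int[] governors, String[] labels, int start, int end,
                                    Optional<List<String>> words) {
        List<Integer> up = chainToRoot(governors, start);   // start, its head, ..., root
        List<Integer> down = chainToRoot(governors, end);   // end, its head, ..., root

        // Lowest common ancestor: first node on start's chain that is also on end's chain.
        int lca = -1;
        for (int node : up) {
            if (down.contains(node)) {
                lca = node;
                break;
            }
        }
        if (lca == -1) {
            return List.of(); // different trees: cannot happen in a single valid parse
        }

        List<String> path = new ArrayList<>();
        // Climb from start up to the common ancestor.
        for (int node : up) {
            if (words.isPresent()) {
                path.add(words.get().get(node));
            }
            if (node == lca) {
                break;
            }
            path.add("<-" + labels[node] + "-");
        }
        // Descend from the common ancestor down to end.
        List<Integer> descent = new ArrayList<>(down.subList(0, down.indexOf(lca)));
        Collections.reverse(descent);
        for (int node : descent) {
            path.add("-" + labels[node] + "->");
            if (words.isPresent()) {
                path.add(words.get().get(node));
            }
        }
        return path;
    }

    // Assumes a valid tree; a cycle in governors would make this loop forever.
    private static List<Integer> chainToRoot(int[] governors, int token) {
        List<Integer> chain = new ArrayList<>();
        for (int t = token; t != -1; t = governors[t]) {
            chain.add(t);
        }
        return chain;
    }
}
```

In the basic tree for "Sue bought and ate apples", the path from "Sue" to "apples" passes through the root: `Sue <-nsubj- bought -conj-> ate -obj-> apples`.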
public void unescapeHTML()
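Since `unescapeHTML` is described as reading each token as HTML and translating it back to text, here is a minimal, stdlib-only sketch of entity decoding over a list of tokens. `HtmlUnescapeSketch` is hypothetical; it handles numeric character references and only a small set of named entities (a complete decoder needs the full HTML5 entity table), and unlike the documented method, which returns void, it returns new strings.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HtmlUnescapeSketch {
    private HtmlUnescapeSketch() {}

    // A deliberately small subset of HTML named entities.
    private static final Map<String, String> NAMED = Map.of(
        "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'", "nbsp", "\u00A0");

    // Matches &#123; (decimal), &#x7B; (hex), or &name; (named).
    private static final Pattern ENTITY =
        Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z]+);");

    static String unescape(String token) {
        Matcher m = ENTITY.matcher(token);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = codePoint(body.substring(2), 16, m.group());
            } else if (body.startsWith("#")) {
                replacement = codePoint(body.substring(1), 10, m.group());
            } else {
                // Unknown named entities are left exactly as written.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    static List<String> unescapeAll(List<String> tokens) {
        return tokens.stream().map(HtmlUnescapeSketch::unescape).toList();
    }

    // Decodes a numeric reference, keeping the original text if it is not a valid code point.
    private static String codePoint(String digits, int radix, String original) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp) ? Character.toString(cp) : original;
        } catch (NumberFormatException e) {
            return original; // too large to be a code point
        }
    }
}
```

For example, the token `AT&amp;T` decodes to `AT&T` and `&lt;b&gt;` decodes to `<b>`.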