SUTime is a library for recognizing and normalizing time expressions. That is, it will convert next wednesday at 3pm to something like 2016-02-17T15:00 (depending on the assumed current reference time). SUTime is available as part of the Stanford CoreNLP pipeline and can be used to annotate documents with temporal information. It is a deterministic rule-based system designed for extensibility. The rule set that we distribute supports only English, but other people have developed rule sets for other languages, such as Swedish.
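To make the target of normalization concrete, the following sketch computes the value from the example above using only java.time. It does not use SUTime and says nothing about how SUTime's rules work. The class name NextWednesdayExample, the helper resolve, and the reference date 2016-02-12 are assumptions chosen for illustration, and "next wednesday" is read here as the first Wednesday strictly after the reference date.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.TemporalAdjusters;

// Hypothetical illustration only: this class is not part of SUTime.
class NextWednesdayExample {

  /** Resolves "next wednesday at 3pm" against an assumed reference date. */
  static LocalDateTime resolve(LocalDate referenceDate) {
    // First Wednesday strictly after the reference date.
    LocalDate nextWednesday =
        referenceDate.with(TemporalAdjusters.next(DayOfWeek.WEDNESDAY));
    return nextWednesday.atTime(LocalTime.of(15, 0));
  }

  public static void main(String[] args) {
    // Assumed reference date (a Friday); SUTime would take this from the document date.
    LocalDate referenceDate = LocalDate.of(2016, 2, 12);
    System.out.println(resolve(referenceDate)); // prints 2016-02-17T15:00
  }
}
```

A different reference date gives a different result, which is why the normalized value always depends on the assumed current reference time.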
SUTime was developed using TokensRegex, a generic framework for defining patterns over text and mapping them to semantic objects. An included set of PowerPoint slides and the Javadoc for SUTime provide an overview of this package.
SUTime was written by Angel Chang. These programs also rely on classes developed by others as part of the Stanford JavaNLP project.
There is a paper describing SUTime. You're encouraged to cite it if you use SUTime.
Angel X. Chang and Christopher D. Manning. 2012. SUTIME: A Library for Recognizing and Normalizing Time Expressions. 8th International Conference on Language Resources and Evaluation (LREC 2012).
SUTime annotations are provided automatically with the StanfordCoreNLP pipeline by including the ner annotator. When a time expression is identified, the NamedEntityTagAnnotation is set with one of four temporal types (DATE, TIME, DURATION, and SET) and the NormalizedNamedEntityTagAnnotation is set to the value of the normalized temporal expression. The temporal type and value correspond to the TIMEX3 standard for type and value. (Note the slightly odd and non-specific entity name 'SET', which refers to a set of times, such as a recurring event.) For more details on the annotations, see also the TimeML Annotation Guidelines Version 1.2.1, the Guidelines for Temporal Expression Annotation for English for TempEval 2010, and the TIDES 2003 Standard for the Annotation of Temporal Expressions (TIMEX2 v1.3), which is still useful for its detailed discussion, even though it is partially superseded by TIMEX3. TIMEX3 is an extension of ISO 8601, and for the core cases of definite times, you're probably best off starting by just reading about ISO 8601.
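Because the core TIMEX3 values follow ISO 8601, they can often be read back with java.time. The sketch below is an illustration under stated assumptions, not part of SUTime: the class TimexValueParser and its methods are hypothetical, and it handles only fully specified DATE, TIME, and DURATION values. TIMEX3 extensions, SET values, and durations that mix years or months with a time part are not handled and will raise an exception.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.Period;
import java.time.temporal.TemporalAmount;

// Hypothetical helper, not part of SUTime.
class TimexValueParser {

  /** Parses a fully specified normalized value of the given temporal type. */
  static Object parse(String type, String value) {
    switch (type) {
      case "DATE":
        return LocalDate.parse(value);      // e.g. 2016-02-17
      case "TIME":
        return LocalDateTime.parse(value);  // e.g. 2016-02-17T15:00
      case "DURATION":
        return parseDuration(value);
      default:
        throw new IllegalArgumentException("Unsupported temporal type: " + type);
    }
  }

  /** Time-based durations contain a 'T' (PT3H); date-based ones do not (P4D). */
  static TemporalAmount parseDuration(String value) {
    return value.contains("T") ? Duration.parse(value) : Period.parse(value);
  }

  public static void main(String[] args) {
    System.out.println(parse("DATE", "2016-02-17"));
    System.out.println(parse("TIME", "2016-02-17T15:00"));
    System.out.println(parse("DURATION", "P4D"));
  }
}
```

In a real pipeline the type and value strings would come from the NamedEntityTagAnnotation and NormalizedNamedEntityTagAnnotation described above.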
SUTime also sets the TimexAnnotation key to an edu.stanford.nlp.time.Timex object, which contains the complete list of TIMEX3 fields for the corresponding expressions, such as "value", "tid", "type", "periodicity", and "alt_value". This might be useful to developers interested in recovering complete TIMEX3 expressions. The field "alt_value" is our extension of TIMEX3. It is used when we can't give back a standard TIMEX value, typically for unresolved dates, either because no reference date was given or because the expression was too complicated to resolve. For instance, "today" would give "THIS P1D" if there was no document date to resolve it against. It's like the "logical form" for the time if there is no denotation.
There is also a stand-alone SUTimeMain class for invoking SUTime. It can read certain temporal text data sets and can annotate text files. It is mainly intended for validating the performance of SUTime.
When writing your own Java code, one way to use SUTime is just to use a CoreNLP pipeline. But you can also quite easily make your own custom annotation pipeline, if you only need the output of SUTime. Below is a complete example.
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.time.*;
import edu.stanford.nlp.util.CoreMap;

public class SUTimeDemo {

  /** Example usage:
   *  java SUTimeDemo "Three interesting dates are 18 Feb 1997, the 20th of july and 4 days from today."
   *
   *  @param args Strings to interpret
   */
  public static void main(String[] args) {
    Properties props = new Properties();
    AnnotationPipeline pipeline = new AnnotationPipeline();
    pipeline.addAnnotator(new TokenizerAnnotator(false));
    pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
    pipeline.addAnnotator(new POSTaggerAnnotator(false));
    pipeline.addAnnotator(new TimeAnnotator("sutime", props));

    for (String text : args) {
      Annotation annotation = new Annotation(text);
      annotation.set(CoreAnnotations.DocDateAnnotation.class, "2013-07-14");
      pipeline.annotate(annotation);
      System.out.println(annotation.get(CoreAnnotations.TextAnnotation.class));
      List<CoreMap> timexAnnsAll = annotation.get(TimeAnnotations.TimexAnnotations.class);
      for (CoreMap cm : timexAnnsAll) {
        List<CoreLabel> tokens = cm.get(CoreAnnotations.TokensAnnotation.class);
        System.out.println(cm + " [from char offset " +
            tokens.get(0).get(CoreAnnotations.CharacterOffsetBeginAnnotation.class) +
            " to " + tokens.get(tokens.size() - 1).get(CoreAnnotations.CharacterOffsetEndAnnotation.class) + ']' +
            " --> " + cm.get(TimeExpression.Annotation.class).getTemporal());
      }
      System.out.println("--");
    }
  }

}
Rules files: sutime/defs.sutime.txt, sutime/english.sutime.txt
sutime.rules = [path to rules file]
Example:
sutime.rules = sutime/defs.sutime.txt, sutime/english.sutime.txt
customAnnotatorClass.[name]=edu.stanford.nlp.time.TimeAnnotator
[name].rules = [path to rules file]
Example:
customAnnotatorClass.sutime=edu.stanford.nlp.time.TimeAnnotator
sutime.rules = sutime/defs.sutime.txt, sutime/english.sutime.txt
java -cp stanford-corenlp-2012-05-22.jar:stanford-corenlp-2012-05-22-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,sutime -properties sutime.properties -file input.txt
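The command above reads its settings from sutime.properties. As a rough sketch, assuming that file holds the two custom annotator properties from the example settings, the snippet below parses such content with java.util.Properties to show which keys and values result. The class name SUTimePropertiesExample is hypothetical, and the file content is an assumption assembled from those example settings, not a file shipped with SUTime.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration only: this class is not part of SUTime or CoreNLP.
class SUTimePropertiesExample {

  // Assumed content of sutime.properties, taken from the example settings.
  static final String CONTENT =
      "customAnnotatorClass.sutime=edu.stanford.nlp.time.TimeAnnotator\n"
      + "sutime.rules = sutime/defs.sutime.txt, sutime/english.sutime.txt\n";

  /** Parses the assumed file content the way a .properties file is read. */
  static Properties load() {
    Properties props = new Properties();
    try {
      props.load(new StringReader(CONTENT));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  public static void main(String[] args) {
    Properties props = load();
    // Whitespace around '=' is dropped, so the key is exactly "sutime.rules".
    for (String key : props.stringPropertyNames()) {
      System.out.println(key + " -> " + props.getProperty(key));
    }
  }
}
```

Note that the name after customAnnotatorClass. (here sutime) is the same name used as the prefix of the rules property and in the -annotators list.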
java -Dpos.model=edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger -cp stanford-corenlp-2012-07-06.jar:stanford-corenlp-2012-07-09-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.time.SUTimeMain -in.type TEXTFILE -date <YYYY-MM-dd> -i <input.txt> -o <output file>
SUTime is integrated in the Stanford suite of NLP tools, StanfordCoreNLP. Please download the entire suite from this page.
Questions, feedback, and bug reports/fixes can be sent to our mailing lists.
We have three mailing lists for SUTime, all of which are shared with other JavaNLP tools (with the exception of the parser). Each address is at @lists.stanford.edu:
java-nlp-user: This is the best list to post to in order to ask questions, make announcements, or for discussion among JavaNLP users. You have to subscribe to be able to use it. Join the list via this webpage or by emailing java-nlp-user-join@lists.stanford.edu. (Leave the subject and message body empty.) You can also look at the list archives.
java-nlp-announce: This list will be used only to announce new versions of Stanford JavaNLP tools, so it will be very low volume (expect 1-3 messages a year). Join the list via this webpage or by emailing java-nlp-announce-join@lists.stanford.edu. (Leave the subject and message body empty.)
java-nlp-support: This list goes only to the software maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off joining and using java-nlp-user. You cannot join java-nlp-support, but you can mail questions to java-nlp-support@lists.stanford.edu.
We have an online demo of SUTime.
Version 1.3.3 | 2012-07-09 | SUTimeMain supports annotation of text files |
Version 1.3.2 | 2012-05-22 | SUTime can be configured using rules |
Version 1.2.0 | 2011-09-14 | Initial version of SUTime time phrase recognizer added to NER annotator |