Stanford Temporal Tagger: SUTime

About

SUTime is a library for recognizing and normalizing time expressions. SUTime is available as part of the Stanford CoreNLP pipeline and can be used to annotate documents with temporal information. It is a deterministic rule-based system designed for extensibility.

SUTime was developed using TokensRegex, a generic framework for defining patterns over text and mapping them to semantic objects. An included set of PowerPoint slides and the Javadoc for SUTime provide an overview of this package.

SUTime was written by Angel Chang. It also relies on classes developed by others as part of the Stanford JavaNLP project.

There is a paper describing SUTime. You're encouraged to cite it if you use SUTime.

Angel X. Chang and Christopher D. Manning. 2012. SUTIME: A Library for Recognizing and Normalizing Time Expressions. 8th International Conference on Language Resources and Evaluation (LREC 2012).

Usage

SUTime annotations are provided automatically with the StanfordCoreNLP pipeline by including the ner annotator. When a time expression is identified, the NamedEntityTagAnnotation is set to one of four temporal types (DATE, TIME, DURATION, and SET) and the NormalizedNamedEntityTagAnnotation is set to the value of the normalized temporal expression. The temporal type and value correspond to the TIMEX3 standard for type and value. (Note the slightly odd and non-specific entity name 'SET', which refers to a set of times, such as a recurring event.) For more details on the annotations, see also the TimeML Annotation Guidelines Version 1.2.1, Guidelines for Temporal Expression Annotation for English for TempEval 2010, and the TIDES 2003 Standard for the Annotation of Temporal Expressions (TIMEX2 v1.3), which is still useful for its detailed discussion, even though it is partially superseded by TIMEX3.

SUTime also sets the TimexAnnotation key to an edu.stanford.nlp.time.Timex object, which contains the complete list of TIMEX3 fields for the corresponding expressions, such as "val", "alt_val", "type", "tid". This might be useful to developers interested in recovering complete TIMEX3 expressions.
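For orientation, a TIMEX3 expression is conventionally serialized as an XML element whose attributes carry fields such as tid, type, and value. The sketch below uses only the standard library and does not call SUTime. The class Timex3Sketch and its render method are hypothetical stand-ins meant to show the shape of the tag, and the tid "t1" is an illustrative value rather than actual SUTime output.

```java
/** Hypothetical illustration, not part of SUTime: formats TIMEX3 fields as a tag. */
final class Timex3Sketch {

  /**
   * Builds a TIMEX3 element from its basic fields.
   * No XML escaping is performed, so the inputs are assumed to contain no markup characters.
   */
  static String render(String tid, String type, String value, String text) {
    return "<TIMEX3 tid=\"" + tid + "\" type=\"" + type + "\" value=\"" + value + "\">"
        + text + "</TIMEX3>";
  }

  public static void main(String[] args) {
    // "18 Feb 1997" is a DATE whose normalized value is the ISO date 1997-02-18.
    System.out.println(render("t1", "DATE", "1997-02-18", "18 Feb 1997"));
  }
}
```

In real code the field values would come from the Timex object stored under the TimexAnnotation key rather than from string literals.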

There is also a stand-alone SUTimeMain class for invoking SUTime. It can read certain temporal text data sets and can annotate text files. It is mainly intended for validating the performance of SUTime.

When writing your own Java code, one way to use SUTime is just to use a CoreNLP pipeline. But you can also quite easily make your own custom annotation pipeline, if you only need the output of SUTime. Below is a complete example.

import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.time.*;
import edu.stanford.nlp.util.CoreMap;

public class SUTimeDemo {

  /** Example usage:
   *  java SUTimeDemo "Three interesting dates are 18 Feb 1997, the 20th
   *  of july and 4 days from today."
   *
   *  @param args Strings to interpret
   */
  public static void main(String[] args) {
    Properties props = new Properties();
    AnnotationPipeline pipeline = new AnnotationPipeline();
    pipeline.addAnnotator(new PTBTokenizerAnnotator(false));
    pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
    pipeline.addAnnotator(new POSTaggerAnnotator(false));
    pipeline.addAnnotator(new TimeAnnotator("sutime", props));

    for (String text : args) {
      Annotation annotation = new Annotation(text);
      annotation.set(CoreAnnotations.DocDateAnnotation.class, "2013-07-14");
      pipeline.annotate(annotation);
      System.out.println(annotation.get(CoreAnnotations.TextAnnotation.class));
      List<CoreMap> timexAnnsAll = annotation.get(TimeAnnotations.TimexAnnotations.class);
      for (CoreMap cm : timexAnnsAll) {
        List<CoreLabel> tokens = cm.get(CoreAnnotations.TokensAnnotation.class);
        System.out.println(cm + " [from char offset " +
            tokens.get(0).get(CoreAnnotations.CharacterOffsetBeginAnnotation.class) +
            " to " + tokens.get(tokens.size() - 1).get(CoreAnnotations.CharacterOffsetEndAnnotation.class) + ']' +
            " --> " + cm.get(TimeExpression.Annotation.class).getTemporal());
      }
      System.out.println("--");
    }
  }

}

SUTime Rules

To extend SUTime rules, you can configure SUTime to use rules specified in files:
  1. Create a rules file (see SequenceMatchRules for the format of the rules file).
    Example: Sample English rules for SUTime are included in the distribution (sutime/defs.sutime.txt, sutime/english.sutime.txt)
  2. Configure the rules to be used by SUTime:
    sutime.rules = [path to rules file] 
    Example:
    sutime.rules = sutime/defs.sutime.txt, sutime/english.sutime.txt 
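
When SUTime is driven from Java code rather than from a properties file, the same setting can be placed in a java.util.Properties object. The following is a minimal sketch using only the standard library. The class and method names (SUTimeRulesConfig, withRules) are hypothetical, and it assumes the resulting object is handed to an annotator constructed under the name "sutime", as in new TimeAnnotator("sutime", props) from the demo above, so that the key is sutime.rules.

```java
import java.util.Properties;

/** Hypothetical helper, not part of SUTime: builds the sutime.rules setting. */
final class SUTimeRulesConfig {

  /** Returns properties whose sutime.rules value is the comma-separated list of rule files. */
  static Properties withRules(String... ruleFiles) {
    Properties props = new Properties();
    props.setProperty("sutime.rules", String.join(",", ruleFiles));
    return props;
  }

  public static void main(String[] args) {
    // The two sample rule files named in the example above.
    Properties props = withRules("sutime/defs.sutime.txt", "sutime/english.sutime.txt");
    System.out.println(props.getProperty("sutime.rules"));
  }
}
```

Building the value from a list of paths keeps the order of the rule files explicit.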

SUTime Annotator

To get annotations on a phrase level instead of on the token level, a separate TimeAnnotator is provided. To add a TimeAnnotator that uses rules to the pipeline:
  1. Create a rules file (see SequenceMatchRules for the format of the rules file).
    Example: Sample English rules for SUTime are included in the distribution (sutime/defs.sutime.txt, sutime/english.sutime.txt)
  2. Configure the TimeAnnotator
    customAnnotatorClass.[name]=edu.stanford.nlp.time.TimeAnnotator
    [name].rules = [path to rules file] 
    Example:
    customAnnotatorClass.sutime=edu.stanford.nlp.time.TimeAnnotator
    sutime.rules = sutime/defs.sutime.txt, sutime/english.sutime.txt 
  3. Add the annotator to the pipeline
    Example: java -cp stanford-corenlp-2012-05-22.jar:stanford-corenlp-2012-05-22-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,sutime -properties sutime.properties -file input.txt
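
The sutime.properties file named in step 3 can be written by hand, or generated. The sketch below generates it with the standard java.util.Properties class. The class name WriteSUTimeProperties and its write method are hypothetical, and the two keys are simply the ones from the example in step 2.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper, not part of SUTime: writes the properties from step 2 to a file. */
final class WriteSUTimeProperties {

  /** Stores the custom annotator class and its rule files in the given properties file. */
  static void write(Path target) throws IOException {
    Properties props = new Properties();
    props.setProperty("customAnnotatorClass.sutime", "edu.stanford.nlp.time.TimeAnnotator");
    props.setProperty("sutime.rules", "sutime/defs.sutime.txt,sutime/english.sutime.txt");
    try (OutputStream out = Files.newOutputStream(target)) {
      props.store(out, "SUTime annotator configuration");
    }
  }

  public static void main(String[] args) throws IOException {
    write(Paths.get("sutime.properties"));
  }
}
```

The resulting file is the one passed with -properties in the command above.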

Using SUTime to annotate a file with TIMEX3 tags

To annotate a text file with TIMEX3 tags:

Example: java -Dpos.model=edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger -cp stanford-corenlp-2012-07-06.jar:stanford-corenlp-2012-07-09-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.time.SUTimeMain -in.type TEXTFILE -date <YYYY-MM-dd> -i <input.txt> -o <output file>
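
The -date argument, like the DocDateAnnotation set in the demo above, supplies the document date against which relative expressions such as "4 days from today" are interpreted. A minimal sketch of producing that string with java.time follows. The class name DocDate is hypothetical. Note that in Java's DateTimeFormatter the pattern letters YYYY denote the week-based year, so the sketch uses yyyy-MM-dd for the calendar year.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper, not part of SUTime: formats a document date as year-month-day. */
final class DocDate {

  // Lowercase yyyy is the calendar year; uppercase YYYY would be the week-based year.
  private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

  static String format(LocalDate date) {
    return date.format(FORMAT);
  }

  public static void main(String[] args) {
    // The fixed date used in the demo above.
    System.out.println(format(LocalDate.of(2013, 7, 14)));
    // The current date, for documents without a known creation date.
    System.out.println(format(LocalDate.now()));
  }
}
```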

Download

SUTime is integrated in the Stanford suite of NLP tools, StanfordCoreNLP. Please download the entire suite from this page.

Questions

Questions, feedback, and bug reports/fixes can be sent to our mailing lists.

Mailing Lists

We have three mailing lists for SUTime, all of which are shared with other JavaNLP tools (with the exception of the parser). Each address is at @lists.stanford.edu:

  1. java-nlp-user: This is the best list to post to in order to ask questions, make announcements, or for discussion among JavaNLP users. You have to subscribe to be able to use it. Join the list via this webpage or by emailing java-nlp-user-join@lists.stanford.edu. (Leave the subject and message body empty.) You can also look at the list archives.
  2. java-nlp-announce: This list will be used only to announce new versions of Stanford JavaNLP tools, so it will be very low volume (expect 1-3 messages a year). Join the list via this webpage or by emailing java-nlp-announce-join@lists.stanford.edu. (Leave the subject and message body empty.)
  3. java-nlp-support: This list goes only to the software maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off joining and using java-nlp-user. You cannot join java-nlp-support, but you can mail questions to java-nlp-support@lists.stanford.edu.

Online Demo

We have an online demo of SUTime.

Release History

Version 1.3.3 2012-07-09 SUTimeMain supports annotation of text files
Version 1.3.2 2012-05-22 SUTime can be configured using rules
Version 1.2.0 2011-09-14 Initial version of SUTime time phrase recognizer added to NER annotator