Stanford Temporal Tagger: SUTime for Java

SUTime is a temporal tagger for recognizing and normalizing temporal expressions in English text. It is a deterministic, rule-based system used to annotate documents with temporal information.


Extracting temporal information from text is increasingly important in NLP applications such as information extraction and question answering. Alongside entities like PERSON, LOCATION and MONEY, we can also extract temporal expressions from text.


Temporal extraction example for the following text:
The theme of World Health Day, marked on 7 April 2013, was “Healthy Heart Beat, Healthy Blood Pressure”.

We can extract the following temporal information:

Expression      Type   Value
Day             DATE   P1D
7 April 2013    DATE   2013-04-07

Extracting such information requires recognizing temporal expressions and converting them to a normalized form. SUTime is a rule-based temporal tagger built on regular expression patterns.

Extracting temporal expressions from text:

After the input text is tokenized, the SUTime library finds temporal expressions and outputs annotations in the form of TIMEX3 tags. TIMEX3 is part of TimeML, an annotation language for marking up events, times and their temporal relations in documents.
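As an illustration only, a TIMEX3 tag for the date in the example above has roughly the shape produced by the snippet below. The helper toTimex3 is hypothetical (it is not a SUTime method); it writes just TimeML's tid, type and value attributes and performs no XML escaping.

```java
// Illustration only: toTimex3 is a hypothetical helper, not a SUTime method.
class Timex3Example {

    // Wraps the expression text in a TIMEX3 element carrying TimeML's
    // tid, type and value attributes (no XML escaping is performed).
    static String toTimex3(String tid, String type, String value, String text) {
        return "<TIMEX3 tid=\"" + tid + "\" type=\"" + type
                + "\" value=\"" + value + "\">" + text + "</TIMEX3>";
    }

    public static void main(String[] args) {
        // Prints: <TIMEX3 tid="t1" type="DATE" value="2013-04-07">7 April 2013</TIMEX3>
        System.out.println(toTimex3("t1", "DATE", "2013-04-07", "7 April 2013"));
    }
}
```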


Types of Temporal Expressions
SUTime supports four basic types of temporal expressions: TIME, DURATION, DATE and SET.

1. TIME: a particular instant on a time scale.

e.g. (with 2015-08-28 as reference date)
“Next Sunday 2 pm” => 2015-09-06T14:00

2. DURATION: the amount of time between the two end-points of a time interval.

e.g. (with 2015-08-28 as reference date)
“2 weeks” => P2W => 2015-08-28 to 2015-09-11

3. DATE: a particular calendar date.
e.g. “On 7 April 2013” => 2013-04-07

4. SET: periodic temporal sets representing times that occur with some frequency.

e.g. “Every Tuesday” => XXXX-WXX-2 (2nd day of week)
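To make the normalized values in the list above concrete, here is a minimal sketch that reproduces them with java.time (Java 8+), assuming 2015-08-28 as the reference date. It is not how SUTime computes them internally, and the helper names are hypothetical. For TIME it assumes "next Sunday" means the Sunday of the following Monday-to-Sunday week, which matches the example but may not reflect every SUTime rule.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.Period;
import java.time.temporal.TemporalAdjusters;
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical helpers, not part of the SUTime API.
class TemporalTypesExample {

    // TIME: assume "next <day>" = that day in the following ISO (Monday-to-Sunday) week.
    static LocalDateTime nextWeekDayAt(LocalDate reference, DayOfWeek day, int hour) {
        return reference.plusWeeks(1).with(day).atTime(hour, 0);
    }

    // DURATION: anchor an ISO-8601 period such as "P2W" at the reference date.
    static LocalDate endOfPeriod(LocalDate reference, String isoPeriod) {
        return reference.plus(Period.parse(isoPeriod)); // "P2W" is parsed as 14 days
    }

    // SET: expand a weekly value "XXXX-WXX-d" (d = ISO day of week, 1 = Monday)
    // into its first 'count' dates on or after the reference date.
    static List<LocalDate> weeklyOccurrences(String timex, LocalDate reference, int count) {
        if (!timex.matches("XXXX-WXX-[1-7]")) {
            throw new IllegalArgumentException("Unsupported SET value: " + timex);
        }
        DayOfWeek day = DayOfWeek.of(timex.charAt(timex.length() - 1) - '0');
        LocalDate first = reference.with(TemporalAdjusters.nextOrSame(day));
        List<LocalDate> dates = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            dates.add(first.plusWeeks(i));
        }
        return dates;
    }

    public static void main(String[] args) {
        LocalDate reference = LocalDate.parse("2015-08-28"); // a Friday
        System.out.println(nextWeekDayAt(reference, DayOfWeek.SUNDAY, 14));     // 2015-09-06T14:00
        System.out.println(reference + " to " + endOfPeriod(reference, "P2W")); // 2015-08-28 to 2015-09-11
        System.out.println(weeklyOccurrences("XXXX-WXX-2", reference, 3));      // [2015-09-01, 2015-09-08, 2015-09-15]
    }
}
```

In the output shown in this post, SET values stay in the XXXX-WXX-d form; enumerating concrete dates, as weeklyOccurrences does, would be up to the application.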

We will use the Stanford SUTime library to extract temporal information in Java.

Required libraries and rule files:
Libraries:

  • stanford-corenlp-[version].jar
Maven dependency:

<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.5.1</version>
</dependency>

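SUTime rule files:

  • defs.sutime.txt
  • english.sutime.txt
  • english.holidays.sutime.txt

These rule files are also bundled in the CoreNLP models jar under edu/stanford/nlp/models/sutime/.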

SUTime processing:
1. Initialize the AnnotationPipeline object (load the SUTime rule files)

// JDK classes used by setup() and annotateText()
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Properties;

// Stanford CoreNLP / SUTime classes
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationPipeline;
import edu.stanford.nlp.pipeline.TokenizerAnnotator;
import edu.stanford.nlp.time.SUTime.Temporal;
import edu.stanford.nlp.time.TimeAnnotations;
import edu.stanford.nlp.time.TimeAnnotator;
import edu.stanford.nlp.time.TimeExpression;
import edu.stanford.nlp.util.CoreMap;


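// Shared pipeline: tokenizer followed by SUTime's TimeAnnotator
static AnnotationPipeline pipeline = null;

private static void setup() {
    try {
        // Local copies of the SUTime rule files; adjust the paths to your system
        String defs_sutime = "/home/sutime/defs.sutime.txt";
        String holiday_sutime = "/home/sutime/english.holidays.sutime.txt";
        String _sutime = "/home/sutime/english.sutime.txt";
        Properties props = new Properties();
        // Comma-separated list of rule files to load
        String sutimeRules = defs_sutime + "," + holiday_sutime + "," + _sutime;
        props.setProperty("sutime.rules", sutimeRules);
        props.setProperty("sutime.binders", "0");           // do not load extra temporal binders
        props.setProperty("sutime.markTimeRanges", "true"); // tag "from X to Y" as one range
        props.setProperty("sutime.includeRange", "true");   // add range information to the output
        // Build the pipeline locally and publish it only after both annotators
        // load, so a failed setup leaves pipeline == null
        AnnotationPipeline p = new AnnotationPipeline();
        p.addAnnotator(new TokenizerAnnotator(false));       // false = non-verbose
        p.addAnnotator(new TimeAnnotator("sutime", props));
        pipeline = p;
    } catch (Exception e) {
        e.printStackTrace();
    }
}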
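Because setup() only prints a stack trace when something goes wrong, a missing or misspelled rule-file path is easy to overlook. A small pre-flight check can help; missingRuleFiles is a hypothetical helper built only on java.nio.file, and the paths are the ones assumed in setup().

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: reports rule files that cannot be read.
class RuleFileCheck {

    // Returns every path that does not exist or is not readable.
    static List<String> missingRuleFiles(String... paths) {
        List<String> missing = new ArrayList<>();
        for (String path : paths) {
            if (!Files.isReadable(Paths.get(path))) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingRuleFiles(
                "/home/sutime/defs.sutime.txt",
                "/home/sutime/english.holidays.sutime.txt",
                "/home/sutime/english.sutime.txt");
        if (missing.isEmpty()) {
            System.out.println("All SUTime rule files found");
        } else {
            System.out.println("Missing rule files: " + missing);
        }
    }
}
```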


2. Extract temporal information

Input text:
“From next month, we will have meeting on every friday, from 3:00 pm to 4:00 pm.”

public void annotateText(String text, String referenceDate) {
    try {
        // Use today's date unless a valid yyyy-MM-dd reference date is given
        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
        dateFormat.setLenient(false); // reject out-of-range values such as 2015-13-45
        if (referenceDate == null || referenceDate.isEmpty()) {
            referenceDate = dateFormat.format(new Date());
        } else {
            try {
                dateFormat.parse(referenceDate);
            } catch (Exception e) {
                referenceDate = dateFormat.format(new Date());
            }
        }
        if (pipeline == null) {
            System.out.println("Annotation Pipeline object is NULL");
            return;
        }
        Annotation annotation = new Annotation(text);
        // Relative expressions such as "next month" are resolved against this date
        annotation.set(CoreAnnotations.DocDateAnnotation.class, referenceDate);
        pipeline.annotate(annotation);
        List<CoreMap> timexAnnsAll = annotation.get(TimeAnnotations.TimexAnnotations.class);
        for (CoreMap cm : timexAnnsAll) {
            try {
                // Character offsets of the expression come from its first and last tokens
                List<CoreLabel> tokens = cm.get(CoreAnnotations.TokensAnnotation.class);
                String startOffset = tokens.get(0).get(CoreAnnotations.CharacterOffsetBeginAnnotation.class).toString();
                String endOffset = tokens.get(tokens.size() - 1).get(CoreAnnotations.CharacterOffsetEndAnnotation.class).toString();
                Temporal temporal = cm.get(TimeExpression.Annotation.class).getTemporal();
                System.out.println("Token text : " + cm.toString());
                System.out.println("Temporal Value : " + temporal.toString());
                System.out.println("Timex : " + temporal.getTimexValue());
                System.out.println("Timex type : " + temporal.getTimexType().name());
                System.out.println("Start offset : " + startOffset);
                System.out.println("End Offset : " + endOffset);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
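One note on the reference-date check in annotateText(): SimpleDateFormat is lenient unless setLenient(false) is called, so a value like 2015-13-45 is accepted (and rolled over into a 2016 date) instead of being rejected. As an alternative, here is a sketch using java.time (Java 8+), whose LocalDate.parse rejects out-of-range dates; resolveReferenceDate is a hypothetical helper name, and the fallback to today's date mirrors the original code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical helper: validates a yyyy-MM-dd reference date with java.time.
class ReferenceDateExample {

    // Returns referenceDate if it is a valid ISO date, otherwise today's date.
    static String resolveReferenceDate(String referenceDate) {
        if (referenceDate != null && !referenceDate.isEmpty()) {
            try {
                // ISO_LOCAL_DATE parsing rejects values such as 2015-13-45 or 2015-02-30
                return LocalDate.parse(referenceDate).toString();
            } catch (DateTimeParseException e) {
                // invalid input: fall through to today's date
            }
        }
        return LocalDate.now().toString();
    }

    public static void main(String[] args) {
        System.out.println(resolveReferenceDate("2015-08-28")); // 2015-08-28
        System.out.println(resolveReferenceDate("2015-13-45")); // today's date
    }
}
```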

The temporal information extracted from the input text is:

Expression                Timex type   Temporal value          Timex        Start offset   End offset
next month                DATE         2015-09                 2015-09      5              15
every friday              SET          XXXX-WXX-5              XXXX-WXX-5   41             53
from 3:00 pm to 4:00 pm   DURATION     (T15:00,T16:00,PT1H)    PT1H         55             78
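As a quick sanity check on the offsets in the table, the sketch below cuts each expression out of the input text with String.substring, which works because CoreNLP character offsets are begin-inclusive and end-exclusive. It also parses the PT1H Timex value with java.time.Duration. OffsetCheckExample is an illustrative name only.

```java
import java.time.Duration;

// Illustration only: checks the offsets from the table against the input text.
class OffsetCheckExample {

    static final String TEXT =
            "From next month, we will have meeting on every friday, from 3:00 pm to 4:00 pm.";

    public static void main(String[] args) {
        // Begin offsets are inclusive, end offsets exclusive, as in String.substring
        System.out.println(TEXT.substring(5, 15));  // next month
        System.out.println(TEXT.substring(41, 53)); // every friday
        System.out.println(TEXT.substring(55, 78)); // from 3:00 pm to 4:00 pm
        // The Timex value of the range is an ISO-8601 duration
        System.out.println(Duration.parse("PT1H").toMinutes()); // 60
    }
}
```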