Stanford Temporal Tagger: SUTime for JAVA

Added: 28 Aug 2015
Author: Sagar Gole

SUTime is a temporal tagger that recognizes and normalizes temporal expressions in English text. It is a deterministic, rule-based system used to annotate documents with temporal information.

Extracting temporal information from text is increasingly important in NLP applications such as information extraction and question answering. Along with entities such as PERSON, LOCATION and MONEY, we can extract temporal expressions from a text.

Temporal extraction example for a given text:

The theme of World Health Day, marked on 7 April 2013, was “Healthy Heart Beat, Healthy Blood Pressure”.

We can extract the following temporal information:

Expression	Type	Simplified Value
Day	DURATION	P1D
7 April 2013	DATE	2013-04-07

Extracting such temporal information requires recognizing temporal expressions and converting them to a normalized form. SUTime is a rule-based temporal tagger built on regular expression patterns.
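To make the recognize-then-normalize idea concrete, here is a minimal sketch that uses only the JDK. It is not SUTime's rule set: the class `SimpleDateRule` and its single pattern are hypothetical and cover only dates written as day, English month name and four-digit year.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical one-rule tagger; illustrates the idea only, not SUTime itself.
class SimpleDateRule {
    // Recognition: day, English month name, four-digit year.
    private static final Pattern DAY_MONTH_YEAR = Pattern.compile(
        "\\b(\\d{1,2}) (January|February|March|April|May|June|July|August"
        + "|September|October|November|December) (\\d{4})\\b");

    // Normalization: parse the matched text into an ISO-8601 date.
    private static final DateTimeFormatter PARSER =
        DateTimeFormatter.ofPattern("d MMMM uuuu", Locale.ENGLISH);

    /** Returns one "expression => normalized value" entry per match. */
    static List<String> extract(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = DAY_MONTH_YEAR.matcher(text);
        while (m.find()) {
            try {
                LocalDate date = LocalDate.parse(m.group(), PARSER);
                results.add(m.group() + " => " + date);
            } catch (DateTimeParseException e) {
                // The pattern matched but the date is impossible (e.g. day 45); skip it.
            }
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "The theme of World Health Day, marked on 7 April 2013, "
            + "was \"Healthy Heart Beat, Healthy Blood Pressure\".";
        extract(text).forEach(System.out::println); // 7 April 2013 => 2013-04-07
    }
}
```

A real tagger needs many such rules plus logic for relative expressions, which is what the SUTime rule files provide.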

Temporal expressions extraction from the text:

For a given text, after tokenization, the SUTime library finds temporal expressions and outputs annotations in the form of TIMEX3 tags. TIMEX3 is part of the TimeML annotation language for marking up events, times and their temporal relations in documents.
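As a rough illustration of what a TIMEX3 tag looks like, the sketch below renders one from its parts. The helper class `Timex3Tag` is hypothetical and covers only the `tid`, `type` and `value` attributes; SUTime builds its own tags internally.

```java
// Hypothetical helper that formats an inline TIMEX3 tag; not part of SUTime.
class Timex3Tag {
    /** Escapes the characters that are special in XML text and attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Renders a TIMEX3 element wrapping the original expression text. */
    static String render(String tid, String type, String value, String text) {
        return String.format("<TIMEX3 tid=\"%s\" type=\"%s\" value=\"%s\">%s</TIMEX3>",
            escape(tid), escape(type), escape(value), escape(text));
    }

    public static void main(String[] args) {
        System.out.println(render("t1", "DATE", "2013-04-07", "7 April 2013"));
    }
}
```

Here `tid` identifies the tag within the document, `type` is one of the temporal types, and `value` holds the normalized form.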

Types of Temporal Expressions

SUTime supports four basic types of temporal expressions: Time, Duration, Date and Set.

1. TIME: a particular instant on a time scale.

e.g. (with 2015-08-28 as reference date)

“Next Sunday 2 pm” => 2015-09-06T14:00

2. DURATION: the amount of time between the two end-points of a time interval.

e.g. (with 2015-08-28 as reference date)

“2 weeks” => P2W => 2015-08-28 to 2015-09-11

3. DATE: a particular calendar date.

e.g. “On 7 April 2013” => 2013-04-07

4. SET: periodic temporal sets representing times that occur with some frequency.

e.g. “Every Tuesday” => XXXX-WXX-2 (2nd day of week)
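The normalized values above follow ISO 8601 conventions, so some of them can be interpreted with `java.time` alone. The sketch below is an illustration, not SUTime code: the class `TemporalValues` and its methods are hypothetical. It resolves the duration `P2W` against the reference date and reads the weekday number out of a SET value such as `XXXX-WXX-2`.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.Period;

// Hypothetical helpers for reading normalized values; not part of SUTime.
class TemporalValues {
    /**
     * Adds a date-based ISO-8601 duration (years, months, weeks, days) to the
     * reference date. Time-based durations such as PT1H are not handled here.
     */
    static LocalDate endOfDuration(LocalDate reference, String isoDuration) {
        return reference.plus(Period.parse(isoDuration));
    }

    /** Reads the trailing ISO weekday number (1 = Monday ... 7 = Sunday). */
    static DayOfWeek weekdayOfSet(String setValue) {
        int dayNumber = Character.getNumericValue(setValue.charAt(setValue.length() - 1));
        return DayOfWeek.of(dayNumber);
    }

    public static void main(String[] args) {
        LocalDate reference = LocalDate.of(2015, 8, 28);
        System.out.println(reference + " to " + endOfDuration(reference, "P2W"));
        System.out.println(weekdayOfSet("XXXX-WXX-2")); // TUESDAY
    }
}
```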

We will use the Stanford SUTime library for temporal information extraction in Java.

Required library and rule files:

Libraries:

stanford-corenlp-[version].jar

Maven Dependency

<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>[version]</version>
</dependency>

Rule files:

SUTime rule files
defs.sutime.txt
english.sutime.txt
english.holidays.sutime.txt

SUTime processing:

1. Initialize Annotation Pipeline object (Load rule files for SUTime)

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationPipeline;
import edu.stanford.nlp.pipeline.TokenizerAnnotator;
import edu.stanford.nlp.time.SUTime;
import edu.stanford.nlp.time.SUTime.Temporal;
import edu.stanford.nlp.time.TimeAnnotations;
import edu.stanford.nlp.time.TimeAnnotator;
import edu.stanford.nlp.time.TimeExpression;
import edu.stanford.nlp.util.CoreMap;

static AnnotationPipeline pipeline = null;

// Builds the pipeline once: a tokenizer followed by the SUTime annotator.
private static void setup() {
    try {
        // Paths to the SUTime rule files; adjust them to your installation.
        String defs_sutime = "/home/sutime/defs.sutime.txt";
        String holiday_sutime = "/home/sutime/english.holidays.sutime.txt";
        String _sutime = "/home/sutime/english.sutime.txt";
        pipeline = new AnnotationPipeline();
        Properties props = new Properties();
        // sutime.rules takes a comma-separated list of rule files.
        String sutimeRules = defs_sutime + "," + holiday_sutime + "," + _sutime;
        props.setProperty("sutime.rules", sutimeRules);
        props.setProperty("sutime.binders", "0");
        props.setProperty("sutime.markTimeRanges", "true");
        props.setProperty("sutime.includeRange", "true");
        // SUTime works on tokens, so the tokenizer must run first.
        pipeline.addAnnotator(new TokenizerAnnotator(false));
        pipeline.addAnnotator(new TimeAnnotator("sutime", props));
    } catch (Exception e) {
        e.printStackTrace();
    }
}
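If the rule files live in a single directory, the property setup can be factored out so that the directory is passed in rather than hard-coded. This is only a sketch: the class `SUTimeProps` and its `build` method are hypothetical, while the property names and file names are the ones used in the setup code above.

```java
import java.util.Properties;

// Hypothetical helper that assembles the SUTime properties for a rules directory.
class SUTimeProps {
    static Properties build(String rulesDir) {
        // sutime.rules expects the rule files as one comma-separated string.
        String rules = String.join(",",
            rulesDir + "/defs.sutime.txt",
            rulesDir + "/english.holidays.sutime.txt",
            rulesDir + "/english.sutime.txt");
        Properties props = new Properties();
        props.setProperty("sutime.rules", rules);
        props.setProperty("sutime.binders", "0");
        props.setProperty("sutime.markTimeRanges", "true");
        props.setProperty("sutime.includeRange", "true");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build("/home/sutime").getProperty("sutime.rules"));
    }
}
```

The resulting `Properties` object can then be passed to `new TimeAnnotator("sutime", props)` exactly as in the setup method.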

2. Extract temporal information

Input text:

“From next month, we will have meeting on every friday, from 3:00 pm to 4:00 pm.”

public void annotateText(String text, String referenceDate) {
    try {
        // Fall back to today's date when the reference date is missing or unparseable.
        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
        if (referenceDate == null || referenceDate.isEmpty()) {
            referenceDate = dateFormat.format(new Date());
        } else {
            try {
                dateFormat.parse(referenceDate);
            } catch (Exception e) {
                referenceDate = dateFormat.format(new Date());
            }
        }
        if (pipeline != null) {
            Annotation annotation = new Annotation(text);
            // The document date anchors relative expressions such as "next month".
            annotation.set(CoreAnnotations.DocDateAnnotation.class, referenceDate);
            pipeline.annotate(annotation);
            List<CoreMap> timexAnnsAll = annotation.get(TimeAnnotations.TimexAnnotations.class);
            for (CoreMap cm : timexAnnsAll) {
                try {
                    // Character offsets come from the first and last token of the expression.
                    List<CoreLabel> tokens = cm.get(CoreAnnotations.TokensAnnotation.class);
                    String startOffset = tokens.get(0).get(CoreAnnotations.CharacterOffsetBeginAnnotation.class).toString();
                    String endOffset = tokens.get(tokens.size() - 1).get(CoreAnnotations.CharacterOffsetEndAnnotation.class).toString();
                    Temporal temporal = cm.get(TimeExpression.Annotation.class).getTemporal();
                    System.out.println("Token text : " + cm.toString());
                    System.out.println("Temporal Value : " + temporal.toString());
                    System.out.println("Timex : " + temporal.getTimexValue());
                    System.out.println("Timex type : " + temporal.getTimexType().name());
                    System.out.println("Start offset : " + startOffset);
                    System.out.println("End Offset : " + endOffset);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        } else {
            System.out.println("Annotation Pipeline object is NULL");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The extracted temporal information is:

Expression	Timex Type	Temporal Value	Timex	Start offset	End Offset
next month	DATE	2015-09	2015-09	5	15
every friday	SET	XXXX-WXX-5	XXXX-WXX-5	41	53
from 3:00 pm to 4:00 pm	DURATION	(T15:00,T16:00,PT1H)	PT1H	55	78
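The offsets in the table are character positions in the input text, with the end offset exclusive, so they can be passed directly to `String.substring`. The sketch below checks this against the table and also recomputes the `PT1H` duration with `java.time`. The class `OffsetCheck` and its methods are hypothetical helpers, not part of SUTime.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helpers for checking the reported offsets and duration.
class OffsetCheck {
    /** Returns the expression covered by a [begin, end) character range. */
    static String span(String text, int begin, int end) {
        return text.substring(begin, end);
    }

    /** Computes the ISO-8601 duration between two times given as HH:mm. */
    static String durationBetween(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end)).toString();
    }

    public static void main(String[] args) {
        String text = "From next month, we will have meeting on every friday, "
            + "from 3:00 pm to 4:00 pm.";
        System.out.println(span(text, 5, 15));   // next month
        System.out.println(span(text, 41, 53));  // every friday
        System.out.println(span(text, 55, 78));  // from 3:00 pm to 4:00 pm
        System.out.println(durationBetween("15:00", "16:00")); // PT1H
    }
}
```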
