Stanford Temporal Tagger: SUTime for JAVA


By: Sagar Gole | August 28, 2015

Stanford Temporal Tagger using SUTime for JAVA

Introduction

SUTime is a temporal tagger for recognizing and normalizing temporal expressions in English text. It is used to annotate documents with temporal information, and it is a deterministic, rule-based system.

Extracting temporal information from text is increasingly important in NLP applications such as information extraction and question answering. Along with entities like PERSON, LOCATION and MONEY, we can also extract temporal expressions from text.

Here is a temporal extraction example for the following text:

The theme of World Health Day, marked on 7 April 2013, was “Healthy Heart Beat, Healthy Blood Pressure”.

We can extract the following temporal information:

Expression      Type    Simplified Value
Day             DATE    P1D
7 April 2013    DATE    2013-04-07

Extracting such temporal information requires recognizing temporal expressions and converting them to a normalized form. SUTime is a rule-based temporal tagger built on regular expression patterns.
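To make the idea of "regular expression rule plus normalization" concrete, here is a toy sketch in plain Java. It is not how SUTime works internally (SUTime's rules live in its own TokensRegex-based rule files); the class name ToyDateRule and its single rule, which only matches dates written like "7 April 2013", are hypothetical and exist only for illustration.

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy rule-based tagger with one hypothetical rule. Not SUTime. */
public class ToyDateRule {
    // Recognition step: day number, English month name, four-digit year
    private static final Pattern DAY_MONTH_YEAR = Pattern.compile(
            "\\b(\\d{1,2})\\s+(January|February|March|April|May|June|July|August|"
            + "September|October|November|December)\\s+(\\d{4})\\b");

    /** Returns "matched text => normalized ISO date" for every match in the text. */
    public static List<String> tag(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = DAY_MONTH_YEAR.matcher(text);
        while (m.find()) {
            int day = Integer.parseInt(m.group(1));
            // Month.valueOf expects the upper-case constant name, e.g. APRIL
            Month month = Month.valueOf(m.group(2).toUpperCase(Locale.ROOT));
            int year = Integer.parseInt(m.group(3));
            // Normalization step: LocalDate.toString() yields ISO-8601 yyyy-MM-dd.
            // LocalDate.of throws DateTimeException for impossible dates such as 31 February.
            LocalDate date = LocalDate.of(year, month, day);
            results.add(m.group() + " => " + date);
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "The theme of World Health Day, marked on 7 April 2013, was "
                + "\"Healthy Heart Beat, Healthy Blood Pressure\".";
        for (String result : tag(text)) {
            System.out.println(result); // 7 April 2013 => 2013-04-07
        }
    }
}
```

A real tagger needs many such rules (and, unlike this toy, also handles relative expressions like "next month"), which is why SUTime ships them as separate rule files.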

Extracting temporal expressions from text:

For a given text, after tokenization, the SUTime library finds temporal expressions and outputs annotations in the form of TIMEX3 tags. TIMEX3 is part of TimeML, an annotation language for marking up events, times and their temporal relations in documents.
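SUTime produces these tags itself; the small helper below only illustrates the shape of an inline TIMEX3 element, using the tid, type and value attributes defined by TimeML and the DATE from the earlier example. The class Timex3Tag and its methods are hypothetical, not part of CoreNLP.

```java
/** Builds an inline TIMEX3-style tag. Hypothetical helper for illustration only. */
public class Timex3Tag {
    // Escape XML special characters; '&' must be replaced first
    public static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** tid: expression id, type: DATE/TIME/DURATION/SET, value: normalized value. */
    public static String tag(String tid, String type, String value, String text) {
        return "<TIMEX3 tid=\"" + escapeXml(tid) + "\" type=\"" + escapeXml(type)
                + "\" value=\"" + escapeXml(value) + "\">" + escapeXml(text) + "</TIMEX3>";
    }

    public static void main(String[] args) {
        // <TIMEX3 tid="t1" type="DATE" value="2013-04-07">7 April 2013</TIMEX3>
        System.out.println(tag("t1", "DATE", "2013-04-07", "7 April 2013"));
    }
}
```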

Types of Temporal Expressions

SUTime supports four basic types of temporal expressions: TIME, DURATION, DATE and SET.

1. TIME: a particular instant on a time scale.

e.g. (with 2015-08-28 as reference date)
“Next Sunday 2 pm” => 2015-09-06T14:00


2. DURATION: the amount of time between the two end-points of a time interval.

e.g. (with 2015-08-28 as reference date)
“2 weeks” => P2W => 2015-08-28 to 2015-09-11


3. DATE: a particular calendar date.

e.g. “On 7 April 2013” => 2013-04-07


4. SET: periodic temporal sets representing times that occur with some frequency.

e.g. “Every Tuesday” => XXXX-WXX-2 (day 2 of the ISO week, i.e. Tuesday)
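The normalized values in the four examples above can be reproduced with java.time, which helps show what each value means. This is a sketch under stated assumptions, not SUTime's resolution logic: the class and method names are hypothetical, and "next Sunday" is assumed to mean the Sunday of the following Monday-to-Sunday week, which is the reading that gives 2015-09-06 from 2015-08-28 (a Friday).

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.TemporalAdjusters;

/** Reproduces the example values for the four TIMEX types (hypothetical helper). */
public class TimexTypeExamples {
    public static final LocalDate REFERENCE = LocalDate.of(2015, 8, 28);

    // TIME: assumption, "next Sunday" = the Sunday of the following ISO week
    public static LocalDateTime nextSundayAt(LocalDate ref, int hour) {
        LocalDate sunday = ref.plusWeeks(1).with(TemporalAdjusters.nextOrSame(DayOfWeek.SUNDAY));
        return LocalDateTime.of(sunday, LocalTime.of(hour, 0));
    }

    // DURATION: build the TIMEX value by hand, because java.time.Period
    // stores weeks as days (Period.ofWeeks(2) prints as P14D, not P2W)
    public static String weeksValue(int weeks) {
        return "P" + weeks + "W";
    }

    // DURATION anchored at the reference date gives the end of the interval
    public static LocalDate durationEnd(LocalDate ref, int weeks) {
        return ref.plusWeeks(weeks);
    }

    // SET: ISO day-of-week numbers run from Monday = 1 to Sunday = 7
    public static String everyWeekday(DayOfWeek day) {
        return "XXXX-WXX-" + day.getValue();
    }

    public static void main(String[] args) {
        System.out.println(nextSundayAt(REFERENCE, 14));           // 2015-09-06T14:00
        System.out.println(weeksValue(2) + " => " + REFERENCE
                + " to " + durationEnd(REFERENCE, 2));              // P2W => 2015-08-28 to 2015-09-11
        System.out.println(LocalDate.of(2013, 4, 7));               // 2013-04-07
        System.out.println(everyWeekday(DayOfWeek.TUESDAY));        // XXXX-WXX-2
    }
}
```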


We will use the Stanford SUTime library to extract temporal information in Java.

Required libraries and rule files:

Libraries:

  • stanford-corenlp-[version].jar

Maven Dependency

<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.5.1</version>
</dependency>

Rule files:

  • SUTime rule files
    • defs.sutime.txt
    • english.sutime.txt
    • english.holidays.sutime.txt

SUTime processing:

  1. Initialize the AnnotationPipeline object (load the SUTime rule files)
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationPipeline;
import edu.stanford.nlp.pipeline.TokenizerAnnotator;
import edu.stanford.nlp.time.SUTime.Temporal;
import edu.stanford.nlp.time.TimeAnnotations;
import edu.stanford.nlp.time.TimeAnnotator;
import edu.stanford.nlp.time.TimeExpression;
import edu.stanford.nlp.util.CoreMap;


// Shared pipeline: a tokenizer followed by SUTime's TimeAnnotator
static AnnotationPipeline pipeline = null;

private static void setup() {
	try {
		// Absolute paths to the SUTime rule files; adjust to where you stored them
		String defsSutime = "/home/sutime/defs.sutime.txt";
		String holidaySutime = "/home/sutime/english.holidays.sutime.txt";
		String englishSutime = "/home/sutime/english.sutime.txt";
		pipeline = new AnnotationPipeline();
		Properties props = new Properties();
		// SUTime takes the rule files as one comma-separated list
		String sutimeRules = defsSutime + "," + holidaySutime + "," + englishSutime;
		props.setProperty("sutime.rules", sutimeRules);
		props.setProperty("sutime.binders", "0");
		// Recognize ranges such as "from 3:00 pm to 4:00 pm" and include the range in the output
		props.setProperty("sutime.markTimeRanges", "true");
		props.setProperty("sutime.includeRange", "true");
		pipeline.addAnnotator(new TokenizerAnnotator(false));
		pipeline.addAnnotator(new TimeAnnotator("sutime", props));
	} catch (Exception e) {
		e.printStackTrace();
	}
}
  2. Extract temporal information

Input text:

“From next month, we will have meeting on every friday, from 3:00 pm to 4:00 pm.”

public void annotateText(String text, String referenceDate) {
	// SUTime resolves relative expressions ("next month") against the document
	// date in yyyy-MM-dd format; fall back to today if it is missing or invalid
	SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
	dateFormat.setLenient(false);
	if (referenceDate == null || referenceDate.isEmpty()) {
		referenceDate = dateFormat.format(new Date());
	} else {
		try {
			dateFormat.parse(referenceDate);
		} catch (Exception e) {
			referenceDate = dateFormat.format(new Date());
		}
	}
	if (pipeline == null) {
		System.out.println("Annotation Pipeline object is NULL");
		return;
	}
	Annotation annotation = new Annotation(text);
	annotation.set(CoreAnnotations.DocDateAnnotation.class, referenceDate);
	pipeline.annotate(annotation);
	// Each CoreMap in the list is one recognized temporal expression
	List<CoreMap> timexAnnsAll = annotation
			.get(TimeAnnotations.TimexAnnotations.class);
	for (CoreMap cm : timexAnnsAll) {
		try {
			List<CoreLabel> tokens = cm
					.get(CoreAnnotations.TokensAnnotation.class);
			// Character offsets of the expression's first and last token
			int startOffset = tokens.get(0)
					.get(CoreAnnotations.CharacterOffsetBeginAnnotation.class);
			int endOffset = tokens.get(tokens.size() - 1)
					.get(CoreAnnotations.CharacterOffsetEndAnnotation.class);

			Temporal temporal = cm.get(
					TimeExpression.Annotation.class).getTemporal();

			System.out.println("Token text : "
					+ cm.get(CoreAnnotations.TextAnnotation.class));
			System.out.println("Temporal Value : " + temporal);
			System.out.println("Timex : " + temporal.getTimexValue());
			System.out.println("Timex type : "
					+ temporal.getTimexType().name());
			System.out.println("Start offset : " + startOffset);
			System.out.println("End Offset : " + endOffset);
		} catch (Exception e) {
			// Report the problem and continue with the next expression
			e.printStackTrace();
		}
	}
}

The extracted temporal information (with a reference date in August 2015) is:

Expression               Timex Type   Temporal Value         Timex        Start Offset   End Offset
next month               DATE         2015-09                2015-09      5              15
every friday             SET          XXXX-WXX-5             XXXX-WXX-5   41             53
from 3:00 pm to 4:00 pm  DURATION     (T15:00,T16:00,PT1H)   PT1H         55             78
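The values in this table can be cross-checked with plain Java, without calling SUTime. The sketch below (class and method names are hypothetical) shows that the offsets are 0-based with an exclusive end, exactly like String.substring, that "next month" from an August 2015 reference month is 2015-09, and that the 15:00 to 16:00 range has the ISO-8601 length PT1H.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.YearMonth;

/** Cross-checks the result table with java.time; does not call SUTime. */
public class ResultTableCheck {
    static final String TEXT =
            "From next month, we will have meeting on every friday, from 3:00 pm to 4:00 pm.";

    // Start offset is inclusive and end offset exclusive, as in String.substring
    public static String span(int start, int end) {
        return TEXT.substring(start, end);
    }

    // DATE: "next month" relative to the reference month
    public static String nextMonth(YearMonth reference) {
        return reference.plusMonths(1).toString();
    }

    // DURATION: length of the range between two times of day
    public static String rangeLength(LocalTime from, LocalTime to) {
        return Duration.between(from, to).toString();
    }

    public static void main(String[] args) {
        System.out.println(span(5, 15));   // next month
        System.out.println(span(41, 53));  // every friday
        System.out.println(span(55, 78));  // from 3:00 pm to 4:00 pm
        System.out.println(nextMonth(YearMonth.of(2015, 8)));                      // 2015-09
        System.out.println(rangeLength(LocalTime.of(15, 0), LocalTime.of(16, 0))); // PT1H
    }
}
```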
