Stanford Temporal Tagger: SUTime for Java
SUTime is a temporal tagger for recognizing and normalizing temporal expressions in English text. It is a deterministic, rule-based system used to annotate documents with temporal information.
Extracting temporal information from text is increasingly important in NLP applications such as information extraction and question answering. Along with entities like PERSON, LOCATION, and MONEY, we can extract temporal expressions and ranges from a text.
Expression | Type | Simplified Value |
Day | DATE | P1D |
7 April 2013 | DATE | 2013-04-07 |
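The simplified values in the table use ISO 8601 notation, so fully specified ones can be read with standard Java classes. The following is a minimal sketch using java.time, which is not part of SUTime; the class name SimplifiedValueDemo and its helper methods are made up for illustration.

```java
import java.time.LocalDate;
import java.time.Period;

public class SimplifiedValueDemo {
    // Parses a complete ISO 8601 calendar date such as "2013-04-07".
    static LocalDate parseDate(String value) {
        return LocalDate.parse(value);
    }

    // Parses an ISO 8601 date-based period such as "P1D" (one day).
    static Period parsePeriod(String value) {
        return Period.parse(value);
    }

    public static void main(String[] args) {
        LocalDate date = parseDate("2013-04-07");
        Period period = parsePeriod("P1D");
        System.out.println("Year " + date.getYear() + ", month " + date.getMonthValue()
                + ", day " + date.getDayOfMonth());
        System.out.println("Days in period: " + period.getDays());
    }
}
```

Partial or underspecified values (a year and month only, or placeholders such as X) are not accepted by these parsers and would need separate handling.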
For a given text, the SUTime library first tokenizes the input, then finds temporal expressions and outputs annotations for them. These annotations take the form of TIMEX3 tags. TIMEX3 is part of TimeML, an annotation language for marking up events, times, and their temporal relations in documents.
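To make the TIMEX3 format concrete, here is a small sketch that builds an inline tag by hand from a type, a value, and the matched text. It only illustrates the shape of the markup (tid, type, and value are TIMEX3 attributes). The class and method names are invented for this example, no XML escaping is done, and this is not how SUTime itself produces its output.

```java
public class Timex3Demo {
    // Builds an inline TIMEX3 tag around the matched text.
    // id    : running number used for the tid attribute (t1, t2, ...)
    // type  : TIMEX3 type such as DATE, TIME, DURATION or SET
    // value : normalized value such as 2013-04-07
    static String toTimex3(int id, String type, String value, String text) {
        return String.format("<TIMEX3 tid=\"t%d\" type=\"%s\" value=\"%s\">%s</TIMEX3>",
                id, type, value, text);
    }

    public static void main(String[] args) {
        System.out.println(toTimex3(1, "DATE", "2013-04-07", "7 April 2013"));
    }
}
```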
To run SUTime you need:
- stanford-corenlp-[version].jar
- SUTime rule files
  - defs.sutime.txt
  - english.sutime.txt
  - english.holidays.sutime.txt
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Properties;
import org.joda.time.DateTime;
import org.joda.time.Period;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationPipeline;
import edu.stanford.nlp.pipeline.TokenizerAnnotator;
import edu.stanford.nlp.time.SUTime;
import edu.stanford.nlp.time.SUTime.Temporal;
import edu.stanford.nlp.time.TimeAnnotations;
import edu.stanford.nlp.time.TimeAnnotator;
import edu.stanford.nlp.time.TimeExpression;
import edu.stanford.nlp.util.CoreMap;
// Shared pipeline, built once by setup() and reused by annotateText().
static AnnotationPipeline pipeline = null;

private static void setup() {
    try {
        // Paths to the SUTime rule files; adjust them to your installation.
        String defs_sutime = "/home/sutime/defs.sutime.txt";
        String holiday_sutime = "/home/sutime/english.holidays.sutime.txt";
        String _sutime = "/home/sutime/english.sutime.txt";
        pipeline = new AnnotationPipeline();
        Properties props = new Properties();
        // The rule files are passed as a single comma-separated list.
        String sutimeRules = defs_sutime + "," + holiday_sutime + "," + _sutime;
        props.setProperty("sutime.rules", sutimeRules);
        props.setProperty("sutime.binders", "0");
        props.setProperty("sutime.markTimeRanges", "true");
        props.setProperty("sutime.includeRange", "true");
        // SUTime works on tokens, so the tokenizer must run before the time annotator.
        pipeline.addAnnotator(new TokenizerAnnotator(false));
        pipeline.addAnnotator(new TimeAnnotator("sutime", props));
    } catch (Exception e) {
        e.printStackTrace();
    }
}
public void annotateText(String text, String referenceDate) {
    try {
        // Fall back to today's date when the reference date is missing
        // or is not a valid yyyy-MM-dd string.
        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
        dateFormat.setLenient(false); // reject impossible dates such as 2015-02-30
        if (referenceDate == null || referenceDate.isEmpty()) {
            referenceDate = dateFormat.format(new Date());
        } else {
            try {
                dateFormat.parse(referenceDate);
            } catch (Exception e) {
                referenceDate = dateFormat.format(new Date());
            }
        }
        if (pipeline != null) {
            Annotation annotation = new Annotation(text);
            // Relative expressions such as "next month" are resolved against this date.
            annotation.set(CoreAnnotations.DocDateAnnotation.class, referenceDate);
            pipeline.annotate(annotation);
            List<CoreMap> timexAnnsAll = annotation.get(TimeAnnotations.TimexAnnotations.class);
            for (CoreMap cm : timexAnnsAll) {
                try {
                    // Character offsets come from the first and last token of the expression.
                    List<CoreLabel> tokens = cm.get(CoreAnnotations.TokensAnnotation.class);
                    String startOffset = tokens.get(0).get(CoreAnnotations.CharacterOffsetBeginAnnotation.class).toString();
                    String endOffset = tokens.get(tokens.size() - 1).get(CoreAnnotations.CharacterOffsetEndAnnotation.class).toString();
                    Temporal temporal = cm.get(TimeExpression.Annotation.class).getTemporal();
                    System.out.println("Token text : " + cm.toString());
                    System.out.println("Temporal Value : " + temporal.toString());
                    System.out.println("Timex : " + temporal.getTimexValue());
                    System.out.println("Timex type : " + temporal.getTimexType().name());
                    System.out.println("Start offset : " + startOffset);
                    System.out.println("End Offset : " + endOffset);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        } else {
            System.out.println("Annotation Pipeline object is NULL");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
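The reference-date check in annotateText can also be written with java.time, whose ISO date parser is strict by default. The sketch below is one possible way to do it, not part of SUTime; resolveReferenceDate is a hypothetical helper name, and the fallback date is passed in as a parameter so the behaviour is easy to check.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class ReferenceDateDemo {
    // Returns the candidate if it is a valid yyyy-MM-dd date,
    // otherwise the fallback date (typically today) in the same format.
    static String resolveReferenceDate(String candidate, LocalDate fallback) {
        if (candidate == null || candidate.isEmpty()) {
            return fallback.toString();
        }
        try {
            return LocalDate.parse(candidate).toString();
        } catch (DateTimeParseException e) {
            return fallback.toString();
        }
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        System.out.println(resolveReferenceDate("2015-08-14", today));
        System.out.println(resolveReferenceDate("not a date", today));
    }
}
```

The returned string can then be set as the DocDateAnnotation in the same way as in the method above.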
Expression | Timex Type | Temporal Value | Timex | Start offset | End Offset |
next month | DATE | 2015-09 | 2015-09 | 5 | 15 |
every friday | SET | XXXX-WXX-5 | XXXX-WXX-5 | 41 | 53 |
from 3:00 pm to 4:00 pm | DURATION | (T15:00,T16:00,PT1H) | PT1H | 55 | 78 |
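The start and end offsets are character positions in the input text, and the end offset is exclusive: for "next month", 15 - 5 = 10, which is the length of the expression. A brief sketch of using the offsets to recover the matched span follows. The sample sentence is made up for illustration and is not the text that produced the table above; OffsetDemo and span are hypothetical names.

```java
public class OffsetDemo {
    // Returns the characters in [start, end), matching the begin/end offsets
    // reported for a temporal expression.
    static String span(String text, int start, int end) {
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Call next month"; // hypothetical input
        int start = text.indexOf("next month");
        int end = start + "next month".length();
        System.out.println(start + "-" + end + ": " + span(text, start, end));
    }
}
```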