Part-of-speech tagging using OpenNLP


By: Sagar Gole | June 18, 2015

Introduction

Part-of-speech tagging is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech such as noun, verb, adjective, etc., based on its definition, as well as its context – i.e. relationship with adjacent and related words in a phrase, sentence, or paragraph.

Useful in:

  • Information retrieval
  • Word sense disambiguation

The Apache OpenNLP project provides model files for POS tagging, sentence detection, tokenization, chunking and more.

We will use the POS tagging and sentence detection model files.

Required libraries and model files:

Libraries:

  • opennlp-maxent-3.0.3.jar
  • opennlp-tools-1.5.3.jar

Model files:

  • POS tagger – en-pos-maxent.bin
  • Sentence Detector – en-sent.bin

Input Paragraph:

Along with fresh veggies and fruits, eat nuts, seeds and salads. Make sure you get a balanced diet, as often as possible. Drinking water is good for your internal organs, it keeps you fresh and healthy.

POS Tagging:

  • Load POS Model file
  • Call sentence detector method and get the sentence array
  • Generate tokens for each sentence
  • Tag these tokens using POSTaggerME

POS Tagging Code:

import java.io.File;

import opennlp.tools.cmdline.postag.POSModelLoader;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;

String DIR_PATH = "opennlp/";
String POS_MODEL_FILE = "en-pos-maxent.bin";

String paragraph = "Along with fresh veggies and fruits, eat nuts, seeds and salads. Make sure you get a balanced diet, as often as possible. Drinking water is good for your internal organs, it keeps you fresh and healthy.";

POSModel model = null;

// Load POS model
try {
	File inputFile = new File(DIR_PATH + POS_MODEL_FILE);
	if (inputFile.exists()) {
		model = new POSModelLoader().load(inputFile);
	} else {
		System.out.println("File : " + inputFile.getPath() + " does not exist.");
	}
} catch (Exception e) {
	e.printStackTrace();
}

// POS tagging
try {
	if (model != null) {
		POSTaggerME tagger = new POSTaggerME(model);
		// Call the sentence detector (its logic is shown in the Sentence Detector section)
		String[] sentences = getSentences(paragraph);
		for (String sentence : sentences) {
			System.out.println("Sentence : " + sentence);
		}
		for (String sentence : sentences) {
			// Split the sentence on whitespace; punctuation stays attached to the word
			String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);
			// tags[i] is the part-of-speech tag for tokens[i]
			String[] tags = tagger.tag(tokens);
			for (int i = 0; i < tokens.length; i++) {
				System.out.println(tokens[i].trim() + ":" + tags[i].trim());
			}
		}
	}
} catch (Exception e) {
	e.printStackTrace();
}
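One detail of the tagging code is easy to miss: WhitespaceTokenizer splits on whitespace only, so punctuation stays glued to the word next to it. The sketch below shows what that means for the first sentence of the input paragraph. It uses plain String.split as a stand-in for WhitespaceTokenizer so that it runs without the OpenNLP jars; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class WhitespaceTokenSketch {

    // Stand-in (assumption) for WhitespaceTokenizer.INSTANCE.tokenize(sentence):
    // splits on runs of whitespace and on nothing else.
    public static List<String> tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        String sentence = "Along with fresh veggies and fruits, eat nuts, seeds and salads.";
        // "fruits," "nuts," and "salads." are printed with their punctuation attached
        for (String token : tokenize(sentence)) {
            System.out.println(token);
        }
    }
}
```

The tagger therefore receives tokens such as "fruits," rather than "fruits" followed by ",". If that matters for your use case, OpenNLP also ships SimpleTokenizer and the model-based TokenizerME, which separate punctuation into its own tokens.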

Sentence Detector:

  • Takes a paragraph as input and returns an array of sentences.

Input paragraph to sentence detector is:

“Along with fresh veggies and fruits, eat nuts, seeds and salads. Make sure you get a balanced diet, as often as possible. Drinking water is good for your internal organs, it keeps you fresh and healthy.”

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

String DIR_PATH = "opennlp/";
String sentenceModel = "en-sent.bin";
SentenceModel model = null;

// Load model object
try {
	File inputFile = new File(DIR_PATH + sentenceModel);
	if (inputFile.exists()) {
		// try-with-resources closes the stream even if loading the model fails
		try (InputStream is = new FileInputStream(inputFile)) {
			model = new SentenceModel(is);
		}
	} else {
		System.out.println("File : " + inputFile.getPath() + " does not exist.");
	}
} catch (Exception e) {
	e.printStackTrace();
}

// Sentence detection
try {
	if (model != null) {
		SentenceDetectorME sdetector = new SentenceDetectorME(model);
		// sentDetect returns one array element per detected sentence;
		// this is the array that getSentences(paragraph) hands to the POS tagging code
		String[] sentences = sdetector.sentDetect(paragraph);
		for (String sentence : sentences) {
			System.out.println(sentence);
		}
	}
} catch (Exception e) {
	e.printStackTrace();
}

Sentence Detector Output:

  • Along with fresh veggies and fruits, eat nuts, seeds and salads.
  • Make sure you get a balanced diet, as often as possible.
  • Drinking water is good for your internal organs, it keeps you fresh and healthy.
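If you want to sanity-check the expected split before downloading en-sent.bin, a rough cross-check is possible with the JDK alone. The sketch below uses java.text.BreakIterator, which is a rule-based splitter and not the OpenNLP maximum-entropy detector, so the two may disagree on harder text (abbreviations, for example). The class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitSketch {

    // Rule-based sentence splitting with the JDK (a stand-in, not SentenceDetectorME).
    public static List<String> split(String paragraph) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(paragraph);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk the boundaries; each [start, end) range is one sentence
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = paragraph.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String paragraph = "Along with fresh veggies and fruits, eat nuts, seeds and salads. "
                + "Make sure you get a balanced diet, as often as possible. "
                + "Drinking water is good for your internal organs, it keeps you fresh and healthy.";
        for (String sentence : split(paragraph)) {
            System.out.println(sentence);
        }
    }
}
```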

POS Tagger Output:

[Image: POS tagger output]

POS Tags and their meanings

[Image: POS tags and their meanings]
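As a text reference, some of the most common tags can be kept in a small lookup table. This sketch assumes the tags come from the Penn Treebank tag set, which is what the English en-pos-maxent.bin model emits as far as I know. The map is only a subset, and the class and method names are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PosTagMeanings {

    // Subset of the Penn Treebank tag set (assumed tag set of the English model)
    static final Map<String, String> MEANINGS = new LinkedHashMap<>();
    static {
        MEANINGS.put("CC", "Coordinating conjunction");
        MEANINGS.put("DT", "Determiner");
        MEANINGS.put("IN", "Preposition or subordinating conjunction");
        MEANINGS.put("JJ", "Adjective");
        MEANINGS.put("NN", "Noun, singular or mass");
        MEANINGS.put("NNS", "Noun, plural");
        MEANINGS.put("NNP", "Proper noun, singular");
        MEANINGS.put("PRP", "Personal pronoun");
        MEANINGS.put("PRP$", "Possessive pronoun");
        MEANINGS.put("RB", "Adverb");
        MEANINGS.put("VB", "Verb, base form");
        MEANINGS.put("VBD", "Verb, past tense");
        MEANINGS.put("VBG", "Verb, gerund or present participle");
        MEANINGS.put("VBN", "Verb, past participle");
        MEANINGS.put("VBP", "Verb, non-3rd person singular present");
        MEANINGS.put("VBZ", "Verb, 3rd person singular present");
    }

    // Returns a readable description, or a marker for tags missing from this subset
    public static String describe(String tag) {
        return MEANINGS.getOrDefault(tag, "Unknown tag: " + tag);
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : MEANINGS.entrySet()) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }
    }
}
```

Calling describe(tags[i]) next to each word:tag pair in the tagging loop would make the output readable without the reference table.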

 
