Part-of-speech tagging using OpenNLP
POS tagging is useful in:
- Information retrieval
- Word sense disambiguation
The Apache OpenNLP project provides pretrained model files for POS tagging, sentence detection, tokenization, chunking, and more.
We will use the POS tagging and sentence detection model files.
Required libraries and model files:
Libraries:
- opennlp-maxent-3.0.3.jar
- opennlp-tools-1.5.3.jar
Model files:
- POS tagger – en-pos-maxent.bin
- Sentence Detector – en-sent.bin
Input Paragraph:
Along with fresh veggies and fruits, eat nuts, seeds and salads. Make sure you get a balanced diet, as often as possible. Drinking water is good for your internal organs, it keeps you fresh and healthy.
POS Tagging:
- Load POS Model file
- Call sentence detector method and get the sentence array
- Generate tokens for each sentence
- Tag these tokens using POSTaggerME
POS Tagging Code:
import java.io.File;

import opennlp.tools.cmdline.postag.POSModelLoader;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;

String DIR_PATH = "opennlp/";
String POS_MODEL_FILE = "en-pos-maxent.bin";
String paragraph = "Along with fresh veggies and fruits, eat nuts, seeds and salads. Make sure you get a balanced diet, as often as possible. Drinking water is good for your internal organs, it keeps you fresh and healthy.";
POSModel model = null;
// Load POS Model
try {
    String fileLocation = DIR_PATH + POS_MODEL_FILE;
    File inputFile = new File(fileLocation);
    if (inputFile.exists()) {
        // POSModelLoader lives in OpenNLP's command-line helper package
        model = new POSModelLoader().load(inputFile);
    } else {
        System.out.println("File : " + fileLocation + " does not exist.");
    }
} catch (Exception e) {
    e.printStackTrace();
}
// POS Tagging
try {
    if (model != null) {
        POSTaggerME tagger = new POSTaggerME(model);
        // Call Sentence Detector (described in the Sentence Detector section)
        String[] sentences = getSentences(paragraph);
        for (String sentence : sentences) {
            System.out.println("Sentence : " + sentence);
        }
        for (String sentence : sentences) {
            // Split on whitespace only; punctuation stays attached to words
            String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);
            // tags[i] is the POS tag predicted for tokens[i]
            String[] tags = tagger.tag(tokens);
            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i].trim() + ":" + tags[i].trim());
            }
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
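One consequence of the tagging loop above is easier to see in isolation: tokenizing on whitespace alone leaves punctuation attached to the neighbouring word, so the tagger receives "fruits," as a single token instead of "fruits" followed by ",". The sketch below imitates that behaviour with plain String.split. It is a stand-in for illustration, not OpenNLP's WhitespaceTokenizer itself, and the class and method names are hypothetical.

```java
import java.util.Arrays;

class WhitespaceTokenSketch {

    // Stand-in for whitespace tokenization: split on runs of whitespace only.
    // Punctuation is NOT separated from the word it touches.
    static String[] tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String sentence =
            "Along with fresh veggies and fruits, eat nuts, seeds and salads.";
        // Tokens such as "fruits," and "salads." keep their punctuation
        System.out.println(Arrays.toString(tokenize(sentence)));
    }
}
```

If separate punctuation tokens are preferred, OpenNLP also ships SimpleTokenizer and the model-based TokenizerME, which may give the tagger cleaner input.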
Sentence Detector:
- Takes a paragraph as input and returns an array of sentences.
The input paragraph to the sentence detector is:
“Along with fresh veggies and fruits, eat nuts, seeds and salads. Make sure you get a balanced diet, as often as possible. Drinking water is good for your internal organs, it keeps you fresh and healthy.”
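Sentence Detector Code:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

// Takes a paragraph and returns one array element per detected sentence
static String[] getSentences(String paragraph) {
    String DIR_PATH = "opennlp/";
    String sentenceModel = "en-sent.bin";
    SentenceModel model = null;
    String[] sentences = new String[0];
    // Load model object
    try {
        String fileLocation = DIR_PATH + sentenceModel;
        File inputFile = new File(fileLocation);
        if (inputFile.exists()) {
            // try-with-resources closes the stream even if loading fails
            try (InputStream is = new FileInputStream(inputFile)) {
                model = new SentenceModel(is);
            }
        } else {
            System.out.println("File : " + fileLocation + " does not exist.");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    // Sentence detection
    try {
        if (model != null) {
            SentenceDetectorME sdetector = new SentenceDetectorME(model);
            sentences = sdetector.sentDetect(paragraph);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    // Empty array if the model could not be loaded
    return sentences;
}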
Sentence Detector Output:
- Along with fresh veggies and fruits, eat nuts, seeds and salads.
- Make sure you get a balanced diet, as often as possible.
- Drinking water is good for your internal organs, it keeps you fresh and healthy.
POS Tagger Output:
POS Tags and their meanings
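The English maxent model is generally understood to emit Penn Treebank tags. As a rough aid for reading word:tag output, the sketch below maps a handful of common tags to short descriptions. The class name PosTagGlossary and the describe helper are hypothetical, and the table is deliberately partial rather than the full tag set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PosTagGlossary {

    // Partial glossary of Penn Treebank tags (assumed tag set; not exhaustive)
    private static final Map<String, String> TAGS = new LinkedHashMap<>();
    static {
        TAGS.put("CC", "coordinating conjunction");
        TAGS.put("DT", "determiner");
        TAGS.put("IN", "preposition or subordinating conjunction");
        TAGS.put("JJ", "adjective");
        TAGS.put("NN", "noun, singular or mass");
        TAGS.put("NNS", "noun, plural");
        TAGS.put("NNP", "proper noun, singular");
        TAGS.put("PRP", "personal pronoun");
        TAGS.put("RB", "adverb");
        TAGS.put("VB", "verb, base form");
        TAGS.put("VBG", "verb, gerund or present participle");
        TAGS.put("VBP", "verb, non-3rd person singular present");
        TAGS.put("VBZ", "verb, 3rd person singular present");
    }

    // Returns a short description, or "unknown tag" for tags not in the table
    static String describe(String tag) {
        return TAGS.getOrDefault(tag, "unknown tag");
    }

    public static void main(String[] args) {
        // Print the glossary in insertion order
        for (Map.Entry<String, String> entry : TAGS.entrySet()) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }
    }
}
```

A call such as PosTagGlossary.describe(tag) could be added next to each printed word:tag pair to make the tagger output easier to read.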