Setting up WordNet using JWNL

WordNet is a lexical database for the English language and a widely used Natural Language Processing (NLP) resource.  It is made up of sets of words that are synonyms of each other; each of these sets is called a synset.  Since a word can have more than one meaning, it can belong to more than one synset.  Using WordNet you can find not only a word's synonyms, but also its hyponyms, hypernyms, holonyms, meronyms and antonyms. The database contains 155,287 words organized in 117,659 synsets for a total of 206,941 word-sense pairs.
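To make the word-to-synset relationship concrete, here is a toy sketch in plain Java. It does not read WordNet at all: the class name SynsetSketch and the sample words are made up for illustration, and the only point is that one word can be indexed under several synsets.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SynsetSketch {
    // Hypothetical in-memory index: each word maps to the synsets that contain it.
    private final Map<String, List<Set<String>>> synsetsByWord = new LinkedHashMap<>();

    // Registers one synset and indexes it under every word it contains.
    public void addSynset(Set<String> synset) {
        for (String word : synset) {
            synsetsByWord.computeIfAbsent(word, k -> new ArrayList<>()).add(synset);
        }
    }

    // Returns every synset the word belongs to, or an empty list for unknown words.
    public List<Set<String>> synsetsOf(String word) {
        return synsetsByWord.getOrDefault(word, List.of());
    }

    public static void main(String[] args) {
        SynsetSketch index = new SynsetSketch();
        // Hand-written sample sets, not copied from the WordNet database.
        index.addSynset(Set.of("blue", "blueness"));
        index.addSynset(Set.of("blue", "sky"));
        System.out.println("Synsets containing \"blue\": " + index.synsetsOf("blue").size());
    }
}
```

The real database stores far more per synset (a gloss, pointers to hypernyms and so on), which is what JWNL exposes in the steps below.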


There are many different libraries that can be used to work with WordNet; JWNL (Java WordNet Library) is one of them. Setting it up in Eclipse with the help of Maven can be a breeze.


Here is a small step-by-step guide to get WordNet up and running with JWNL:


Step 1: Download the WordNet database files only

Navigate to https://wordnet.princeton.edu/wordnet/download/current-version/ and download the database files. The file will have a “.tar.gz” extension.  After unpacking it you will find the “dict” folder with all the database files for WordNet.  Place this folder inside your Java project folder and note its path; you will need it in the next step.
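Before moving on, it can save some debugging to confirm that the unpacked folder really is where you think it is. The following is a minimal sketch using only the JDK; the class name DictFolderCheck is made up for this post, and the file list covers just the index and data files for the four parts of speech, not everything in the folder.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class DictFolderCheck {
    // Index and data files for the four parts of speech found in the "dict" folder.
    private static final String[] EXPECTED = {
        "index.noun", "data.noun", "index.verb", "data.verb",
        "index.adj", "data.adj", "index.adv", "data.adv"
    };

    // Returns the names of the expected files that are missing from the folder.
    public static List<String> missingFiles(Path dictDir) {
        List<String> missing = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!Files.isRegularFile(dictDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // "dict" is only a default; pass the real location as the first argument.
        Path dictDir = Paths.get(args.length > 0 ? args[0] : "dict");
        List<String> missing = missingFiles(dictDir);
        if (missing.isEmpty()) {
            System.out.println("dict folder looks complete: " + dictDir.toAbsolutePath());
        } else {
            System.out.println("Missing from " + dictDir.toAbsolutePath() + ": " + missing);
        }
    }
}
```

The absolute path it prints is the value you will want for the configuration file in Step 2.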


Step 2: Create the properties.xml file

This configuration file tells JWNL where to find WordNet and which version is being used. The code below is a minimal setup; it can be extended, for example to use different stemmers. Replace the dictionary_path value with the full path to the “dict” folder from Step 1.


<?xml version="1.0" encoding="UTF-8"?>
<jwnl_properties language="en">
  <version publisher="Princeton" number="3.0" language="en"/>
  <dictionary class="net.didion.jwnl.dictionary.FileBackedDictionary">
    <param name="dictionary_element_factory"
      value="net.didion.jwnl.princeton.data.PrincetonWN17FileDictionaryElementFactory"/>
    <param name="file_manager" value="net.didion.jwnl.dictionary.file_manager.FileManagerImpl">
      <param name="file_type" value="net.didion.jwnl.princeton.file.PrincetonRandomAccessDictionaryFile"/>
      <param name="dictionary_path" value="AddFullPathToDictionaryHERE"/>
    </param>
  </dictionary>
  <resource class="PrincetonResource"/>
</jwnl_properties>
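A common slip is leaving the placeholder path in the file or pointing it at the wrong folder. As a rough sanity check, the sketch below parses the configuration with the JDK's built-in XML parser and reports the dictionary_path it finds. It is not part of JWNL; the class name PropertiesCheck and its methods are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PropertiesCheck {
    // Returns the value of the <param name="dictionary_path"> element, or null if absent.
    public static String dictionaryPath(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            NodeList params = doc.getElementsByTagName("param");
            for (int i = 0; i < params.getLength(); i++) {
                Element param = (Element) params.item(i);
                if ("dictionary_path".equals(param.getAttribute("name"))) {
                    return param.getAttribute("value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse properties file", e);
        }
    }

    // Convenience overload for checking XML held in a string.
    public static String dictionaryPath(String xml) {
        return dictionaryPath(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to your own properties.xml.
        try (InputStream in = new FileInputStream(args[0])) {
            String path = dictionaryPath(in);
            boolean found = path != null && new File(path).isDirectory();
            System.out.println("dictionary_path = " + path
                + (found ? " (folder exists)" : " (not an existing folder)"));
        }
    }
}
```

If it reports that the folder does not exist, fix the path before going any further.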


Step 3: Adding the dependencies to Maven

Add the two dependencies below to your pom.xml. This will add the necessary libraries to the build path for your project.


<dependency>
  <groupId>net.didion.jwnl</groupId>
  <artifactId>jwnl</artifactId>
  <version>1.4.0.rc2</version>
</dependency>
<dependency>
  <groupId>commons-logging</groupId>
  <artifactId>commons-logging</artifactId>
  <version>1.1.3</version>
</dependency>


Step 4: Boilerplate code

This is the boilerplate code to get started.  We initialize JWNL with the configuration file and obtain the singleton instance of the Dictionary class, which is used to query WordNet. With this instance we can look up a word under one of the four part-of-speech categories defined within WordNet, i.e. noun, verb, adjective, and adverb.


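// Imports: java.io.FileInputStream, net.didion.jwnl.JWNL, net.didion.jwnl.dictionary.Dictionary
// The first line can throw FileNotFoundException and JWNLException; handle or declare them.
JWNL.initialize(new FileInputStream("PATH/TO/THE/properties.xml"));
final Dictionary dictionary = Dictionary.getInstance();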


In the code below, we query WordNet for the word “blue” as a noun and print the index word along with the gloss (a short definition) of each of its synsets.


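// Imports: net.didion.jwnl.data.IndexWord, net.didion.jwnl.data.POS, net.didion.jwnl.data.Synset
IndexWord indexWord = dictionary.getIndexWord(POS.NOUN, "blue");
// getIndexWord returns null when the word is not in WordNet for that part of speech
if (indexWord != null) {
    Synset[] senses = indexWord.getSenses();
    for (Synset set : senses) {
        System.out.println(indexWord + ": " + set.getGloss());
    }
}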
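The gloss printed above usually packs a definition and one or more quoted example sentences into a single string. If you want them separately, a small helper can split them. This is a sketch in plain Java, assuming the usual layout of definition text followed by examples in double quotes; the class name GlossParts is made up, and the sample string in main is for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GlossParts {
    // Matches one example sentence wrapped in double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns the text before the first quoted example, without the trailing separator.
    public static String definition(String gloss) {
        int firstQuote = gloss.indexOf('"');
        String head = firstQuote < 0 ? gloss : gloss.substring(0, firstQuote);
        return head.trim().replaceAll(";$", "").trim();
    }

    // Returns the quoted example sentences in order, without their quotes.
    public static List<String> examples(String gloss) {
        List<String> result = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(gloss);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative gloss string; in practice pass in set.getGloss().
        String gloss = "blue color or pigment; \"he had eyes of bright blue\"";
        System.out.println("Definition: " + definition(gloss));
        System.out.println("Examples:   " + examples(gloss));
    }
}
```

Inside the loop above you would call GlossParts.definition(set.getGloss()) instead of printing the raw gloss.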


You’re done!

 

Conclusion

The code snippets here will help you get started with WordNet.  By further editing the configuration file and diving deeper into WordNet you will find it to be a very powerful tool for NLP.  It is also widely used in word-sense disambiguation and information systems. For more information and updates, check out https://wordnet.princeton.edu/
