Words similarity/relatedness using WuPalmer Algorithm


By: Sagar Gole | June 10, 2015

Wu & Palmer – Words Similarity

The Wu & Palmer measure calculates relatedness by considering the depths of the two synsets in the WordNet taxonomies, along with the depth of their LCS (Least Common Subsumer).

The formula is score = 2 * depth (lcs) / (depth (s1) + depth (s2)).

This means that 0 < score <= 1. The score can never be zero because the depth of the LCS is never zero (the depth of the root of a taxonomy is one). The score is one if the two input concepts are the same.
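As a quick check of the formula, here is a minimal Java sketch that computes the score from the three depths. The class name WuPalmerFormula and the method score are made up for illustration and are not part of WS4J; depths are assumed to be counted from the root, with the root at depth one.

```java
// Illustrative sketch only; these names do not come from WS4J.
final class WuPalmerFormula {

    // score = 2 * depth(lcs) / (depth(s1) + depth(s2)), with the root at depth 1.
    static double score(int depthLcs, int depthS1, int depthS2) {
        if (depthLcs < 1 || depthS1 < depthLcs || depthS2 < depthLcs) {
            throw new IllegalArgumentException(
                    "expected 1 <= depth(lcs) <= depth(s1) and depth(s2)");
        }
        return 2.0 * depthLcs / (depthS1 + depthS2);
    }

    public static void main(String[] args) {
        // Identical concepts: the LCS is the concept itself, so the score is 1.0.
        System.out.println(score(5, 5, 5));
        // Only the root in common: the score is small but still greater than zero.
        System.out.println(score(1, 6, 9));
    }
}
```

Because the LCS is never deeper than either synset, the numerator can never exceed the denominator, which is why the score stays at or below one.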

Required library: WS4J (WordNet Similarity for Java)

Input parameters: two words with their part of speech.

Returns: The return value is the relatedness score. If no path exists between the two word senses, then a negative number is returned. If an error occurs, then the error level is set to non-zero and an error string is created.

Example:

    Word 1: cancer [POS – Noun]

    Word 2: disease [POS – Noun]

Initialization of WordNet database

    ILexicalDatabase db = new NictWordNet();
    RelatednessCalculator rc = new WuPalmer(db);

Word Similarity Method

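    // Imports needed by the initialization above and the method below
    import java.util.List;
    import edu.cmu.lti.jawjaw.pobj.POS;
    import edu.cmu.lti.lexical_db.ILexicalDatabase;
    import edu.cmu.lti.lexical_db.NictWordNet;
    import edu.cmu.lti.lexical_db.data.Concept;
    import edu.cmu.lti.ws4j.Relatedness;
    import edu.cmu.lti.ws4j.RelatednessCalculator;
    import edu.cmu.lti.ws4j.impl.WuPalmer;
    import edu.cmu.lti.ws4j.util.WS4JConfiguration;

    // db and rc are the fields initialized above; logger is assumed to be
    // a logger field of the enclosing class.
    public double wordSimilarity(String word1, POS posWord1, String word2, POS posWord2) {
        double maxScore = 0D;
        try {
            WS4JConfiguration.getInstance().setMFS(true);
            // All synsets (senses) of each word for the given part of speech
            List<Concept> synsets1 = (List<Concept>) db.getAllConcepts(word1, posWord1.name());
            List<Concept> synsets2 = (List<Concept>) db.getAllConcepts(word2, posWord2.name());
            // Score every pair of synsets and keep the highest score
            for (Concept synset1 : synsets1) {
                for (Concept synset2 : synsets2) {
                    Relatedness relatedness = rc.calcRelatednessOfSynset(synset1, synset2);
                    double score = relatedness.getScore();
                    if (score > maxScore) {
                        maxScore = score;
                    }
                }
            }
            System.out.println("Similarity score of " + word1 + " & " + word2 + " : " + maxScore);
        } catch (Exception e) {
            logger.error("Exception : ", e);
        }
        return maxScore;
    }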

Similarity score for cancer and disease is 0.88

Description:

  1. Initialize the WordNet database and the WuPalmer object.
  2. Set MFS to true so that the Most Frequent Sense is used, which speeds up the calculation.
  3. Get the synsets of the input words according to their POS.
  4. Iterate over every pair of synsets and calculate the relatedness score of each pair.
  5. Return the maximum score over all synset pairs.

Below is the description from http://ws4jdemo.appspot.com/

WuPalmer (cancer#n#1 , disease#n#1 ) = 0.88

T1 = HyperTrees(cancer#n#1) =

[1] *ROOT*#n#1 < entity#n#1 < abstraction#n#6 < attribute#n#2 
    < state#n#2 < condition#n#1 < physical_condition#n#1
    < pathological_state#n#1 < ill_health#n#1 < illness#n#1
    < growth#n#6 < tumor#n#1 < malignant_tumor#n#1 < cancer#n#1
[2] *ROOT*#n#1 < entity#n#1 < abstraction#n#6 < attribute#n#2 
    < state#n#2 < condition#n#1 < physical_condition#n#1 
    < pathological_state#n#1 < ill_health#n#1 < illness#n#1 
    < disease#n#1 < malignancy#n#1 < malignant_tumor#n#1 
    < cancer#n#1

T2 = HyperTrees(disease#n#1) =

[1] *ROOT*#n#1 < entity#n#1 < abstraction#n#6 < attribute#n#2 
    < state#n#2 < condition#n#1 < physical_condition#n#1 
    < pathological_state#n#1 < ill_health#n#1 < illness#n#1 
    < disease#n#1

Lowest Common Subsumer(s) = argmax(depth(subsumer(T1,T2))) = { subsumer(T1[2], T2[1]) } = { disease#n#1 }

DepthLCS = depth (disease#n#1 ) = 11

Depth1 = min(depth( {tree in T1 | tree contains LCS } )) = 14
Depth2 = min(depth( {tree in T2 | tree contains LCS } )) = 11
Score = 2 * DepthLCS / ( Depth1 + Depth2 ) = 2 * 11 / (14 + 11) = 0.88
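
The derivation above can be reproduced with a small standalone sketch. It is not how WS4J is implemented: it only takes the hypernym paths printed by the demo as plain strings, picks the deepest node shared by a path from each word, and applies the formula. All names here (HyperTreeWuPalmer, path, lowestCommonSubsumer, depthOf, minTreeDepth, score) are hypothetical, and a node's depth is assumed to be its 1-based position in a root-first path.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; none of these names come from WS4J.
final class HyperTreeWuPalmer {

    // Parses "a < b < c" (root first, as printed by the demo) into a list of nodes.
    static List<String> path(String s) {
        return Arrays.asList(s.split(" < "));
    }

    // The first ten nodes are shared by every path in the demo output.
    static final String PREFIX = "*ROOT*#n#1 < entity#n#1 < abstraction#n#6 < attribute#n#2"
            + " < state#n#2 < condition#n#1 < physical_condition#n#1"
            + " < pathological_state#n#1 < ill_health#n#1 < illness#n#1";

    static final List<List<String>> CANCER = Arrays.asList(
            path(PREFIX + " < growth#n#6 < tumor#n#1 < malignant_tumor#n#1 < cancer#n#1"),
            path(PREFIX + " < disease#n#1 < malignancy#n#1 < malignant_tumor#n#1 < cancer#n#1"));

    static final List<List<String>> DISEASE = Arrays.asList(
            path(PREFIX + " < disease#n#1"));

    // Deepest node found in some path of t1 and some path of t2, or null if none.
    static String lowestCommonSubsumer(List<List<String>> t1, List<List<String>> t2) {
        String best = null;
        int bestDepth = 0;
        for (List<String> p1 : t1) {
            for (List<String> p2 : t2) {
                for (int i = 0; i < p1.size(); i++) {
                    if (i + 1 > bestDepth && p2.contains(p1.get(i))) {
                        best = p1.get(i);
                        bestDepth = i + 1;
                    }
                }
            }
        }
        return best;
    }

    // 1-based depth of node in the first path that contains it, or -1 if none does.
    static int depthOf(String node, List<List<String>> trees) {
        for (List<String> p : trees) {
            int i = p.indexOf(node);
            if (i >= 0) {
                return i + 1;
            }
        }
        return -1;
    }

    // Length of the shortest path that contains node.
    static int minTreeDepth(String node, List<List<String>> trees) {
        int min = Integer.MAX_VALUE;
        for (List<String> p : trees) {
            if (p.contains(node)) {
                min = Math.min(min, p.size());
            }
        }
        return min;
    }

    static double score(List<List<String>> t1, List<List<String>> t2) {
        String lcs = lowestCommonSubsumer(t1, t2);
        if (lcs == null) {
            return -1.0; // no common subsumer in the given paths
        }
        int depthLcs = depthOf(lcs, t1);
        return 2.0 * depthLcs / (minTreeDepth(lcs, t1) + minTreeDepth(lcs, t2));
    }

    public static void main(String[] args) {
        System.out.println(lowestCommonSubsumer(CANCER, DISEASE)); // disease#n#1
        System.out.println(score(CANCER, DISEASE));                // 0.88
    }
}
```

Only the second cancer path contains disease#n#1, so it is the one that supplies Depth1 = 14, just as in the demo output.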


3 thoughts on “Words similarity/relatedness using WuPalmer Algorithm”

    1. Sagar Gole Post author

      You can create your own class and add the above code in it.

      First you have to initialize the ILexicalDatabase and RelatednessCalculator objects and then call the wordSimilarity method in your main method.

      For example: wordSimilarity("cancer", POS.n, "disease", POS.n);

      Thanks,
      SJGole

  1. Jothi G

    Sir,

    I am new to Java.

    This is our main class:

    public static void main(String[] args) {
        wordSimilarity("cancer", POS.n, "disease", POS.n);
    }

    But it is not working. The error is:

    Exception in thread "main" java.lang.RuntimeException: Uncompilable source code - non-static method wordSimilarity(java.lang.String,edu.cmu.lti.jawjaw.pobj.POS,java.lang.String,edu.cmu.lti.jawjaw.pobj.POS) cannot be referenced from a static context at similarity1.main(similarity1.java:28)
    Java Result: 1

    What is the correct way to call this function?

    Thank you.
