Word similarity/relatedness using the Wu & Palmer algorithm

Wu & Palmer – Words Similarity

The Wu & Palmer measure calculates relatedness by considering the depths of the two synsets in the WordNet taxonomy, along with the depth of their LCS (Least Common Subsumer), the deepest concept that is an ancestor of both.


The formula is score = 2 * depth(lcs) / (depth(s1) + depth(s2)), where s1 and s2 are the two synsets and lcs is their Least Common Subsumer.


This means that 0 < score <= 1. The score can never be zero because the depth of the LCS is never zero (the depth of the root of a taxonomy is one). The score is one if the two input concepts are the same.
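As a quick check of the bounds described above, the formula can be evaluated directly from the three depths. This is only a minimal sketch: the class WuPalmerFormula and its score method are hypothetical helpers written for illustration and are not part of ws4j.

```java
// Illustrative helper, not part of ws4j: evaluates the Wu & Palmer formula from depths.
final class WuPalmerFormula {

    // Depths are counted from the root of the taxonomy, whose depth is 1.
    static double score(int depthLcs, int depth1, int depth2) {
        if (depthLcs < 1 || depth1 < depthLcs || depth2 < depthLcs) {
            throw new IllegalArgumentException(
                "expected 1 <= depth(lcs) <= depth(s1) and depth(s2)");
        }
        return 2.0 * depthLcs / (depth1 + depth2);
    }

    public static void main(String[] args) {
        // Identical concepts: the LCS is the concept itself, so the score is 1.
        System.out.println(score(5, 5, 5)); // prints 1.0
        // Two concepts whose only common subsumer is the root (depth 1):
        // the score is small but still greater than zero.
        System.out.println(score(1, 4, 6)); // prints 0.2
    }
}
```

Because the LCS can never be deeper than either synset, the numerator never exceeds the denominator, which is why the score stays at or below one.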


Required library: WS4J (WordNet Similarity for Java)

Input parameters: two words with their part of speech.

Returns: The return value is the relatedness score. If no path exists between the two word senses, then a negative number is returned. If an error occurs, then the error level is set to non-zero and an error string is created.


Example:

Word 1: cancer [POS – Noun]

Word 2: disease [POS – Noun]

Initialization of WordNet database

ILexicalDatabase db = new NictWordNet();
RelatednessCalculator rc = new WuPalmer(db);


Word Similarity Method

public double wordSimilarity(String word1, POS posWord1, String word2, POS posWord2) {
    double maxScore = 0D;
    try {
        // Use the Most Frequent Sense setting
        WS4JConfiguration.getInstance().setMFS(true);
        // Fetch every synset of each word for the given part of speech
        List<Concept> synsets1 = (List<Concept>) db.getAllConcepts(word1, posWord1.name());
        List<Concept> synsets2 = (List<Concept>) db.getAllConcepts(word2, posWord2.name());
        // Score every pair of synsets and keep the highest score
        for (Concept synset1 : synsets1) {
            for (Concept synset2 : synsets2) {
                Relatedness relatedness = rc.calcRelatednessOfSynset(synset1, synset2);
                double score = relatedness.getScore();
                if (score > maxScore) {
                    maxScore = score;
                }
            }
        }
        System.out.println("Similarity score of " + word1 + " & " + word2 + " : " + maxScore);
    } catch (Exception e) {
        logger.error("Exception : ", e);
    }
    return maxScore;
}


Similarity score for cancer and disease is 0.88

Description:

  1. Initialize the WordNet database and the WuPalmer object.
  2. Set MFS to true so that the Most Frequent Sense is used, which speeds up the calculation.
  3. Get the synsets of each input word for its part of speech.
  4. Iterate over every pair of synsets and calculate the relatedness score of each pair.
  5. Return the maximum score found across the synset pairs.


Below is the trace from http://ws4jdemo.appspot.com/

WuPalmer (cancer#n#1 , disease#n#1 ) = 0.88

T1 = HyperTrees(cancer#n#1) =

[1] *ROOT*#n#1 < entity#n#1 < abstraction#n#6 < attribute#n#2 
    < state#n#2 < condition#n#1 < physical_condition#n#1 
    < pathological_state#n#1 < ill_health#n#1 < illness#n#1 
    < growth#n#6 < tumor#n#1 < malignant_tumor#n#1 < cancer#n#1

[2] *ROOT*#n#1 < entity#n#1 < abstraction#n#6 < attribute#n#2 
    < state#n#2 < condition#n#1 < physical_condition#n#1 
    < pathological_state#n#1 < ill_health#n#1 < illness#n#1 
    < disease#n#1 < malignancy#n#1 < malignant_tumor#n#1 
    < cancer#n#1


T2 = HyperTrees(disease#n#1) =

[1] *ROOT*#n#1 < entity#n#1 < abstraction#n#6 < attribute#n#2 
    < state#n#2 < condition#n#1 < physical_condition#n#1 
    < pathological_state#n#1 < ill_health#n#1 < illness#n#1 
    < disease#n#1


Lowest Common Subsumer(s) = argmax(depth(subsumer(T1,T2))) = { subsumer(T1[2], T2[1]) } = { disease#n#1 }

DepthLCS = depth (disease#n#1 ) = 11

Depth1 = min(depth( {tree in T1 | tree contains LCS } )) = 14
Depth2 = min(depth( {tree in T2 | tree contains LCS } )) = 11
Score = 2 * DepthLCS / ( Depth1 + Depth2 ) = 2 * 11 / (14 + 11) = 0.88
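
The numbers in the trace above can be recomputed from the hypernym paths themselves. The following is a simplified sketch, not ws4j's own implementation: WuPalmerTrace and its methods are hypothetical names, and the common subsumer of two paths is assumed to be the last node of their shared prefix starting at the root. When several path pairs tie on LCS depth, the sketch takes the shortest path on each side without checking that the tied pairs share the same subsumer. A return value of -1 when no node is shared is also an assumption, chosen to mirror the "negative number" convention in the Returns description.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch, not ws4j code: recomputes the trace above from its hypernym paths.
final class WuPalmerTrace {

    // Splits "a < b < c" into [a, b, c]; the first element is the root (depth 1).
    static List<String> path(String trace) {
        return Arrays.asList(trace.trim().split("\\s*<\\s*"));
    }

    // The ten nodes that every path in the trace starts with.
    static final String SHARED =
        "*ROOT*#n#1 < entity#n#1 < abstraction#n#6 < attribute#n#2 < state#n#2"
        + " < condition#n#1 < physical_condition#n#1 < pathological_state#n#1"
        + " < ill_health#n#1 < illness#n#1";

    // T1 = HyperTrees(cancer#n#1)
    static final List<List<String>> CANCER = Arrays.asList(
        path(SHARED + " < growth#n#6 < tumor#n#1 < malignant_tumor#n#1 < cancer#n#1"),
        path(SHARED + " < disease#n#1 < malignancy#n#1 < malignant_tumor#n#1 < cancer#n#1"));

    // T2 = HyperTrees(disease#n#1)
    static final List<List<String>> DISEASE = Arrays.asList(
        path(SHARED + " < disease#n#1"));

    // Number of leading nodes two root-to-synset paths have in common.
    // With the root at depth 1, this equals the depth of their deepest shared node.
    static int commonPrefixLength(List<String> a, List<String> b) {
        int limit = Math.min(a.size(), b.size());
        int i = 0;
        while (i < limit && a.get(i).equals(b.get(i))) {
            i++;
        }
        return i;
    }

    // Returns 2 * DepthLCS / (Depth1 + Depth2), or -1 if the paths share no node.
    static double score(List<List<String>> trees1, List<List<String>> trees2) {
        int depthLcs = 0;
        int depth1 = 0;
        int depth2 = 0;
        for (List<String> t1 : trees1) {
            for (List<String> t2 : trees2) {
                int d = commonPrefixLength(t1, t2);
                if (d > depthLcs) {
                    // A deeper common subsumer: restart the depth bookkeeping.
                    depthLcs = d;
                    depth1 = t1.size();
                    depth2 = t2.size();
                } else if (d == depthLcs && d > 0) {
                    // Same LCS depth: keep the shortest path on each side.
                    depth1 = Math.min(depth1, t1.size());
                    depth2 = Math.min(depth2, t2.size());
                }
            }
        }
        if (depthLcs == 0) {
            return -1.0;
        }
        return 2.0 * depthLcs / (depth1 + depth2);
    }

    public static void main(String[] args) {
        System.out.println(score(CANCER, DISEASE)); // prints 0.88
    }
}
```

Here the pair (T1[2], T2[1]) shares its first 11 nodes, ending at disease#n#1, while (T1[1], T2[1]) shares only 10, so the sketch arrives at the same DepthLCS = 11, Depth1 = 14 and Depth2 = 11 as the trace.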

  • Jothi G April 7, 2016, 5:43 am
    Sir, I am new in Java. This is main our class, public static void main(String[] args) { wordSimilarity("cancer", POS.n, "disease", POS.n); } But it is not working. The error Exception in thread "main" java.lang.RuntimeException: Uncompilable source code - non-static method wordSimilarity(java.lang.String,edu.cmu.lti.jawjaw.pobj.POS,java.lang.String,edu.cmu.lti.jawjaw.pobj.POS) cannot be referenced from a static context at similarity1.main(similarity1.java:28) Java Result: 1 What is correct way calling this function. Thank you.
  • Sagar Gole October 29, 2015, 10:11 am
    You can create your own class and add the above code in it. First you have to initialize the ILexicalDatabase and RelatednessCalculator objects and then call the wordSimilarity method in your main method. Like wordSimilarity("cancer", POS.n, "disease", POS.n); Thanks, SJGole
  • jothi October 29, 2015, 7:04 am
    Sir, Where is the main class in your code. Give your full code . Thank you.