Word Similarity/Relatedness Using the Wu & Palmer Algorithm

Wu & Palmer – Word Similarity

The Wu & Palmer measure calculates relatedness by considering the depths of the two synsets in the WordNet taxonomies, along with the depth of their LCS (Least Common Subsumer).

The formula is score = 2 * depth(lcs) / (depth(s1) + depth(s2)), where s1 and s2 are the two synsets.

This means that 0 < score <= 1. The score can never be zero because the depth of the LCS is never zero (the depth of the root of a taxonomy is one). The score is one if the two input concepts are the same.
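To make the formula and its bounds concrete, here is a minimal Java sketch that computes the score from three depths that are already known. It does not use ws4j; the class name WuPalmerScore and the depth-validation rule are illustrative assumptions. The first call uses the depths from the cancer/disease example later in this post (11, 14 and 11).

```java
public class WuPalmerScore {

    // score = 2 * depth(lcs) / (depth(s1) + depth(s2)), with the root at depth 1.
    static double score(int depthLcs, int depth1, int depth2) {
        // Assumed sanity check: depths start at 1 and the LCS is never deeper than either synset.
        if (depthLcs < 1 || depthLcs > depth1 || depthLcs > depth2) {
            throw new IllegalArgumentException("invalid depths");
        }
        return 2.0 * depthLcs / (depth1 + depth2);
    }

    public static void main(String[] args) {
        // cancer#n#1 vs disease#n#1: 2 * 11 / (14 + 11)
        System.out.println(score(11, 14, 11)); // prints 0.88
        // Identical concepts: the LCS is the concept itself, so the score is 1
        System.out.println(score(11, 11, 11)); // prints 1.0
        // LCS is the root (depth 1): small, but still greater than zero
        System.out.println(score(1, 14, 11)); // prints 0.08
    }
}
```

Because depth(lcs) is at least 1 and can never exceed depth(s1) or depth(s2), the result always falls in (0, 1], and it reaches 1 only when both depths equal the LCS depth.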

Required library: ws4j (WordNet Similarity for Java)

Input parameters: two words, each with its part of speech (POS).

Returns: The return value is the relatedness score. If no path exists between the two word senses, then a negative number is returned. If an error occurs, then the error level is set to non-zero and an error string is created.


Word 1: cancer [POS – Noun]

Word 2: disease [POS – Noun]

Initialization of WordNet database

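import java.util.List;
import edu.cmu.lti.jawjaw.pobj.POS;
import edu.cmu.lti.lexical_db.ILexicalDatabase;
import edu.cmu.lti.lexical_db.NictWordNet;
import edu.cmu.lti.lexical_db.data.Concept;
import edu.cmu.lti.ws4j.Relatedness;
import edu.cmu.lti.ws4j.RelatednessCalculator;
import edu.cmu.lti.ws4j.impl.WuPalmer;
import edu.cmu.lti.ws4j.util.WS4JConfiguration;

ILexicalDatabase db = new NictWordNet();
RelatednessCalculator rc = new WuPalmer(db);

Word Similarity Method

public double wordSimilarity(String word1, POS posWord1, String word2, POS posWord2) {
    double maxScore = 0D;
    try {
        WS4JConfiguration.getInstance().setMFS(true); // use the Most Frequent Sense (step 2)
        List<Concept> synsets1 = (List<Concept>) db.getAllConcepts(word1, posWord1.name());
        List<Concept> synsets2 = (List<Concept>) db.getAllConcepts(word2, posWord2.name());
        // Compare every synset of word1 with every synset of word2 and keep the best score
        for (Concept synset1 : synsets1) {
            for (Concept synset2 : synsets2) {
                Relatedness relatedness = rc.calcRelatednessOfSynset(synset1, synset2);
                double score = relatedness.getScore();
                if (score > maxScore) {
                    maxScore = score;
                }
            }
        }
        System.out.println("Similarity score of " + word1 + " & " + word2 + " : " + maxScore);
    } catch (Exception e) {
        logger.error("Exception : ", e); // logger is assumed to be a field of the enclosing class
    }
    return maxScore;
}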

Similarity score for cancer and disease is 0.88


  1. Initialize the WordNet database and the WuPalmer object.
  2. Set MFS to true so that the Most Frequent Sense is used; this speeds up the calculation.
  3. Get the synsets of each input word for its part of speech.
  4. Iterate over every pair of synsets and calculate their relatedness score.
  5. Return the maximum score over all synset pairs.
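Steps 3 to 5 amount to taking the maximum score over all sense pairs. The following library-free Java sketch shows only that loop. The generic helper maxScore, the sense labels and the toy scores are hypothetical stand-ins: in the wordSimilarity method above, the senses would be ws4j Concept objects and the scorer would wrap rc.calcRelatednessOfSynset(...).getScore(). Because the maximum starts at 0, as it does in wordSimilarity, a negative "no path" score is never returned.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

public class MaxSenseSimilarity {

    // Compares every sense of word 1 with every sense of word 2 and returns the best score.
    // Starting at 0.0 means negative ("no path") scores can never be the result.
    static <T> double maxScore(List<T> senses1, List<T> senses2,
                               ToDoubleBiFunction<T, T> relatedness) {
        double maxScore = 0.0;
        for (T s1 : senses1) {
            for (T s2 : senses2) {
                double score = relatedness.applyAsDouble(s1, s2);
                if (score > maxScore) {
                    maxScore = score;
                }
            }
        }
        return maxScore;
    }

    public static void main(String[] args) {
        // Hypothetical senses and scores, for illustration only; -1.0 stands for "no path"
        Map<String, Double> toyScores = Map.of(
            "w1#n#1|w2#n#1", 0.40,
            "w1#n#2|w2#n#1", -1.0,
            "w1#n#2|w2#n#2", 0.75);
        ToDoubleBiFunction<String, String> relatedness =
            (s1, s2) -> toyScores.getOrDefault(s1 + "|" + s2, 0.0);
        List<String> senses1 = List.of("w1#n#1", "w1#n#2");
        List<String> senses2 = List.of("w2#n#1", "w2#n#2");
        System.out.println(maxScore(senses1, senses2, relatedness)); // prints 0.75
    }
}
```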

Below is the step-by-step calculation from the ws4j demo at http://ws4jdemo.appspot.com/

WuPalmer(cancer#n#1, disease#n#1) = 0.88

T1 = HyperTrees(cancer#n#1) =

[1] *ROOT*#n#1 < entity#n#1 < abstraction#n#6 < attribute#n#2 
    < state#n#2 < condition#n#1 < physical_condition#n#1 
    < pathological_state#n#1 < ill_health#n#1 < illness#n#1 
    < growth#n#6 < tumor#n#1 < malignant_tumor#n#1 < cancer#n#1

[2] *ROOT*#n#1 < entity#n#1 < abstraction#n#6 < attribute#n#2 
    < state#n#2 < condition#n#1 < physical_condition#n#1 
    < pathological_state#n#1 < ill_health#n#1 < illness#n#1 
    < disease#n#1 < malignancy#n#1 < malignant_tumor#n#1 
    < cancer#n#1

T2 = HyperTrees(disease#n#1) =

[1] *ROOT*#n#1 < entity#n#1 < abstraction#n#6 < attribute#n#2 
    < state#n#2 < condition#n#1 < physical_condition#n#1 
    < pathological_state#n#1 < ill_health#n#1 < illness#n#1 
    < disease#n#1

Lowest Common Subsumer(s) = argmax(depth(subsumer(T1,T2))) = { subsumer(T1[2], T2[1]) } = { disease#n#1 }

DepthLCS = depth(disease#n#1) = 11

Depth1 = min(depth( {tree in T1 | tree contains LCS } )) = 14
Depth2 = min(depth( {tree in T2 | tree contains LCS } )) = 11
Score = 2 * DepthLCS / ( Depth1 + Depth2 ) = 2 * 11 / (14 + 11) = 0.88
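The calculation in this trace can be reproduced from the listed hypernym paths alone. Below is a minimal Java sketch under these assumptions: each path is a root-first list of synset labels copied from T1 and T2, a node's depth is its 1-based position in the path, and -1 signals that no common subsumer exists (a hypothetical stand-in for the negative "no path" value mentioned earlier). The class and method names are illustrative and are not part of ws4j.

```java
import java.util.ArrayList;
import java.util.List;

public class WuPalmerTrace {

    // Upper part shared by every hypernym path in the trace (root first, root depth = 1)
    static final List<String> UPPER = List.of(
        "*ROOT*#n#1", "entity#n#1", "abstraction#n#6", "attribute#n#2",
        "state#n#2", "condition#n#1", "physical_condition#n#1",
        "pathological_state#n#1", "ill_health#n#1", "illness#n#1");

    // Builds one root-to-synset path: UPPER followed by the given lower nodes
    static List<String> path(String... lower) {
        List<String> p = new ArrayList<>(UPPER);
        p.addAll(List.of(lower));
        return p;
    }

    // T1 = HyperTrees(cancer#n#1) and T2 = HyperTrees(disease#n#1), copied from the trace
    static final List<List<String>> T1 = List.of(
        path("growth#n#6", "tumor#n#1", "malignant_tumor#n#1", "cancer#n#1"),
        path("disease#n#1", "malignancy#n#1", "malignant_tumor#n#1", "cancer#n#1"));
    static final List<List<String>> T2 = List.of(path("disease#n#1"));

    // Depth of the shortest tree (path length) among the trees that contain node
    static int minDepthContaining(List<List<String>> trees, String node) {
        int best = Integer.MAX_VALUE;
        for (List<String> t : trees) {
            if (t.contains(node)) {
                best = Math.min(best, t.size());
            }
        }
        return best;
    }

    // Returns the Wu & Palmer score, or -1 if the two sets of paths share no node
    static double wuPalmer(List<List<String>> trees1, List<List<String>> trees2) {
        String lcs = null;
        int depthLcs = 0;
        // LCS = the shared node with the greatest depth over all path pairs
        for (List<String> t1 : trees1) {
            for (List<String> t2 : trees2) {
                for (int i = 0; i < t1.size(); i++) {
                    // depth = position in t1 (root = 1); in this trace shared nodes sit at equal depths in both paths
                    if (t2.contains(t1.get(i)) && i + 1 > depthLcs) {
                        lcs = t1.get(i);
                        depthLcs = i + 1;
                    }
                }
            }
        }
        if (lcs == null) {
            return -1.0;
        }
        int depth1 = minDepthContaining(trees1, lcs); // Depth1
        int depth2 = minDepthContaining(trees2, lcs); // Depth2
        return 2.0 * depthLcs / (depth1 + depth2);
    }

    public static void main(String[] args) {
        System.out.println(wuPalmer(T1, T2)); // 2 * 11 / (14 + 11), prints 0.88
    }
}
```

Depth1 is 14 because only T1[2] contains disease#n#1; T1[1] reaches cancer#n#1 through growth#n#6 and never passes through the LCS, so its deepest node shared with T2[1] is only illness#n#1 (depth 10).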

