Measuring Search Relevance using NDCG


By: Vijay Mhaskar | June 28, 2015

Normalized Discounted Cumulative Gain (NDCG) is a popular method for measuring the quality of a set of search results. It asserts the following:

  1. Very relevant results are more useful than somewhat relevant results, which in turn are more useful than irrelevant results (cumulative gain).
  2. Relevant results are more useful when they appear earlier in the set of results (discounting).
  3. The score of a ranking should be independent of the query performed (normalization).

Cumulative Gain (CG) is the predecessor of DCG and does not take the position of a result into account when judging the usefulness of a result set. It is simply the sum of the graded relevance values of all results in a search result list. Suppose you were presented with a set of search results for a query and asked to grade each result:

  0 => Not relevant
  1 => Near relevant
  2 => Relevant

If we sum the values for a page of results we will have a measure of the cumulative gain (CG).

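  CG_p = \sum_{i=1}^{p} rel_i

where rel_i is the grade of the result at position i and p is the number of results on the page.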
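As a quick illustration of the sum described above, here is a minimal Python sketch. The function name `cumulative_gain` and the sample grades are made up for this example; the only thing taken from the post is the 0/1/2 grading scale.

```python
def cumulative_gain(grades):
    """Sum the graded relevance values of a result list, ignoring position."""
    return sum(grades)


# Hypothetical grades for one page of five results (0, 1 or 2 each).
page = [2, 1, 0, 2, 1]

print(cumulative_gain(page))          # 6
print(cumulative_gain(sorted(page)))  # 6, reordering the page does not change CG
```

The second call shows the weakness of CG: a page with the best results at the bottom scores exactly the same as one with the best results at the top.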

Cumulative gain, however, doesn’t reward relevant results that appear higher in the result set. To arrive at Discounted Cumulative Gain (DCG), we must discount results that appear lower. The premise of DCG is that a highly relevant document appearing low in a search result list should be penalized, so its graded relevance value is reduced in proportion to the logarithm of its position. A common method for doing this is to divide the grade rel_i at each position i by the base-2 logarithm of i + 1, which leaves the first result undiscounted and avoids dividing by log(1) = 0:

  DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i + 1)}
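The discounting step can be sketched in a few lines of Python. This sketch assumes the widely used log2(position + 1) discount, which may differ from the exact formula in the original image; the function name `dcg` and the sample grades are hypothetical.

```python
import math


def dcg(grades):
    """Discounted cumulative gain of a list of grades, in ranked order.

    Assumption: each grade is divided by log2(position + 1), with positions
    starting at 1, so the first result is divided by log2(2) = 1.
    """
    return sum(g / math.log2(i + 1) for i, g in enumerate(grades, start=1))


# The same three hypothetical grades in two different orders.
good_first = [2, 1, 0]
bad_first = [0, 1, 2]

print(dcg(good_first) > dcg(bad_first))  # True
```

Both lists have the same cumulative gain of 3, but the list that puts its most relevant result first gets the higher DCG.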

The final stage of NDCG is normalization. If you calculate DCG for different queries, you’ll find that some queries are simply harder than others and produce lower DCG scores than easier ones. Normalization solves this problem by scaling each score by the best DCG seen (called the ideal DCG, or iDCG), so that every score falls between 0 and 1.

  NDCG_p = \frac{DCG_p}{iDCG_p}

Having to determine the global iDCG before computing the NDCG for any given set of results makes implementation a bit tricky: you must first calculate the DCG for every set of results to find the ideal value, and only then can you calculate the NDCG for each of them.
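The two-pass procedure just described might look something like the following Python sketch. It rests on a few assumptions of mine: the candidate result sets are alternative rankings for the same query, the discount is log2(position + 1), and the names `dcg` and `ndcg_against_best` are hypothetical.

```python
import math


def dcg(grades):
    """DCG with an assumed log2(position + 1) discount, positions from 1."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(grades, start=1))


def ndcg_against_best(result_sets):
    """Score every result set as DCG / iDCG.

    First pass: compute the DCG of every result set.
    Second pass: divide each DCG by the best one observed (the iDCG).
    """
    scores = [dcg(grades) for grades in result_sets]
    ideal = max(scores, default=0.0)
    if ideal == 0:
        # No relevant results anywhere, so there is nothing to normalize by.
        return [0.0 for _ in scores]
    return [score / ideal for score in scores]


# Three hypothetical rankings of the same graded results.
candidates = [[2, 1, 0], [0, 1, 2], [1, 2, 0]]

for grades, score in zip(candidates, ndcg_against_best(candidates)):
    print(grades, round(score, 3))
```

The best ranking seen scores exactly 1.0 and the others fall below it. Many textbook treatments instead take iDCG to be the DCG of the grades sorted from highest to lowest, which removes the need for the first pass over all result sets.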

Hope you found this useful.

You can find more information here: http://formerlegitimatescientist.net/2014/07/02/evolving-search-relevancy-part-4-calculating-relevancy-and-deriving-a-fitness-function/
