Solr’s mm parameter – Explanation of Min Number Should Match


By: Vijay Mhaskar | May 26, 2015

Introduction 

This article explains the format used to specify the “Min Number Should Match” (mm) criteria of the BooleanQuery objects built by the DisMaxRequestHandler. With it, you can require that a given number or percentage of the query words (or blocks) appear in a document.

There are 3 types of “clauses” that Solr (Lucene) knows about: mandatory, prohibited, and “optional”. By default, all words or phrases specified in the “q” param are treated as “optional” clauses unless they are preceded by a “+” (mandatory) or a “-” (prohibited). When dealing with these “optional” clauses, the “mm” option makes it possible to say that a certain minimum number of them must match.
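As a rough illustration of how such a request might be put together, the sketch below builds a DisMax query URL with an “mm” value. The host, the core name (“mycore”) and the helper name build_dismax_url are assumptions made up for this example; “defType”, “q” and “mm” are the request parameters themselves.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust for a real installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_dismax_url(query: str, mm: str) -> str:
    """Build a select URL that runs `query` through DisMax with the given mm spec."""
    params = {
        "defType": "dismax",  # ask Solr to use the DisMax query parser
        "q": query,
        "mm": mm,
    }
    # urlencode percent-escapes characters such as "+", "<" and "%"
    return SOLR_SELECT_URL + "?" + urlencode(params)


# "+solr" is mandatory and "-nutch" is prohibited; "lucene", "search" and
# "engine" are the three optional clauses that mm applies to.
url = build_dismax_url("+solr lucene search engine -nutch", "2")
print(url)
```

With mm set to “2”, at least two of the three optional words would have to appear in a matching document.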

Specifying this minimum number can be done in several ways, from simple to complex:

  1. At least 2 of the optional clauses must match, regardless of how many clauses there are: “2”.
  2. At least 75% of the optional clauses must match, rounded down: “75%”.
  3. If there are fewer than 3 optional clauses, they all must match; if there are 3 or more, then 75% must match, rounded up: “2<-25%”.
  4. If there are fewer than 3 optional clauses, they all must match; for 3 to 5 clauses, one less than the number of clauses must match; for 6 or more clauses, 80% must match, rounded down: “2<-1 5<80%”.
  5. Multiple conditional specifications can be separated by spaces, each one only being valid for numbers greater than the one before it. In this example, if there are 1 or 2 clauses they are all required, if there are 3-9 clauses all but 25% are required, and if there are more than 9 clauses, all but three are required: “2<-25% 9<-3”.
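The rules in the list above can be sketched as a small calculator. This is an illustrative reimplementation of the behaviour described in this article, not Solr’s own code; the function name min_should_match is made up for the example, and it assumes a well-formed spec with no spaces around “<”.

```python
def min_should_match(spec: str, clause_count: int) -> int:
    """Return how many of `clause_count` optional clauses must match for `spec`."""
    spec = spec.strip()

    if "<" in spec:
        # Conditional form: one or more space-separated "N<value" parts.
        # At or below the first threshold, every clause is required.
        result = clause_count
        for part in spec.split():
            threshold, _, value = part.partition("<")
            if clause_count <= int(threshold):
                return result
            # Above this threshold: apply its value, then check the next part.
            result = min_should_match(value, clause_count)
        return result

    if spec.endswith("%"):
        percent = int(spec[:-1])
        # The portion computed from the percentage is always rounded down.
        portion = abs(percent) * clause_count // 100
        # A negative percentage says how many clauses may be missing.
        result = clause_count - portion if percent < 0 else portion
    else:
        number = int(spec)
        # A negative integer also counts the clauses that may be missing.
        result = clause_count + number if number < 0 else number

    # Never more than the number of optional clauses, and never negative.
    return max(0, min(result, clause_count))


for example in ("2", "75%", "2<-25%", "2<-1 5<80%", "2<-25% 9<-3"):
    required = [min_should_match(example, n) for n in range(1, 12)]
    print(f"{example!r}: {required}")
```

Printing the result for 1 to 11 clauses makes it easy to see where each conditional threshold takes over.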

A few important notes…

  • When dealing with percentages, negative values can be used to get different behavior in edge cases. 75% and -25% mean the same thing when dealing with 4 clauses, but with 5 clauses 75% means 3 are required, while -25% means 4 are required.
  • No matter what number the calculation arrives at, a value greater than the number of optional clauses, or a value less than 1, will never be used.
  • The lower the percentage, the more combinations of input terms can produce a match, and the more documents will match. Solr therefore does more work for lower values.
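To make the first note concrete, the short sketch below compares “75%” and “-25%” for 4 to 8 optional clauses. It only mirrors the rounding described in this article; the helper name required_for_percent is made up for the example.

```python
def required_for_percent(percent: int, clause_count: int) -> int:
    """Clauses that must match for a positive or negative percentage spec."""
    # The portion computed from the percentage is rounded down in both cases.
    portion = abs(percent) * clause_count // 100
    # Positive: the portion must match. Negative: the portion may be missing.
    required = clause_count - portion if percent < 0 else portion
    # Keep the result between 0 and the number of optional clauses.
    return max(0, min(required, clause_count))


rows = [
    (n, required_for_percent(75, n), required_for_percent(-25, n))
    for n in range(4, 9)
]

print("clauses  75%  -25%")
for clause_count, positive, negative in rows:
    print(f"{clause_count:>7}  {positive:>3}  {negative:>4}")
```

Because rounding down is applied to the part that must match in one case and to the part that may be missing in the other, the negative form never asks for fewer clauses than the positive one.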

 


2 thoughts on “Solr’s mm parameter – Explanation of Min Number Should Match”

  1. Raj

    What mm value should I use if I want to match at least 2 words in a given sentence?

    Example: Raj Kumar LLC. I should be able to get all matches where Raj Kumar is present, like Raj Kumar Inc, Raj Kumar Corp, etc.

    1. Vijay Mhaskar Post author

      You can specify mm=2, which will return all those documents which have at least two terms from your query.

