Understanding PhraseQuery and Slop in Solr
PhraseQuery in Lucene matches documents containing a particular sequence of terms. It relies on the term position information that is stored in the index.
The number of other words permitted between the words of a query phrase is called “slop”. To set it, append a tilde (~) followed by a number to the end of the phrase. More precisely, the slop of a sloppy phrase query is the maximum number of position moves the tokens may need in order to match the phrase. The slop is zero by default, which requires an exact match. The smaller the distance between the terms, the higher the score.
The following examples show phrase search in Solr:
1. This example for the standard request handler will find all documents where “digital” occurs within 100 words of “group”.
q=text:"digital group"~100
2. The dismax handler can easily create sloppy phrase queries with the pf (phrase fields) and ps (phrase slop) parameters. ps affects only boosting: if you change the ps value, numFound and the result set do not change, but the order of the results does. More exact matches are scored higher than sloppier matches, so the search results are sorted by exactness.
q=digital group&pf=text&ps=100
3. The dismax handler also allows users to explicitly specify a phrase query with double quotes, and the qs (query slop) parameter can be used to add slop to any explicit phrase query.
q="digital group"&qs=100
4. In solrconfig.xml, these parameters can be configured as defaults on a request handler, as shown below:
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <str name="qf">field</str>
    <str name="qs">10</str>
    <str name="pf">field</str>
    <str name="ps">10</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>
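The query strings in examples 1 to 3 contain spaces and double quotes, so they need to be URL-encoded before they are sent over HTTP. Here is a minimal Python sketch of how that could be done. The base URL (host, port, and the core name `mycore`) is a hypothetical placeholder, and passing `defType=dismax` explicitly is an assumption that only applies if the handler does not already set it in its defaults.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_url(params):
    """Return a select URL with all parameter values URL-encoded."""
    return SOLR_SELECT + "?" + urlencode(params)


def decode_params(url):
    """Decode the query string again, to see the values Solr would receive."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


# Example 1: sloppy phrase query with the standard query syntax.
standard_url = build_url({"q": 'text:"digital group"~100'})

# Example 2: dismax phrase boosting with pf and ps.
# defType is added here as an assumption (see the note above).
boost_url = build_url(
    {"defType": "dismax", "q": "digital group", "pf": "text", "ps": "100"}
)

# Example 3: explicit phrase in double quotes, with slop supplied by qs.
explicit_url = build_url({"defType": "dismax", "q": '"digital group"', "qs": "100"})

if __name__ == "__main__":
    for url in (standard_url, boost_url, explicit_url):
        print(url)
        print(decode_params(url))
```

Building the query string from a dictionary keeps the double quotes and the tilde inside the q value, where the query parser expects them, instead of relying on hand-written escaping.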
By setting the qf (query fields), qs (query phrase slop), pf (phrase fields), ps (phrase slop), pf2 (phrase bigram fields), ps2 (phrase bigram slop), pf3 (phrase trigram fields), and ps3 (phrase trigram slop) parameters, you can control which fields are searched and how phrase proximity affects matching and scoring. Usually, the words are searched individually across the query fields, and the documents are then scored according to the proximity of the words.
The order of the words also matters. “word1 word2” is treated differently from “word2 word1” because matching the terms in reverse order requires additional position moves. In Lucene, swapping two adjacent terms costs a slop of 2.
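To see how slop and word order interact, here is a small Python sketch that estimates the smallest slop a phrase would need to match a tokenized document. It is a simplified model and not Lucene's actual implementation: it assumes whitespace tokenization with one token per position, and the function name `required_slop` is made up for this illustration. The idea is to subtract each term's offset in the phrase from its position in the document, then take the spread between the largest and the smallest result.

```python
from itertools import product


def required_slop(query_terms, doc_tokens):
    """Return the smallest slop that lets the phrase match, or None.

    Simplified model (an assumption, not Lucene's code): each query term's
    document position is shifted back by its offset in the phrase, and the
    required slop is the spread between the largest and smallest shifted value.
    """
    position_lists = []
    for term in query_terms:
        positions = [i for i, token in enumerate(doc_tokens) if token == term]
        if not positions:
            return None  # a missing term means the phrase cannot match
        position_lists.append(positions)

    best = None
    # Try every combination of occurrences and keep the cheapest one.
    for combo in product(*position_lists):
        if len(set(combo)) < len(combo):
            continue  # one token cannot fill two slots of the phrase
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        if best is None or spread < best:
            best = spread
    return best


if __name__ == "__main__":
    doc = "word1 word2".split()
    print(required_slop(["word1", "word2"], doc))  # same order as the document
    print(required_slop(["word2", "word1"], doc))  # reversed order
    print(required_slop(["digital", "group"], "digital media group".split()))
```

In this model the in-order phrase needs a slop of 0, the reversed phrase needs 2, and “digital group” against “digital media group” needs 1.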