By: Vijay Mhaskar | May 28, 2015
A simple way to achieve exact matching in Solr is to use the default string type, which matches only the exact, complete value. string is also a useful type for facets, where we search the index using text pulled from the index itself.
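As a minimal sketch of that setup, a schema could declare a string-typed field as shown here. The field name title_exact is hypothetical and only for illustration; the string type definition is the one commonly found in Solr's example schemas, so check it against your own schema.

```xml
<!-- The stock string type: values are indexed verbatim as a single token -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Hypothetical field: it matches only when the query equals the whole stored value -->
<field name="title_exact" type="string" indexed="true" stored="true"/>
```

Because nothing is analyzed, a query against this field has to reproduce case, punctuation and whitespace exactly.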
Exactish phrase match
Most of the time, when searching for phrases, we want Solr to ignore differences in case, punctuation, whitespace, stemming and so on. If someone types in a full query but misses a bracket, Solr should still assume they want that particular item.
Solr’s default phrase matching doesn’t differentiate between a phrase that matches the whole target string and one that matches only part of it. To make that distinction, we’ll need a decent text fieldtype and a way to “anchor” the search to both ends of the target string.
Here we will create a text type that phrase-matches only if the query string is an exactish match for the whole field. We’ll phrase-search on this field and boost it way up.
<fieldtype name="text_lr" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="^(.*)$" replacement="AAAA $1 ZZZZ" />
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldtype>
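One way to wire this type in, sketched under assumptions: copy an existing text field into an anchored field, then list the anchored field in the edismax pf (phrase fields) parameter with a large boost. The field names title and title_lr, the boost value of 100, and the /select handler defaults are all hypothetical choices for illustration, not part of the original setup.

```xml
<!-- schema: hypothetical anchored copy of an existing "title" field -->
<field name="title_lr" type="text_lr" indexed="true" stored="false"/>
<copyField source="title" dest="title_lr"/>

<!-- solrconfig: search "title" normally, and phrase-boost whole-field matches -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title</str>
    <!-- pf runs the whole user query as a phrase query against title_lr -->
    <str name="pf">title_lr^100</str>
  </lst>
</requestHandler>
```

With this arrangement, ordinary matching still happens through qf. The anchored field only contributes a score boost when the whole query lines up with the whole field value.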
Let’s index “Computer science in USA” in a normal text field. A normal Solr phrase query q="Science in USA" will match on that value, because the query phrase is fully contained in the indexed phrase. But what happens if we index into a text_lr field?
Indexing “Computer science in USA” produces the tokens aaaa computer science in usa zzzz (the folding filter lowercases everything, including the anchors).
The search terms “Science in USA” become aaaa science in usa zzzz. Phrase searching then compares the two transformed values using normal Solr rules and finds that they do not match: in the indexed value, aaaa is followed by computer, not by science.
Things to remember
Any non-phrase query will match every field that uses this fieldtype, because the anchor tokens are present in every indexed value and are added to every query. Use anchored fieldtypes for phrase queries only, and only when you want exactish matches.
You can use any other strings in place of AAAA and ZZZZ, as long as they never occur in your data. By adding only one of ‘AAAA’ or ‘ZZZZ’, we can have left-anchored or right-anchored searches as well.
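For instance, a left-anchored variant could look like the sketch below. It differs from text_lr only in the replacement string; the type name text_l is a hypothetical choice.

```xml
<!-- Left-anchored: the phrase must start at the beginning of the field, but may end anywhere -->
<fieldtype name="text_l" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="^(.*)$" replacement="AAAA $1" />
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldtype>
```

With this type, the phrase “Computer science” would match the indexed value “Computer science in USA”, while “Science in USA” would not. Using replacement="$1 ZZZZ" instead gives the right-anchored counterpart.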