Using Solr’s ComplexPhraseQueryParser


By: Vijay Mhaskar | July 10, 2015

Introduction:

The ComplexPhraseQueryParser allows complex phrase query syntax, e.g. "canc* treat*". It performs multiple passes over the query text to parse any nested logic in phrase queries:

  1. The first pass takes any PhraseQuery content between quotes and stores it for a subsequent pass. All other query content is parsed as normal.
  2. The second pass parses the stored PhraseQuery content, checking that all embedded clauses refer to the same field and can therefore be rewritten as Span queries. All PhraseQuery clauses are expressed as ComplexPhraseQuery objects.
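To make the two passes concrete, here is a small Python sketch of the idea. It is only an illustration of the description above, not Lucene's actual implementation: the function names `first_pass` and `second_pass` and the `__PHRASE_n__` placeholder format are made up for this example.

```python
import re

# Matches one double-quoted phrase (straight quotes only).
QUOTED = re.compile(r'"([^"]*)"')


def first_pass(query):
    """Swap each quoted phrase for a placeholder; return (outline, stored phrases)."""
    stored = []

    def stash(match):
        stored.append(match.group(1).strip())
        return f"__PHRASE_{len(stored) - 1}__"

    return QUOTED.sub(stash, query), stored


def second_pass(phrase, default_field):
    """Split a stored phrase into clauses and check they all target one field."""
    fields, clauses = set(), []
    for clause in phrase.split():
        field, sep, term = clause.partition(":")
        if not sep:  # no explicit field prefix: fall back to the default field
            field, term = default_field, clause
        fields.add(field)
        clauses.append(term)
    if len(fields) != 1:
        raise ValueError(f"phrase must target exactly one field, got {sorted(fields)}")
    return fields.pop(), clauses


outline, stored = first_pass('title:report AND "canc* treat*"')
field, clauses = second_pass(stored[0], default_field="article")
```

The outline (with placeholders) would be parsed as a normal query, while each `(field, clauses)` pair is what would then be turned into Span queries.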

The ComplexPhraseQParser provides support for wildcards, ORs, etc. inside phrase queries using Lucene's ComplexPhraseQueryParser. Under the covers, this query parser makes use of the Span group of queries, e.g. spanNear, spanOr, etc.

Examples:

  1. {!complexphrase inOrder=true}article:"ca?c* tre*"

This query returns documents whose article field contains phrases such as the following:

<doc>
<str name="article">Breast cancer treatments.</str>
</doc>
<doc>
<str name="article">Squamous-cell carcinoma treatment.</str>
</doc>
<doc>
<str name="article">On calculating treatment satisfaction.</str>
</doc>
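A query like this is normally sent to Solr as the `q` parameter of a select request. The sketch below builds such a request URL in Python using only the standard library. The host, port, and core name (`localhost:8983`, `articles`) and the helper `complexphrase_params` are assumptions made for illustration; adjust them to your own setup. Note that the phrase must be wrapped in straight double quotes.

```python
from urllib.parse import urlencode


def complexphrase_params(field, phrase, in_order=True, slop=None):
    """Build request parameters for a complexphrase query (hypothetical helper)."""
    local_params = "{!complexphrase inOrder=%s}" % ("true" if in_order else "false")
    query = f'{local_params}{field}:"{phrase}"'
    if slop is not None:
        query += f"~{slop}"  # optional proximity, e.g. "hypox* factor"~1
    return {"q": query, "wt": "json"}


params = complexphrase_params("article", "ca?c* tre*")

# Assumed local Solr instance and core name; urlencode escapes the braces,
# quotes, and wildcards so the query survives the trip through the URL.
url = "http://localhost:8983/solr/articles/select?" + urlencode(params)
```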

  2. We can specify the proximity of the clauses. For example, the following will match an article which contains "hypoxia inducible factor":

q={!complexphrase}article:"hypox* factor"~1

<doc>
<str name="article">Purification and characterization of hypoxia-inducible factor 1.</str>
</doc>
<doc>
<str name="article">Expression of hypoxia-inducible factor 1: mechanisms and consequences.</str>
</doc>
<doc>
<str name="article">Discovery of Indenopyrazoles as a New Class of Hypoxia Inducible Factor (HIF)-1 Inhibitors.</str>
</doc>
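To see why the sample documents in both examples match, the following Python sketch imitates the matching on plain strings. It is a rough approximation, not how Solr evaluates the query: the tokenizer is a simple stand-in for a real analyzer, slop is simplified to "at most N extra tokens between the clauses, in order", and the names `tokenize` and `phrase_matches` are made up for this example.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    # Stand-in for an analyzer: lowercase and split on non-alphanumerics,
    # so "hypoxia-inducible" becomes two tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(text, patterns, slop=0):
    """True if the wildcard patterns match tokens in order, allowing
    at most `slop` extra tokens in total between them."""
    tokens = tokenize(text)
    if not patterns:
        return False

    def match_from(pattern_index, position, slop_left):
        if pattern_index == len(patterns):
            return True
        for skipped in range(slop_left + 1):
            candidate = position + skipped
            if candidate >= len(tokens):
                break
            if fnmatchcase(tokens[candidate], patterns[pattern_index]) and match_from(
                pattern_index + 1, candidate + 1, slop_left - skipped
            ):
                return True
        return False

    # The first pattern may start at any token; later ones must follow closely.
    return any(
        fnmatchcase(token, patterns[0]) and match_from(1, start + 1, slop)
        for start, token in enumerate(tokens)
    )


title = "Purification and characterization of hypoxia-inducible factor 1."
with_slop = phrase_matches(title, ["hypox*", "factor"], slop=1)
without_slop = phrase_matches(title, ["hypox*", "factor"], slop=0)
```

In this sketch the title matches only when one intervening token ("inducible") is allowed, which is the role `~1` plays in the query above.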

Limitations:

Performance depends on the number of unique terms the query expands to. For instance, searching for "a*" forms a large OR clause containing every term in the indicated field of your index that starts with the letter 'a'. Allowing very short prefixes may also result in too many low-quality documents being returned. You may need to increase maxBooleanClauses in solrconfig.xml as a result of the term expansion.
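The effect of prefix length on term expansion can be illustrated with a toy vocabulary. The sketch below is only a model of the idea: the term list, the helper names, and the clause limit of 4 are invented for the example (real limits are far larger and are configured in solrconfig.xml).

```python
from bisect import bisect_left

# Toy sorted term dictionary for one field (made-up data).
TERMS = sorted([
    "abdomen", "acute", "adenoma", "anemia", "artery",
    "cancer", "carcinoma", "treatment",
])


def expand_prefix(sorted_terms, prefix):
    """Return every term that starts with `prefix`, i.e. the OR clauses
    a prefix wildcard would expand to."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # terms are sorted, so no later term can match
        matches.append(term)
    return matches


def within_clause_limit(sorted_terms, prefix, max_clauses):
    """True if the expansion stays at or under the clause limit."""
    return len(expand_prefix(sorted_terms, prefix)) <= max_clauses


short_prefix = expand_prefix(TERMS, "a")     # expands to many terms
long_prefix = expand_prefix(TERMS, "canc")   # expands to a single term
```

Requiring a minimum prefix length in the application that builds the query is one way to keep the expansion small before it ever reaches Solr.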
