Using Solr's ComplexPhraseQueryParser
The ComplexPhraseQueryParser allows complex phrase query syntax, e.g. "canc* treat*". It performs multiple passes over the query text to parse any nested logic in phrase queries.
- The first pass extracts any phrase content between quotes and stores it for the second pass. All other query content is parsed as normal.
- The second pass parses the stored phrase content, checking that all embedded clauses refer to the same field and can therefore be rewritten as span queries. All phrase clauses are expressed as ComplexPhraseQuery objects.
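The two passes can be illustrated with a toy Python model. This is not Solr's or Lucene's parser code: `first_pass`, `second_pass`, the `__PHRASEn__` placeholder format and the regular expression for `field:"..."` phrases are all hypothetical, and local params such as `{!complexphrase}` are assumed to have been stripped already.

```python
import re

# Hypothetical pattern for a field-qualified phrase such as article:"ca?c* tre*".
PHRASE_RE = re.compile(r'(\w+):"([^"]*)"')


def first_pass(query):
    """Replace each quoted phrase with a placeholder and store its content.

    Everything outside the quotes is left as-is for normal parsing.
    """
    stored = []

    def stash(match):
        stored.append((match.group(1), match.group(2)))
        return f"__PHRASE{len(stored) - 1}__"

    remainder = PHRASE_RE.sub(stash, query)
    return remainder, stored


def second_pass(stored):
    """Split each stored phrase into clauses and check they all use one field."""
    parsed = []
    for field, text in stored:
        clauses = []
        for token in text.split():
            # A clause may name its own field inside the quotes, e.g. title:canc*.
            clause_field, _, term = token.rpartition(":")
            clauses.append((clause_field or field, term))
        other_fields = {f for f, _ in clauses if f != field}
        if other_fields:
            # Mixed fields cannot be combined into a single span query.
            raise ValueError(f"phrase on {field!r} also references {sorted(other_fields)}")
        parsed.append((field, [term for _, term in clauses]))
    return parsed
```

For example, `first_pass('article:"ca?c* tre*" AND year:2010')` leaves `'__PHRASE0__ AND year:2010'` for normal parsing and stores `('article', 'ca?c* tre*')`, which `second_pass` turns into the clause list `['ca?c*', 'tre*']` on the `article` field.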
Examples:
1. {!complexphrase inOrder=true}article:"ca?c* tre*"
This query returns documents containing phrases such as the following:
<doc>
<str name="article">Breast cancer treatments.</str>
</doc>
<doc>
<str name="article">Squamous-cell carcinoma treatment.</str>
</doc>
<doc>
<str name="article">On calculating treatment satisfaction.</str>
</doc>
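Why these three documents match can be approximated in plain Python. This is a toy simulation, not Lucene's implementation: `tokenize` is a rough stand-in for the field's analyzer, wildcards are expanded with the standard-library `fnmatch` module (whose `?` and `*` behave like Lucene's for these patterns), and only exact in-order adjacency (slop 0) is checked. All function names are hypothetical.

```python
import fnmatch
import re


def tokenize(text):
    # Rough stand-in for an analyzer: lowercase, split on anything non-alphanumeric.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def expand(pattern, vocabulary):
    # Wildcard expansion: "?" matches one character, "*" matches any run of characters.
    return {term for term in vocabulary if fnmatch.fnmatchcase(term, pattern)}


def matches_phrase(text, patterns):
    """True if consecutive tokens match the patterns in order (slop 0)."""
    tokens = tokenize(text)
    # Expanding against this document's tokens gives the same answer as expanding
    # against the whole field's term dictionary, for the purpose of matching this text.
    expansions = [expand(p, set(tokens)) for p in patterns]
    n = len(patterns)
    return any(
        all(tokens[start + k] in expansions[k] for k in range(n))
        for start in range(len(tokens) - n + 1)
    )


docs = [
    "Breast cancer treatments.",
    "Squamous-cell carcinoma treatment.",
    "On calculating treatment satisfaction.",
    "Treatment of cancer.",
]
for doc in docs:
    print(matches_phrase(doc, ["ca?c*", "tre*"]), doc)
```

The first three titles print True because "ca?c*" expands to cancer, carcinoma and calculating, each immediately followed by a tre* term; "Treatment of cancer." prints False because the terms appear in the wrong order.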
2. We can also specify the proximity of the clauses. For example, the following query, with a slop of 1, will match an article containing "hypoxia inducible factor":
q={!complexphrase}article:"hypox* factor"~1
<doc>
<str name="article">Purification and characterization of hypoxia-inducible factor 1.</str>
</doc>
<doc>
<str name="article">Expression of hypoxia-inducible factor 1: mechanisms and consequences.</str>
</doc>
<doc>
<str name="article">Discovery of Indenopyrazoles as a New Class of Hypoxia Inducible Factor (HIF)-1 Inhibitors.</str>
</doc>
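The effect of the ~1 slop can be checked with another toy simulation (hypothetical names again, not Lucene code). It models only the two-clause case and assumes in-order matching, as in example 1: a match needs a token matching hypox* followed by a token matching factor with at most `slop` other tokens in between. In "hypoxia inducible factor" exactly one token sits between them, so slop 1 matches and slop 0 does not.

```python
import fnmatch
import re


def tokenize(text):
    # Rough stand-in for an analyzer: lowercase, split on anything non-alphanumeric,
    # so "hypoxia-inducible" becomes two tokens at consecutive positions.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def near_in_order(text, first, second, slop):
    """True if a token matching `first` precedes a token matching `second`
    with at most `slop` other tokens in between (two clauses, in order)."""
    tokens = tokenize(text)
    firsts = [i for i, t in enumerate(tokens) if fnmatch.fnmatchcase(t, first)]
    seconds = [j for j, t in enumerate(tokens) if fnmatch.fnmatchcase(t, second)]
    return any(0 <= j - i - 1 <= slop for i in firsts for j in seconds)


title = "Expression of hypoxia-inducible factor 1: mechanisms and consequences."
print(near_in_order(title, "hypox*", "factor", 1))
print(near_in_order(title, "hypox*", "factor", 0))
```

The first call prints True and the second prints False, since "inducible" occupies the one position allowed by the slop.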
Limitations:
Performance depends on the number of unique index terms that the wildcard and prefix clauses in the query expand to. For instance, searching for "a*" expands into a large OR of clauses, one for every term in the indicated field that starts with the letter 'a'. Allowing very short prefixes may also return too many low-quality documents. Because of this term expansion, you may need to increase maxBooleanClauses in solrconfig.xml.
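A rough way to see the cost of a short prefix is to count how many terms it expands to. The sketch below is a toy model, not Solr code: a sorted Python list stands in for a field's term dictionary, the function names are hypothetical, and the limit of 1024 is assumed here as the value shipped in Solr's example solrconfig.xml (your configuration may differ).

```python
import bisect
import itertools

# Assumption: the example solrconfig.xml sets <maxBooleanClauses> to 1024.
ASSUMED_MAX_BOOLEAN_CLAUSES = 1024


def prefix_expansion(sorted_terms, prefix):
    """Return every term in the sorted list that starts with `prefix`."""
    start = bisect.bisect_left(sorted_terms, prefix)
    end = start
    # Terms sharing a prefix are contiguous in sorted order, so scan forward.
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


def fits_clause_limit(sorted_terms, prefix, limit=ASSUMED_MAX_BOOLEAN_CLAUSES):
    """Return (number of clauses the prefix expands to, whether that is within the limit)."""
    count = len(prefix_expansion(sorted_terms, prefix))
    return count, count <= limit


if __name__ == "__main__":
    # Toy term dictionary: every 5-letter string over "abcdef" (6**5 = 7776 terms).
    vocabulary = sorted("".join(p) for p in itertools.product("abcdef", repeat=5))
    for prefix in ("a", "ab", "abc"):
        print(prefix, fits_clause_limit(vocabulary, prefix))
```

In this toy dictionary the one-letter prefix "a" expands to 1296 terms and exceeds the assumed limit, while "ab" (216 terms) and "abc" (36 terms) stay well under it; each extra prefix character shrinks the expansion sharply.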