In this article we will see how the Solr Terms Component can be used to build an Auto-suggest feature and a Browse index feature.
The Terms Component returns the indexed terms in a field and the number of documents that match each term. It uses Lucene’s TermEnum directly to iterate over the term dictionary, so retrieving a term's document frequency this way is much faster than iterating over every document in the index, finding the distinct terms in the field, and calculating their document frequencies.
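To make the speed argument concrete, here is a toy sketch in Python. It does not reproduce Lucene's implementation: the corpus, the function names, and the use of a plain dict in place of the term dictionary are all assumptions made for illustration.

```python
from collections import Counter

# Toy corpus (invented): each document is the list of terms indexed in one field.
docs = [
    ["ipod", "memory"],
    ["memory", "ddr"],
    ["ipod", "ddr", "memory"],
]

def doc_freq_by_scanning(documents):
    """Slow path: visit every document and count the distinct terms it contains."""
    counts = Counter()
    for terms in documents:
        counts.update(set(terms))  # set() so a term counts once per document
    return dict(counts)

# An index computes these counts once, at indexing time, and keeps them in a
# sorted term dictionary. A plain dict stands in for that structure here.
term_dictionary = dict(sorted(doc_freq_by_scanning(docs).items()))

def doc_freq_from_dictionary(dictionary, term):
    """Fast path: a single lookup, no documents are touched at query time."""
    return dictionary.get(term, 0)
```

The scan costs time proportional to the size of the corpus on every call, while the dictionary lookup only reads a count that already exists.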
Configuring TermsComponent in solrconfig.xml:
Step 1: Define the Terms Component
<searchComponent name="terms" class="solr.TermsComponent"/>
Step 2: Use Terms Component in a Request Handler
<requestHandler name="/terms" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <bool name="distrib">false</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
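Example: To get the top 10 terms in a field, ordered by their document frequency.
A request to the /terms handler with terms.fl=name returns the first ten terms in the name field, ordered by document count (ten is the default terms.limit, and count is the default terms.sort):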
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
  </lst>
  <lst name="terms">
    <lst name="name">
      <int name="one">25</int>
      <int name="184">13</int>
      <int name="1gb">13</int>
      <int name="3200">12</int>
      <int name="400">10</int>
      <int name="ddr">7</int>
      <int name="gb">5</int>
      <int name="ipod">4</int>
      <int name="memory">3</int>
      <int name="pc">1</int>
    </lst>
  </lst>
</response>
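A client for this kind of request can be sketched in Python. The base URL is borrowed from the Browse index query later in this article and is an assumption about your setup. The helper names top_terms_url and parse_terms are hypothetical. The parameters terms.fl, terms.limit, and terms.sort are Terms Component parameters.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed host and core name; adjust to your own Solr installation.
TERMS_URL = "http://localhost:8983/solr/collection1/terms"

def top_terms_url(field, limit=10):
    """Build a request for the `limit` most frequent terms in `field`."""
    params = [("terms.fl", field), ("terms.limit", limit), ("terms.sort", "count")]
    return TERMS_URL + "?" + urlencode(params)

def parse_terms(xml_text):
    """Return {field: [(term, doc_freq), ...]} from an XML terms response."""
    root = ET.fromstring(xml_text)
    result = {}
    for field in root.findall("./lst[@name='terms']/lst"):
        result[field.get("name")] = [
            (node.get("name"), int(node.text)) for node in field.findall("int")
        ]
    return result
```

Fetching the URL (for example with urllib.request.urlopen) and passing the body to parse_terms would give the term and count pairs in the order Solr returned them.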
Applications of the Terms Component:
The Terms Component can be useful for building any feature that operates at the term level instead of the search or document level.
1. Auto-suggest feature:
To build an Auto-suggest feature for your own search application, simply submit a query that specifies whatever characters the user has typed so far as a prefix. For example, if the user has typed “ele”, the search engine’s interface would submit a terms request on the name field with terms.prefix=ele, and Solr would return a response such as:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="terms">
    <lst name="name">
      <int name="electronics">24</int>
      <int name="electricity">2</int>
      <int name="election">2</int>
    </lst>
  </lst>
</response>
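The auto-suggest round trip could look roughly like the sketch below. The base URL is assumed from the Browse index query later in this article. The helper names suggest_url and suggestions, and the default of five suggestions, are hypothetical choices. terms.prefix is the Terms Component parameter that restricts results to terms starting with the given characters.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed host and core name; adjust to your own Solr installation.
TERMS_URL = "http://localhost:8983/solr/collection1/terms"

def suggest_url(field, typed, limit=5):
    """Build a terms request for terms in `field` that start with `typed`."""
    params = [("terms.fl", field), ("terms.prefix", typed), ("terms.limit", limit)]
    return TERMS_URL + "?" + urlencode(params)

def suggestions(xml_text, field):
    """Extract the suggested terms for `field`, in the order Solr returned them."""
    root = ET.fromstring(xml_text)
    path = "./lst[@name='terms']/lst[@name='%s']/int" % field
    return [node.get("name") for node in root.findall(path)]
```

The interface would call suggest_url on each keystroke (or after a short delay) and show the list from suggestions under the search box.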
2. Browse index feature:
Some search applications give the user a list of fields on which to search,
for example: title:heart, name:heart, etc.
The user then knows which fields in the index are available for search, but that number can be large, sometimes more than 20.
If the user wants to search for the term “heart” in only a few fields, they first need to know which fields in the index contain “heart”.
To help the user find those fields, the site can provide a Browse index feature. The user submits “heart”, and the Browse index feature lists every field in the index that contains “heart”, along with the number of documents that have “heart” in that field.
For example, for “heart” the Browse index feature should return information like this:
title:heart (doc freq 800), description:heart (doc freq 16000), category:heart (doc freq 450)
Once the user knows which fields contain “heart” and the document frequency in each, they can refine the search by restricting it to selected fields.
How to implement the Browse index feature:
When the user submits “heart” to the Browse index feature, the search application should send the following query to Solr:
http://localhost:8983/solr/collection1/terms?terms.regex=heart.*&terms.fl=[field1]&terms.fl=[field2]…
When constructing this query, the Browse index feature should append one &terms.fl parameter for every field in the index that is available for search.
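Building that query and reading the per-field counts can be sketched as follows. The helper names browse_index_url and fields_containing are hypothetical, and the field list passed in is whatever your schema exposes for search. The sketch assumes the submitted term contains no regular-expression metacharacters, since it is placed into terms.regex unescaped.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL as used in the query above.
TERMS_URL = "http://localhost:8983/solr/collection1/terms"

def browse_index_url(term, fields):
    """Build one terms request covering every searchable field."""
    # Assumes `term` has no regex metacharacters; it is not escaped here.
    params = [("terms.regex", term + ".*")]
    params += [("terms.fl", field) for field in fields]  # one terms.fl per field
    return TERMS_URL + "?" + urlencode(params)

def fields_containing(xml_text):
    """Return {field: {term: doc_freq}} for fields with at least one match."""
    root = ET.fromstring(xml_text)
    found = {}
    for field in root.findall("./lst[@name='terms']/lst"):
        terms = {node.get("name"): int(node.text) for node in field.findall("int")}
        if terms:  # skip fields where nothing matched
            found[field.get("name")] = terms
    return found
```

Note that urlencode percent-encodes the * in the regular expression, so the generated URL contains heart.%2A rather than heart.*; Solr decodes it back on receipt. Because heart.* also matches longer terms that start with “heart”, the result keeps the matched term next to each count.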