Introduction to Solr Analytics


By: Susheel Kumar | August 7, 2015

Introduction:

Analytics has become so popular that its capabilities are now expected from every Big Data or search platform, such as Hadoop, Spark, Cassandra, Elastic and Solr. Solr, a popular search platform, provides various analytics capabilities and lets you extend or customize them further.

Before we get into how to perform basic analytical operations in Solr, let’s talk about some use cases where you might want to apply analytics.

  1. Product Analytics – Imagine you run an online retail shop that sells various products and provides a search tool to help users find what they are looking for. You may want to capture every browse action or event your users perform. You could store these events in log files or a database, but depending on how many visitors you have, the data may grow so fast that you need a scalable platform. One option is to store them in Solr in a separate core and run various ad hoc analytics: which products or product categories are searched most (group by), which price ranges users search for most (range queries), and which times of day see the highest volume (range queries). If your site has registered users, you can also find out which users search the most and which product categories they search for. All of this can be done with Solr, without putting any additional data platform in place, which reduces the total cost of ownership.
  2. Run-time Analytics – Consider a use case where you want to show your users some aggregated information during the search. For example, when a user searches for the product “computer glasses”, you might show aggregated info such as the lowest and highest price on top of the search results. Or, if you store stock prices, you might show aggregates such as total stock value or peak time in a day/week/month.
  3. Geo Spatial Analytics – There are many geospatial use cases. For a job search, you might analyze the skilled population by region/city/state/country; a power/electricity provider might find the power grids closest to a particular location (geocodes). For product searches, you may want to convert incoming IP addresses into geocodes and later analyze your customer/user base by region.

These few use cases are just the tip of the iceberg; as you work in your own domain, you will encounter similar use cases where analytics can help you make sense of your data.

Mathematical Functions  

The first step is to get familiar with Solr function queries and the various math functions. Solr provides many mathematical functions, such as min, max, sum, product (also available as mul) and log, many of which mirror methods of Java’s Math class. See https://cwiki.apache.org/confluence/display/solr/Function+Queries for more details. As an example, let’s assume you are asked to return an additional field, total_price, for each document, computed as the product of price and qty. You can append a function to the fl parameter, as below, to get the total price for each document.

&fl=*,{!func key=total_price}mul(price,qty)
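To make the arithmetic concrete, here is a small plain-Python model of what the total_price pseudo-field adds to each returned document. It is a sketch of the computation only, not Solr code; the sample documents and the add_total_price helper are hypothetical.

```python
# Hypothetical sample documents standing in for what Solr would return.
docs = [
    {"id": "1", "price": 20.0, "qty": 3},
    {"id": "2", "price": 250.0, "qty": 2},
    {"id": "3", "price": 5.0, "qty": 10},
]


def add_total_price(doc):
    """Model of fl=*,{!func key=total_price}mul(price,qty):
    keep every stored field ('*') and add a computed total_price."""
    result = dict(doc)  # '*' -> all stored fields, copied unchanged
    result["total_price"] = doc["price"] * doc["qty"]  # mul(price,qty)
    return result


for d in map(add_total_price, docs):
    print(d["id"], d["total_price"])
```

The stored fields are left untouched; the computed value only exists in the response, which is why it is called a pseudo-field.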

These functions can be used in the q parameter as well. For example, the query below uses the frange (function range) query parser to return all documents whose total price, mul(price,qty), is 500 or higher:

q={!frange l=500}mul(price,qty)
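The filtering logic of frange can be sketched in plain Python as below. This is an illustrative model, not Solr’s implementation; the documents and the total_price and frange helpers are hypothetical. It reflects that frange bounds are inclusive by default, so a total of exactly 500 also matches; adding incl=false, as in q={!frange l=500 incl=false}mul(price,qty), makes the lower bound strict.

```python
# Hypothetical documents; totals are 60, 500 and 600.
docs = [
    {"id": "1", "price": 20.0, "qty": 3},
    {"id": "2", "price": 250.0, "qty": 2},
    {"id": "3", "price": 120.0, "qty": 5},
]


def total_price(doc):
    """Equivalent of mul(price,qty)."""
    return doc["price"] * doc["qty"]


def frange(docs, func, l=None, u=None, incl=True, incu=True):
    """Model of {!frange l=... u=...}func: keep docs whose function value
    falls within [l, u]. Either bound may be omitted; incl/incu control
    whether the lower/upper bound is inclusive (True by default)."""
    kept = []
    for doc in docs:
        value = func(doc)
        if l is not None and (value < l or (value == l and not incl)):
            continue
        if u is not None and (value > u or (value == u and not incu)):
            continue
        kept.append(doc)
    return kept


print([d["id"] for d in frange(docs, total_price, l=500)])
```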

Aggregate Functions 

An aggregate function performs a calculation on a set of values and returns a single scalar value. You may remember using aggregate functions with the GROUP BY and HAVING clauses of a SELECT statement in SQL. Solr offers similar operations, though in a different fashion. Let’s look at them one by one.

Distinct or Unique Functions or Group By

One way to get distinct (unique) values from a set of documents is to use facets or JSON facets (the json.facet parameter). For example, to get the distinct manufacturers along with their document counts:

&facet=true&facet.field=manu_id_s

OR

&json.facet={categories : {type:  terms, field : manu_id_s}}

Note: JSON faceting (json.facet) is still experimental and may change, but there is already plenty of documentation on faceting in general: https://cwiki.apache.org/confluence/display/solr/Faceted+Search
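Conceptually, a terms facet is a count of distinct field values, much like SELECT manu_id_s, COUNT(*) ... GROUP BY manu_id_s in SQL. The sketch below models that in Python with collections.Counter; the documents and the terms_facet helper are hypothetical. It also shows that a multi-valued field contributes one count per value, and builds buckets with val/count keys, the shape a json.facet terms response uses (the classic facet.field response instead lists value/count pairs).

```python
from collections import Counter

# Hypothetical documents; manu_id_s may be missing or, for a
# multi-valued field, hold a list of values.
docs = [
    {"id": "1", "manu_id_s": "corsair"},
    {"id": "2", "manu_id_s": "belkin"},
    {"id": "3", "manu_id_s": "corsair"},
    {"id": "4"},
]


def terms_facet(docs, field):
    """Count distinct values of `field`, most frequent first.
    Ties are broken alphabetically in this sketch."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # docs without the field add no bucket
        counts.update(value if isinstance(value, list) else [value])
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Bucket list shaped like a json.facet terms response.
buckets = [{"val": v, "count": c} for v, c in terms_facet(docs, "manu_id_s")]
print(buckets)
```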

Another approach is to use field collapsing / result grouping, although it is limited to single-valued fields:

&group=true&group.field=manu_id_s
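Unlike faceting, grouping returns the documents themselves, by default only the top one per group (group.limit defaults to 1); adding group.ngroups=true also reports how many distinct groups matched. The rough Python model below, with hypothetical documents and a hypothetical group_by helper, assumes the input is already sorted by relevance and rejects list values to mirror the single-valued restriction.

```python
# Hypothetical documents, assumed already sorted by relevance.
docs = [
    {"id": "1", "manu_id_s": "corsair"},
    {"id": "2", "manu_id_s": "belkin"},
    {"id": "3", "manu_id_s": "corsair"},
    {"id": "4"},
]


def group_by(docs, field, limit=1):
    """Rough model of group=true&group.field=<field>: one group per distinct
    value, keeping the first `limit` docs of each group."""
    groups = {}
    for doc in docs:
        value = doc.get(field)  # a missing field lands in the None group
        if isinstance(value, list):
            raise ValueError(f"{field} is multi-valued; grouping needs a single-valued field")
        groups.setdefault(value, []).append(doc)
    return {value: members[:limit] for value, members in groups.items()}


for value, members in group_by(docs, "manu_id_s").items():
    print(value, [d["id"] for d in members])
```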

Aggregation – Sum, Average, Min, Max 

Whenever you want to aggregate over a set of documents, look at Solr’s Stats component.

For example, to sum up the price of all documents, add the parameters below. They return a set of default statistics such as sum, max, min and mean (average):

&stats=true&stats.field=price
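As a rough guide to what those default statistics mean, the sketch below computes them in Python for a list of prices. It approximates the Stats component’s output rather than reproducing it; the field_stats helper is hypothetical, None stands for a document missing the field, and stddev is computed as a sample standard deviation (n - 1 denominator).

```python
import math


def field_stats(values):
    """Approximate the default statistics reported for a numeric field.
    None stands for a document that is missing the field."""
    present = [v for v in values if v is not None]
    count = len(present)
    if count == 0:
        return {"count": 0, "missing": len(values)}
    total = sum(present)
    sum_sq = sum(v * v for v in present)
    # Sample standard deviation (n - 1 denominator).
    stddev = (math.sqrt((count * sum_sq - total * total) / (count * (count - 1)))
              if count > 1 else 0.0)
    return {
        "min": min(present),
        "max": max(present),
        "count": count,
        "missing": len(values) - count,
        "sum": total,
        "sumOfSquares": sum_sq,
        "mean": total / count,
        "stddev": stddev,
    }


print(field_stats([10.0, 20.0, None, 30.0]))
```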

If you only want the sum returned, you can use:

&stats.field={!sum=true}price
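When sending these parameters over HTTP, the local-params syntax ({, !, =, }) has to be percent-encoded. A minimal sketch with Python’s urllib.parse, assuming a hypothetical Solr at localhost:8983 with a hypothetical core named products; it only builds the URL and does not contact a server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical Solr base URL and core name; adjust for your installation.
BASE_URL = "http://localhost:8983/solr/products/select"

# A list of pairs (not a dict) so a parameter such as stats.field can repeat.
params = [
    ("q", "*:*"),
    ("rows", "0"),  # skip the documents; only the stats are wanted
    ("stats", "true"),
    ("stats.field", "{!sum=true}price"),
    ("wt", "json"),
]

# urlencode percent-encodes the local-params characters for us.
url = BASE_URL + "?" + urlencode(params)
print(url)
```

The resulting URL could then be fetched with urllib.request.urlopen against a running Solr instance.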

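You can also tag and exclude filters as usual. In the example below, the inStock:true filter still restricts the documents returned, but {!ex=filterStock} makes the Stats component ignore it, so the price statistics are computed over every document matching q=*:* and reported under the key AllPrices:

q=*:*&fq={!tag=filterStock}inStock:true&wt=json&indent=true&stats=true&stats.field={!ex=filterStock key=AllPrices}price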
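The effect of tagging and excluding can be modeled in a few lines of Python: the response documents pass every filter, while the statistics are computed over the documents that pass all filters except the excluded tag. The documents, the filters mapping and the apply_filters helper are hypothetical, and q=*:* is modeled as matching everything.

```python
# Hypothetical documents.
docs = [
    {"id": "1", "price": 10.0, "inStock": True},
    {"id": "2", "price": 40.0, "inStock": False},
    {"id": "3", "price": 30.0, "inStock": True},
]

# fq={!tag=filterStock}inStock:true, modeled as tag -> predicate.
filters = {"filterStock": lambda doc: doc["inStock"]}


def apply_filters(docs, filters, exclude=()):
    """Keep docs passing every filter whose tag is not excluded."""
    active = [pred for tag, pred in filters.items() if tag not in exclude]
    return [doc for doc in docs if all(pred(doc) for pred in active)]


results = apply_filters(docs, filters)  # documents in the response
stats_base = apply_filters(docs, filters, exclude={"filterStock"})  # {!ex=filterStock}

print(sum(d["price"] for d in results), sum(d["price"] for d in stats_base))
```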

This should give you a basic understanding of the analytical capabilities within Solr. In the next post I am going to cover advanced analytics, such as pivot facets and the Analytics component, and how SolrCloud affects these capabilities.

Feel free to contact me or leave your comments if you have any questions.

 
