Solr HyperLogLog

Added:12 Jun 2015
Author:Dikshant Shahi
Views:6703

Solr 5.2 introduces HyperLogLog, the probabilistic approach for counting distinct values.

Solr already had provision to count distinct values using unique facet function or countDistinct LocalParam in stats component. But this approach doesn’t scale well, as it needs to fit everything into memory and memory required grows linearly with the cardinality. If Solr gets many requests for counting distinct values, system may even throw OOM Exception soon.

HyperLogLog model for estimating the distinct value was introduced in paper HyperLogLog, which is an extension of LogLog algorithm. This algorithm uses Hash function based approach using limited memory but the count is an estimation of the distinct value instead of exact count.

Solr 5.2 introduces new facet function hll() and new LocalParam cardinality in stats component, to estimate the distinct values. Below is the syntax.

hll(<fieldname>) // facet function

{!cardinality=true}<fieldname> //stats component

Another tradeoff of HyperLogLog can be performance which can be bit slower than unique and countDistinct but uses significantly less memory, which can make it an ideal choice for scenrio such as finding unique visitors to site etc.

Tags: solr 5.2 features, solr find distinct values, solr HyperLogLog

Solr HyperLogLog

Write a comment

Search

Relevant Posts

Author's Recent Posts

Categories

Solr HyperLogLog

Write a comment

Search

Subscribe Us

Relevant Posts

Author's Recent Posts

Categories

Subscribe To Our Newsletter