Grouping Results with Solr


By: Dattatraya Patil | July 10, 2015

 

 

Grouping Results:

Imagine a situation where your data set is divided into categories, subcategories, price ranges, and so on. What if you would like not only to get the counts for each such group (with the use of faceting), but also to show the most relevant documents in each of the groups?

We can use 3 types of group queries:

1. Using field values to group results

q=*:*&group=true&group.field=category

2. Using queries to group results

q=*:*&group=true&group.query=price:[20.0+TO+50.0]&group.query=price:[1.0+TO+19.99]

3. Using function queries to group results

q=*:*&group=true&group.func=geodist(geo,0.0,0.0)

Any number of group commands (group.field, group.func, group.query) may be specified in a single request.
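As a rough illustration of combining several group commands in one request, the sketch below assembles such a URL with Python's standard library. The helper name build_group_query is hypothetical, and the host and handler path are simply the ones used in this post's examples.

```python
from urllib.parse import urlencode


def build_group_query(base_url, q="*:*", fields=(), queries=(), funcs=()):
    """Build a select URL that carries any mix of grouping commands.

    build_group_query is a hypothetical helper, not part of Solr or any client library.
    """
    params = [("q", q), ("group", "true")]
    # Repeated keys are allowed, so each command is appended as its own pair.
    params += [("group.field", f) for f in fields]
    params += [("group.query", gq) for gq in queries]
    params += [("group.func", fn) for fn in funcs]
    return base_url + "?" + urlencode(params)


url = build_group_query(
    "http://localhost:8983/solr/select",
    fields=["category"],
    queries=["price:[20.0 TO 50.0]", "price:[1.0 TO 19.99]"],
)
print(url)
```

urlencode percent-encodes the colon and brackets and turns spaces into +, which is why the range queries in the URLs above are written with + between the bounds.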

Suppose we add the following three documents to the index:

<add>
   <doc>
      <field name="id">1</field>
      <field name="name">Solr cookbook</field>
      <field name="category">it</field>
      <field name="price">39.99</field>
   </doc>
   <doc>
      <field name="id">2</field>
      <field name="name">Mechanics cookbook</field>
      <field name="category">mechanics</field>
      <field name="price">19.99</field>
   </doc>
   <doc>
      <field name="id">3</field>
      <field name="name">ElasticSearch book</field>
      <field name="category">it</field>
      <field name="price">49.99</field>
   </doc>
</add>
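If the documents live in application code rather than in a file, the same update message can be generated programmatically. The following is a minimal sketch using Python's xml.etree.ElementTree; build_add_xml is a hypothetical helper, and posting the payload to Solr's update handler is deliberately left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of dicts into an <add> message (hypothetical helper)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # One <field name="..."> element per key/value pair.
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


docs = [
    {"id": 1, "name": "Solr cookbook", "category": "it", "price": 39.99},
    {"id": 2, "name": "Mechanics cookbook", "category": "mechanics", "price": 19.99},
    {"id": 3, "name": "ElasticSearch book", "category": "it", "price": 49.99},
]
payload = build_add_xml(docs)
print(payload)
```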

1. Using field values to group results:

To get our data divided into groups on the basis of their category:

http://localhost:8983/solr/select?q=*:*&group=true&group.field=category&facet=true&facet.field=category

The results returned by the preceding query are as follows:

<response>
  <lst name="grouped">
    <lst name="category">
      <int name="matches">3</int>
      <arr name="groups">
        <lst>
          <str name="groupValue">it</str>
          <result name="doclist" numFound="2" start="0">
            <doc>
              <str name="id">1</str>
              <str name="name">Solr cookbook</str>
              <str name="category">it</str>
              <float name="price">39.99</float>
            </doc>
          </result>
        </lst>
        <lst>
          <str name="groupValue">mechanics</str>
          <result name="doclist" numFound="1" start="0">
            <doc>
              <str name="id">2</str>
              <str name="name">Mechanics cookbook</str>
              <str name="category">mechanics</str>
              <float name="price">19.99</float>
            </doc>
          </result>
        </lst>
      </arr>
    </lst>
  </lst>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="category">
        <int name="it">2</int>
        <int name="mechanics">1</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>
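A client usually needs to pull the group values and counts out of a response like this one. The sketch below parses a trimmed copy of the grouped section with xml.etree.ElementTree; summarize_groups is a hypothetical helper, and it assumes every group has a string groupValue and that each document carries an id field.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the grouped section of the response, for illustration only.
SAMPLE = """<response>
  <lst name="grouped">
    <lst name="category">
      <int name="matches">3</int>
      <arr name="groups">
        <lst>
          <str name="groupValue">it</str>
          <result name="doclist" numFound="2" start="0">
            <doc><str name="id">1</str></doc>
          </result>
        </lst>
        <lst>
          <str name="groupValue">mechanics</str>
          <result name="doclist" numFound="1" start="0">
            <doc><str name="id">2</str></doc>
          </result>
        </lst>
      </arr>
    </lst>
  </lst>
</response>"""


def summarize_groups(xml_text, field):
    """Map each group value to its numFound and the ids of the returned docs."""
    root = ET.fromstring(xml_text)
    path = f"./lst[@name='grouped']/lst[@name='{field}']/arr[@name='groups']/lst"
    summary = {}
    for group in root.findall(path):
        value = group.find("str[@name='groupValue']").text
        doclist = group.find("result[@name='doclist']")
        top_ids = [d.find("str[@name='id']").text for d in doclist.findall("doc")]
        summary[value] = {"numFound": int(doclist.get("numFound")), "top_ids": top_ids}
    return summary


summary = summarize_groups(SAMPLE, "category")
print(summary)
```

Note that numFound counts every document in the group, while the doclist holds only the documents actually returned for it.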

The result contains only the single topmost document from each group. If you want to get more than one document from each group, append the group.limit parameter to the query, as shown below.

http://localhost:8983/solr/select?q=*:*&group=true&group.field=category&group.limit=10

This query returns at most 10 documents from each group.
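To make the effect of group.limit concrete, here is a small in-memory imitation of field grouping over the three sample documents. It is only a sketch of the semantics, not how Solr implements grouping, and it assumes the input list is already in relevance order; group_by_field is a hypothetical name.

```python
from collections import defaultdict

DOCS = [
    {"id": "1", "name": "Solr cookbook", "category": "it", "price": 39.99},
    {"id": "2", "name": "Mechanics cookbook", "category": "mechanics", "price": 19.99},
    {"id": "3", "name": "ElasticSearch book", "category": "it", "price": 49.99},
]


def group_by_field(docs, field, limit=1):
    """Group docs by a field value and keep at most `limit` docs per group."""
    groups = defaultdict(list)
    for doc in docs:  # docs are assumed to arrive in relevance order
        groups[doc[field]].append(doc)
    return {
        value: {"numFound": len(members), "docs": members[:limit]}
        for value, members in groups.items()
    }


top_one = group_by_field(DOCS, "category")            # like the default group.limit=1
top_ten = group_by_field(DOCS, "category", limit=10)  # like group.limit=10
print([d["id"] for d in top_one["it"]["docs"]])
print([d["id"] for d in top_ten["it"]["docs"]])
```

In both calls numFound for the it group stays at 2; only the number of documents listed per group changes.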

2. Using queries to group results:

Sometimes grouping results on the basis of field values is not enough. For example, imagine that we would like to group documents into price brackets, that is, to show the most relevant document among those priced from 1.0 to 19.99, the most relevant document among those priced from 20.0 to 50.0, and so on.

http://localhost:8983/solr/select?q=*:*&group=true&group.query=price:[20.0+TO+50.0]&group.query=price:[1.0+TO+19.99]

The results of the preceding query look as follows:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="grouped">
  <lst name="price:[20.0 TO 50.0]">
    <int name="matches">3</int>
    <result name="doclist" numFound="2" start="0">
      <doc>
        <str name="id">1</str>
        <str name="name">Solr cookbook</str>
        <str name="category">it</str>
        <float name="price">39.99</float>
      </doc>
    </result>
  </lst>
  <lst name="price:[1.0 TO 19.99]">
    <int name="matches">3</int>
    <result name="doclist" numFound="1" start="0">
      <doc>
        <str name="id">2</str>
        <str name="name">Mechanics cookbook</str>
        <str name="category">mechanics</str>
        <float name="price">19.99</float>
      </doc>
    </result>
  </lst></lst>
</response>
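The bracket behaviour can be imitated locally to check which sample document falls into which range. This is a sketch under the assumption that both range bounds are inclusive, as the square brackets in the queries indicate; group_by_price_ranges is a hypothetical helper and the input order stands in for relevance order.

```python
DOCS = [
    {"id": "1", "name": "Solr cookbook", "category": "it", "price": 39.99},
    {"id": "2", "name": "Mechanics cookbook", "category": "mechanics", "price": 19.99},
    {"id": "3", "name": "ElasticSearch book", "category": "it", "price": 49.99},
]


def group_by_price_ranges(docs, ranges):
    """Build one group per (low, high) price range, bounds inclusive."""
    grouped = {}
    for low, high in ranges:
        label = f"price:[{low} TO {high}]"
        members = [d for d in docs if low <= d["price"] <= high]
        grouped[label] = {
            "matches": len(docs),       # documents matching the main query
            "numFound": len(members),   # documents falling into this bracket
            "doclist": members[:1],     # top document only, like group.limit=1
        }
    return grouped


grouped = group_by_price_ranges(DOCS, [(20.0, 50.0), (1.0, 19.99)])
for label, group in grouped.items():
    print(label, group["numFound"], [d["id"] for d in group["doclist"]])
```

As in the response above, matches is the same for every bracket because it counts the documents matching the main query, not the bracket.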

3. Using function queries to group results:

Imagine that you would like to group results not by using queries or field contents, but instead by a value returned by a function query.

For example, to group documents on the basis of their distance from a given point (here using a separate set of documents that have a geo location field):

http://localhost:8983/solr/select?q=*:*&group=true&group.func=geodist(geo,0.0,0.0)

The results of the preceding query look as follows:

<response>
<lst name="grouped">
  <lst name="geodist(geo,0.0,0.0)">
    <int name="matches">3</int>
    <arr name="groups">
      <lst>
        <double name="groupValue">1584.126028923632</double>
        <result name="doclist" numFound="1" start="0">
          <doc>
            <str name="id">1</str>
            <str name="name">Company one</str>
            <str name="geo">10.1,10.1</str>
          </doc>
        </result>
      </lst>
      <lst>
        <double name="groupValue">1740.0195023531824</double>
        <result name="doclist" numFound="2" start="0">
          <doc>
            <str name="id">2</str>
            <str name="name">Company two</str>
            <str name="geo">11.1,11.1</str>
          </doc>
        </result>
      </lst>
      <lst>
        <double name="groupValue">1911.187477467305</double>
        <result name="doclist" numFound="1" start="0">
          <doc>
            <str name="id">4</str>
            <str name="name">Company four</str>
            <str name="geo">12.2,12.2</str>
          </doc>
        </result>
      </lst>
    </arr>
  </lst></lst>
</response>
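The groupValue numbers above are distances in kilometres from the point 0.0,0.0. As a sanity check, the sketch below computes a great-circle distance with the haversine formula. Both the choice of formula and the Earth radius constant are assumptions made for this illustration rather than details taken from the post.

```python
import math

EARTH_RADIUS_KM = 6371.0087714  # assumed mean Earth radius, in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Distance from the query point 0.0,0.0 to "Company one" at 10.1,10.1.
distance = haversine_km(0.0, 0.0, 10.1, 10.1)
print(round(distance, 1))
```

Under these assumptions the result comes out at roughly 1584 km, in line with the first groupValue in the response. Because the group value is a floating-point number, documents end up in the same group only when the function returns exactly the same value for them.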

 

Request Parameters for Grouping Results

Any number of these request parameters can be included in a single request:

Parameter | Type | Description
group | Boolean | If true, query results will be grouped.
group.field | string | The name of the field by which to group results. The field must be single-valued, and either be indexed or be of a field type that has a value source and works in a function query, such as ExternalFileField.
group.func | query | Group based on the unique values of a function query. NOTE: This option does not work with distributed searches.
group.query | query | Return a single group of documents that match the given query.
rows | integer | The number of groups to return. The default value is 10.
start | integer | Specifies an initial offset for the list of groups.
group.limit | integer | Specifies the number of results to return for each group. The default value is 1.
group.offset | integer | Specifies an initial offset for the document list of each group.
sort | sortspec | Specifies how Solr sorts the groups relative to each other. For example, sort=popularity desc will cause the groups to be sorted according to the highest popularity document in each group. The default value is score desc.
group.sort | sortspec | Specifies how Solr sorts documents within a single group. The default value is score desc.
group.format | grouped/simple | If this parameter is set to simple, the grouped documents are presented in a single flat list, and the start and rows parameters affect the number of documents instead of groups.
group.ngroups | Boolean | If true, Solr includes the number of groups that have matched the query in the results. The default value is false.
group.truncate | Boolean | If true, facet counts are based on the most relevant document of each group matching the query. The default value is false.
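The difference between sort and group.sort is easy to mix up, so here is a small in-memory imitation of the two: one ordering ranks the groups against each other by their best document, the other orders the documents inside each group. This is a sketch of the described semantics only, using the sample documents and price as the sort field; grouped_sorted and its parameters are hypothetical names.

```python
DOCS = [
    {"id": "1", "name": "Solr cookbook", "category": "it", "price": 39.99},
    {"id": "2", "name": "Mechanics cookbook", "category": "mechanics", "price": 19.99},
    {"id": "3", "name": "ElasticSearch book", "category": "it", "price": 49.99},
]


def grouped_sorted(docs, field, sort_key, sort_desc,
                   group_sort_key, group_sort_desc, limit=1):
    """Order groups by their best doc (sort) and docs inside groups (group.sort)."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc[field], []).append(doc)

    def best(members):
        # The document that ranks first under `sort` represents its group.
        pick = max if sort_desc else min
        return sort_key(pick(members, key=sort_key))

    ordered = sorted(groups.items(), key=lambda item: best(item[1]), reverse=sort_desc)
    return [
        (value, sorted(members, key=group_sort_key, reverse=group_sort_desc)[:limit])
        for value, members in ordered
    ]


def price(doc):
    return doc["price"]


# Roughly: sort=price desc, group.sort=price asc, group.limit=2
result = grouped_sorted(DOCS, "category", price, True, price, False, limit=2)
for value, members in result:
    print(value, [d["id"] for d in members])
```

Here the it group comes first because its most expensive document outranks the one in mechanics, while inside the it group the cheaper document is listed first.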

 

Distributed Result Grouping Caveats

Grouping is supported for distributed searches, with some caveats:

  • Currently group.func is not supported in any distributed searches.
  • group.ngroups requires that all documents in each group be co-located on the same shard in order for accurate counts to be returned.
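The co-location requirement can be illustrated with a toy model. The sketch below assumes, as a simplification that is not taken from the post, that each shard reports its own number of distinct groups and that the totals are simply added up; both function names are hypothetical.

```python
def ngroups_by_summing_shards(shards, field):
    """Add up the distinct group values seen on each shard (simplified model)."""
    return sum(len({doc[field] for doc in shard}) for shard in shards)


def true_ngroups(shards, field):
    """Count distinct group values across the whole collection."""
    return len({doc[field] for shard in shards for doc in shard})


# Every document of a group lives on one shard.
co_located = [
    [{"category": "it"}, {"category": "it"}],
    [{"category": "mechanics"}],
]
# The "it" group is spread over both shards.
split = [
    [{"category": "it"}, {"category": "mechanics"}],
    [{"category": "it"}],
]

print(ngroups_by_summing_shards(co_located, "category"), true_ngroups(co_located, "category"))
print(ngroups_by_summing_shards(split, "category"), true_ngroups(split, "category"))
```

In this model the summed count matches the true count only when each group sits entirely on one shard; once a group is split, it is counted once per shard that holds part of it.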