Solr Optimistic Concurrency Unlocked!


By: Dikshant Shahi | April 30, 2015

If you have multiple clients updating documents, it is critical to ensure that a newer version of a document is never overwritten by an older one. To address this problem you need concurrency control, which is the process of managing simultaneous updates to documents.

There are two approaches to the concurrency problem: pessimistic and optimistic. As the name suggests, the pessimistic approach assumes that conflicts are frequent, so it locks the document for the duration of a transaction. Any other request for that document must wait until the transaction completes, or is declined. If your documents are transactional, an RDBMS is the way to go.

Pessimistic locking has inherent disadvantages: it adds overhead and makes requests wait. Solr, and perhaps other NoSQL systems, take the optimistic approach instead: they accept that conflicts can occur but expect them to be rare. Hence, they don’t lock the document. Instead they keep track of update operations, and if they find that two users are trying to update the same document simultaneously, one of the requests is discarded and that user gets an error message.

Read requests are not subject to concurrency control. They return the latest document available, which can be slightly out of date at times.

Optimistic concurrency generally works in three phases: READ, VALIDATE and WRITE. Let’s see how it’s implemented in Solr.
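To make the three phases concrete before looking at Solr itself, here is a minimal in-memory sketch. It is not Solr code: the ToyStore class, its method names and the starting version number are all made up for illustration, and it only shows the general read, validate, write pattern.

```python
import itertools


class VersionConflict(Exception):
    """Raised when the caller's version no longer matches the stored one."""


class ToyStore:
    """Hypothetical in-memory stand-in for an index; not part of Solr."""

    def __init__(self):
        self._docs = {}                      # id -> (version, fields)
        self._clock = itertools.count(100)   # ever-increasing version numbers

    def read(self, doc_id):
        # READ: hand back the current version together with a copy of the fields.
        version, fields = self._docs[doc_id]
        return version, dict(fields)

    def write(self, doc_id, fields, expected_version=None):
        current = self._docs.get(doc_id)
        # VALIDATE: refuse the write if the document changed since it was read.
        if expected_version is not None:
            if current is None or current[0] != expected_version:
                raise VersionConflict(doc_id)
        # WRITE: store the new fields under a new, larger version.
        new_version = next(self._clock)
        self._docs[doc_id] = (new_version, dict(fields))
        return new_version


store = ToyStore()
store.write("1234", {"article": "Searching Solr"})

# Two clients read the same version of the document.
v_a, doc_a = store.read("1234")
v_b, doc_b = store.read("1234")

# Client A writes first and succeeds.
doc_a["abstract"] = "Written by A"
store.write("1234", doc_a, expected_version=v_a)

# Client B still holds the old version, so its write is rejected.
doc_b["abstract"] = "Written by B"
try:
    store.write("1234", doc_b, expected_version=v_b)
    outcome = "written"
except VersionConflict:
    outcome = "conflict"
```

No lock is ever taken here. The only cost of a conflict is that the losing client has to read the document again and retry.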

Solr implements optimistic concurrency using the _version_ field, which is added to each document and is defined by default in schema.xml. Remember, field names that start and end with an underscore are reserved in Solr and can have special meaning, so never create a field with a name like _version_ for some other purpose.

To use optimistic concurrency, you need to provide the additional _version_ field along with your request to update or remove a document. A sample request looks as follows:

$ curl http://localhost:8983/solr/update -H 'Content-type:text/xml' -d '
<add>
<doc>
<field name="id">1234</field>
<field name="article">Searching Solr</field>
<field name="abstract">This contains the abstract</field>
<field name="_version_">12345678776878</field>
</doc>
</add>'
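The same kind of request can be built from a script. The sketch below uses only Python’s standard library and sends the document as JSON instead of XML. The helper name build_versioned_update is made up for this post, and the URL is simply the one from the curl example, so adjust it to your own setup. The request is only constructed here, not sent.

```python
import json
import urllib.request

# URL taken from the curl example; your core or collection path may differ.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_versioned_update(doc, version):
    """Hypothetical helper: wrap a document and its expected _version_
    in a JSON "add" command for Solr's update handler."""
    payload = {"add": {"doc": dict(doc, _version_=version)}}
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_versioned_update(
    {
        "id": "1234",
        "article": "Searching Solr",
        "abstract": "This contains the abstract",
    },
    12345678776878,
)

# To actually send it to a running Solr instance:
# urllib.request.urlopen(request)
```

Note that the Content-Type header has to match the body: application/json for a JSON payload like this one, text/xml for the XML form.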

You can also provide the version information as a request parameter instead of as a field in the document:

http://localhost:8983/solr/update?_version_=12345678776878
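If you build such URLs in code, it is safer to let a library do the encoding. A small sketch with Python’s standard library follows; the function name versioned_update_url is just an illustrative choice.

```python
from urllib.parse import urlencode


def versioned_update_url(base_url, version):
    """Hypothetical helper: append the _version_ request parameter."""
    return base_url + "?" + urlencode({"_version_": version})


url = versioned_update_url("http://localhost:8983/solr/update", 12345678776878)
```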

Once Solr receives the update request along with the _version_ information, it reads the existing document with the same unique key and compares the number in its _version_ field with the _version_ number in the request. The validation follows these rules:

_version_ > 1: The version in the request must exactly match the version of the indexed document
_version_ = 1: Document must exist
_version_ < 0: Document must not exist
_version_ = 0: No version check is made and the document is overwritten
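The four rules above can be written down as one small function. This is only a restatement of the rules for illustration, not Solr’s actual implementation. The function name is made up, and None is used here to mean that no document with that unique key exists.

```python
def version_check_passes(requested, existing):
    """Illustrative restatement of the validation rules.

    requested: the _version_ value sent with the update request
    existing:  the _version_ of the indexed document, or None if the
               document does not exist (a convention of this sketch)
    """
    if requested > 1:
        # Both versions must match exactly, so the document must exist too.
        return existing is not None and existing == requested
    if requested == 1:
        # The document must exist; its actual version does not matter.
        return existing is not None
    if requested < 0:
        # The document must not exist.
        return existing is None
    # requested == 0: no check at all, the document is overwritten.
    return True
```

For example, sending a negative value is a handy way to say "create this document only if it is not there yet", while 1 says "update it only if it is already there".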

If the validation is successful, the document is indexed with a new _version_ greater than the previous one. If the validation fails, Solr responds with a version conflict error, HTTP status code 409.
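A client usually reacts to a 409 by reading the document again, reapplying its change and retrying. The sketch below shows that loop. The function update_with_retry and its read_doc, send_update and modify parameters are hypothetical names for callables you would supply yourself, and the demo at the bottom uses fake functions instead of a real Solr server.

```python
import urllib.error


def update_with_retry(read_doc, send_update, modify, max_attempts=3):
    """Hypothetical helper: read, modify and write, retrying on HTTP 409."""
    for _ in range(max_attempts):
        doc = read_doc()               # READ: must include the current _version_
        changed = modify(dict(doc))    # apply the change to a copy
        try:
            return send_update(changed)    # VALIDATE and WRITE happen server-side
        except urllib.error.HTTPError as err:
            if err.code != 409:
                raise                  # some other error, don't hide it
            # Version conflict: loop around and read the fresh document.
    raise RuntimeError("gave up after repeated version conflicts")


# Demo with fakes: the first write loses the race, the second one succeeds.
state = {"version": 5, "calls": 0}


def fake_read():
    return {"id": "1234", "_version_": state["version"]}


def fake_send(doc):
    state["calls"] += 1
    if state["calls"] == 1:
        state["version"] = 6           # pretend another client got in first
        raise urllib.error.HTTPError(
            "http://localhost:8983/solr/update", 409, "Conflict", None, None
        )
    return doc["_version_"]


result = update_with_retry(
    fake_read, fake_send, lambda d: dict(d, abstract="New abstract")
)
```

Capping the number of attempts matters: if conflicts turn out to be frequent rather than rare, an unbounded retry loop would keep hammering the server.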

To use this feature, you always need to provide the _version_ information along with the request. If it is not provided, the existing document will be overwritten unless it’s an Atomic Update request.

Dikshant Shahi

Dikshant works as Solution Architect at The Digital Group. He takes interest in Semantic Search, Information Retrieval, Natural Language Processing and Machine Learning. He is the author of book "Apache Solr: A Practical Approach to Enterprise Search".

