By: Dikshant Shahi | April 30, 2015
If you have multiple clients updating documents, it's critical to ensure that a newer version of a document is never overwritten by an older one. To address this problem, you need concurrency control, which is the process of managing simultaneous updates to documents.
There are two approaches to handling the concurrency problem: pessimistic and optimistic. As the name suggests, the pessimistic approach assumes that conflicts are frequent, so it locks the document for the duration of a transaction. Any subsequent request either has to wait until the transaction completes or is declined. If your documents are transactional, an RDBMS is the way to go.
Pessimistic locking has inherent disadvantages: it adds overhead and is time-consuming. Solr, and perhaps other NoSQL systems, take the optimistic approach instead, because they assume that conflicts can occur but are very rare. Hence, they don't lock the document. Instead, they record the update operations, and if they find that two users are trying to update the same document simultaneously, one of the requests is discarded and that user gets an error message.
A read request for the document is not subject to concurrency control and gets the most recent available copy, which can be slightly out of date at times.
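To make the optimistic approach concrete, here is a minimal sketch of a toy in-memory store. It is not Solr code: the class OptimisticStore, the exception VersionConflict and the counter-based version numbers are all assumptions made for illustration. It only shows the idea that nothing is locked and that a write based on a stale version is rejected.

```python
import itertools


class VersionConflict(Exception):
    """Raised when a writer's expected version no longer matches the store."""


class OptimisticStore:
    """Toy in-memory document store (hypothetical, not Solr). Nothing is locked."""

    def __init__(self):
        self._docs = {}                    # doc_id -> (version, body)
        self._clock = itertools.count(1)   # source of ever-increasing versions

    def read(self, doc_id):
        # Reads never wait for writers; they return the latest stored copy.
        return self._docs[doc_id]

    def write(self, doc_id, body, expected_version=None):
        current = self._docs.get(doc_id)
        if expected_version is not None:
            current_version = current[0] if current else None
            if current_version != expected_version:
                raise VersionConflict(
                    f"{doc_id}: expected {expected_version}, found {current_version}"
                )
        new_version = next(self._clock)
        self._docs[doc_id] = (new_version, body)
        return new_version


store = OptimisticStore()
v1 = store.write("doc1", "first draft")

# Two clients read the same version of the document ...
version_a, _ = store.read("doc1")
version_b, _ = store.read("doc1")

# ... client A writes first and succeeds ...
store.write("doc1", "edit by A", expected_version=version_a)

# ... so client B's write, based on the now stale version, is rejected.
try:
    store.write("doc1", "edit by B", expected_version=version_b)
    b_rejected = False
except VersionConflict as err:
    b_rejected = True
    print("rejected:", err)
```

Client B is not blocked at any point; it simply learns at write time that its copy was out of date.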
Optimistic concurrency generally happens in three phases: READ, VALIDATE and WRITE. Let's see how it's implemented in Solr.
Solr implements optimistic concurrency using the field _version_, which is added to each document and is provided by default in schema.xml. Remember that field names starting and ending with an underscore are reserved in Solr and can have special meaning, so never create a field with a name like _version_ for some other purpose.
To use optimistic concurrency, you need to provide the additional field _version_ along with your request for updating or removing a document. A sample request looks as follows:
$ curl http://localhost:8983/solr/update -H 'Content-type:application/xml' -d '
<add><doc>
<field name="article">Searching Solr</field>
<field name="abstract">This contains the abstract</field>
</doc></add>'
You can also provide the version information as a _version_ request parameter instead of as a field in the document.
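As a rough illustration of the request-parameter form, the following sketch builds an update URL that carries the expected version in the query string. The helper name build_update_url and the version value 1234567890 are assumptions for illustration only; the base URL is the one used in the sample request.

```python
from urllib.parse import urlencode


def build_update_url(base_url, expected_version):
    """Append the expected version as a _version_ query parameter (hypothetical helper)."""
    query = urlencode({"_version_": expected_version})
    return f"{base_url}?{query}"


# 1234567890 is a made-up version number, not a value returned by a real Solr instance.
url = build_update_url("http://localhost:8983/solr/update", 1234567890)
print(url)
```

The document body sent to this URL then no longer needs its own _version_ field.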
Once Solr receives the update request along with the _version_ information, it reads the existing document by its unique key and compares the number in its _version_ field with the _version_ number in the request. The validation follows these rules:
_version_ > 1: Both versions should match exactly
_version_ = 1: Document must exist
_version_ < 0: Document must not exist
_version_ = 0: No version check; the existing document is overwritten
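The four rules above can be written down as a small decision function. This is only a sketch of the rules as listed here, not Solr's actual implementation; the function name version_check_passes and the convention of passing None for a missing document are assumptions.

```python
def version_check_passes(requested, existing):
    """Apply the _version_ rules listed above.

    requested: the _version_ value sent with the update request.
    existing:  the _version_ of the stored document, or None if no
               document with that unique key exists (assumed convention).
    """
    if requested > 1:
        # Both versions must match exactly.
        return existing == requested
    if requested == 1:
        # The document must exist; its version does not matter.
        return existing is not None
    if requested < 0:
        # The document must not exist.
        return existing is None
    # requested == 0: no check, the update always goes through.
    return True


print(version_check_passes(42, 42))    # matching versions
print(version_check_passes(42, 43))    # stale version
print(version_check_passes(-1, None))  # create only if absent
```

A False result corresponds to the case where the request would be rejected as a version conflict.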
If the validation succeeds, the document is indexed with an updated _version_ greater than the previous one. If the validation fails, Solr responds with a version conflict, HTTP status code 409.
To use this feature, you always need to provide the _version_ information along with the request. If it is not provided, the existing document is overwritten, unless the request is an atomic update.
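One common way for a client to react to a 409 conflict is to read the document again and retry the update with the fresh _version_. The sketch below shows that loop under stated assumptions: update_with_retry, read_doc, send_update and the fake in-memory "server" are all hypothetical stand-ins, and no real Solr call is made.

```python
def update_with_retry(read_doc, send_update, modify, max_attempts=3):
    """READ the document, let the server VALIDATE and WRITE, retry on conflict."""
    for _ in range(max_attempts):
        doc = read_doc()                        # READ: the copy carries the current _version_
        status = send_update(modify(dict(doc))) # VALIDATE and WRITE happen on the server side
        if status != 409:
            return status
    raise RuntimeError("gave up after repeated version conflicts")


# Fake server state standing in for Solr (illustration only).
server = {"id": "doc1", "article": "Searching Solr", "_version_": 5}
calls = {"n": 0}


def read_doc():
    return dict(server)


def send_update(doc):
    calls["n"] += 1
    if calls["n"] == 1:
        # Simulate another client updating the document between our read and write.
        server["_version_"] += 1
    if doc["_version_"] != server["_version_"]:
        return 409                              # version conflict
    server.update(doc)
    server["_version_"] += 1                    # new version is greater than the previous one
    return 200


def add_edition(doc):
    doc["article"] = "Searching Solr, 2nd edition"
    return doc


status = update_with_retry(read_doc, send_update, add_edition)
print(status, server)
```

The retry limit keeps a client from looping forever on a heavily contended document; whether retrying or surfacing the error to the user is the better choice depends on the application.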