By: Srinivasa Sarma | February 9, 2016
This article explains how to implement SOLR “document level security” using the Manifold Connectors Framework (ManifoldCF). ManifoldCF is an open source framework for pulling content out of a repository and sending it on to targets such as SOLR through a plug-in style, connector-based architecture. ManifoldCF includes connectors for numerous commercial and open source data sources, including Documentum, SharePoint, JDBC, and RSS. ManifoldCF also defines a security model for target repositories that permits them to enforce source-repository security policies.
The ManifoldCF security model is based loosely on the standard authorization concepts and hierarchies found in Microsoft’s Active Directory. ManifoldCF defines a concept of an access token. In the ManifoldCF security model, it is the job of an authority to provide a list of access tokens for a given searching user. Multiple authorities cooperate, in that each one can add to the list of access tokens describing a given user’s security.
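The cooperation between authorities can be pictured with a small Python sketch. It is not ManifoldCF code: the two authority functions, the user name, and the token strings are hypothetical, and the allow/deny rule in `can_view` is a simplified assumption about how a target such as SOLR might use the tokens.

```python
from typing import Callable, List, Set

# Hypothetical authorities: each maps a user name to the access tokens it knows about.
Authority = Callable[[str], List[str]]


def active_directory_authority(user: str) -> List[str]:
    # Illustrative tokens only; real tokens are opaque strings defined by the connector.
    return {"alice": ["AD:S-1-5-21-1001", "AD:group-staff"]}.get(user, [])


def sharepoint_authority(user: str) -> List[str]:
    return {"alice": ["SP:site-members"]}.get(user, [])


def collect_tokens(user: str, authorities: List[Authority]) -> Set[str]:
    """Union of the tokens contributed by every cooperating authority."""
    tokens: Set[str] = set()
    for authority in authorities:
        tokens.update(authority(user))
    return tokens


def can_view(user_tokens: Set[str], allow: Set[str], deny: Set[str]) -> bool:
    """Simplified rule (an assumption): some allow token matches and no deny token matches."""
    return bool(user_tokens & allow) and not (user_tokens & deny)


tokens = collect_tokens("alice", [active_directory_authority, sharepoint_authority])
print(sorted(tokens))
print(can_view(tokens, allow={"SP:site-members"}, deny=set()))               # True
print(can_view(tokens, allow={"SP:site-members"}, deny={"AD:group-staff"}))  # False
```

The point of the sketch is that no single authority needs to know the full picture: each contributes its own tokens, and the target only compares the merged set against what is stored on a document.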
The sections below cover how to set up ManifoldCF, how to use the ManifoldCF crawler, and how to configure the ManifoldCF plugin with SOLR.
- Setup ManifoldCF
- Configuration of ManifoldCF with SOLR
Setup ManifoldCF: -
This section explains how to set up ManifoldCF.
Download the ManifoldCF binary distribution from https://manifoldcf.apache.org/en_US/download.html and unzip it.
Open a command prompt and run start.bat to start ManifoldCF.
This starts ManifoldCF with the required services running and the desired connection types properly registered.
- The ManifoldCF user interface can be accessed through the crawler UI.
When you open the Framework user interface for the first time, you will be asked to log in.
Enter the login user name and password for your system. By default, the user name is “admin” and the password is “admin”.
- Create an output connection by clicking the “List Output Connections” link.
- Enter a name and description, then open the Type tab, select the SOLR output connection type, and continue.
- For the Solr type, select single server, since this setup runs on a single machine.
- Open the Server tab to configure SOLR.
- Open the Schema tab, enter the primary key information of the existing Solr collection, and save.
- Create an authority group by clicking “List Authority Groups” and then “Add a new authority group”.
User Mapping Connections
- Create a mapping connection by clicking “List User Mapping Connections” and then “Add a new connection”.
- Select regular expression mapper as the type and save. If everything is set up correctly, the crawler displays “connection working”.
- Create an authority connection by clicking the “List Authority Connections” link
- Create a new connection by clicking “Add new connection”
- Enter a name and description, then open the Type tab and select the authority type.
- Select the authority group you created earlier and save.
Configuration of ManifoldCF plugin with SOLR: -
This section walks step by step through configuring the ManifoldCF plugin with Solr.
- Copy $:\apache-manifoldcf-2.3\plugins\solr\solr-X.X\apache-manifoldcf-solr-X.X-plugin-2.2.JAR to the Solr core lib directory.
There are two ways to hook up security to Solr in this package. The first is using a Query Parser plugin. The second is using a Search Component. In both cases, the first step is to have ManifoldCF installed and running.
- Then, you will need to add fields to your Solr schema.xml file that can be used to contain document authorization information. Six of these fields are needed: an ‘allow’ and a ‘deny’ field each for documents, parents, and shares.
- The default value of “__nosecurity__” is required by this plugin, so do not forget to include it.
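The six fields can be spelled out by generating them, which also makes the naming pattern visible. The sketch below prints one `<field>` element per allow/deny and document/parent/share combination. The field names (`allow_token_document` and so on) and the attribute values are assumptions based on the plugin's usual conventions, so check them against the README shipped with your plugin version before pasting the output into schema.xml.

```python
import xml.etree.ElementTree as ET

KINDS = ("allow", "deny")
LEVELS = ("document", "parent", "share")


def security_fields():
    """Build the six authorization fields, each defaulting to __nosecurity__."""
    fields = []
    for kind in KINDS:
        for level in LEVELS:
            fields.append(ET.Element("field", {
                "name": f"{kind}_token_{level}",  # assumed naming pattern
                "type": "string",
                "indexed": "true",
                "stored": "false",
                "multiValued": "true",
                "required": "true",
                "default": "__nosecurity__",      # default value required by the plugin
            }))
    return fields


for field in security_fields():
    print(ET.tostring(field, encoding="unicode"))
```

Because every field carries the `__nosecurity__` default, documents indexed without any token values remain distinguishable from documents that are explicitly secured.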
Using the Query Parser Plugin
- To set up the query parser plugin, modify your solrconfig.xml to register the query parser.
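A registration of this kind typically has two parts: a `<queryParser>` element pointing at the plugin class, and a filter query appended to the search request handler so that every search passes through the parser. The sketch below prints both fragments. The parser name `manifoldCFSecurity`, the class `org.apache.solr.mcf.ManifoldCFQParserPlugin`, the parameter names, and the authority service URL are assumptions to verify against the plugin README for your version.

```python
import xml.etree.ElementTree as ET


def query_parser_config(authority_url: str, pool_size: int = 50) -> ET.Element:
    """<queryParser> element registering the security parser (names are assumptions)."""
    parser = ET.Element("queryParser", {
        "name": "manifoldCFSecurity",
        "class": "org.apache.solr.mcf.ManifoldCFQParserPlugin",
    })
    url = ET.SubElement(parser, "str", {"name": "AuthorityServiceBaseURL"})
    url.text = authority_url
    pool = ET.SubElement(parser, "int", {"name": "ConnectionPoolSize"})
    pool.text = str(pool_size)
    return parser


def appended_filter() -> ET.Element:
    """<lst name="appends"> block to place inside the search request handler."""
    appends = ET.Element("lst", {"name": "appends"})
    fq = ET.SubElement(appends, "str", {"name": "fq"})
    fq.text = "{!manifoldCFSecurity}"
    return appends


# Assumed URL of a locally running authority service.
parser = query_parser_config("http://localhost:8345/mcf-authority-service")
print(ET.tostring(parser, encoding="unicode"))
print(ET.tostring(appended_filter(), encoding="unicode"))
```

Appending the filter in the handler, rather than relying on clients to send it, keeps the security check from being skipped by a caller who simply omits the parameter.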
MCF Authority Service:
Access Token: ManifoldCF defines a concept of an access token. An access token, to ManifoldCF, is a string which is meaningful only to a specific connector or connectors. This string describes the ability of a user to view (or not view) some set of documents. To see the access tokens for a given user, query the MCF authority service by URL.
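As a rough illustration, the sketch below builds such a lookup URL and extracts the tokens from a response. The base URL, the `UserACLs` path, the `username` parameter, and the line-oriented `TOKEN:` response format are assumptions about a default local installation, and the sample response is made up, so compare with what your own authority service returns.

```python
from urllib.parse import urlencode


def user_acls_url(base_url: str, username: str) -> str:
    """URL that asks the authority service for one user's access tokens."""
    query = urlencode({"username": username})
    return base_url.rstrip("/") + "/UserACLs?" + query


def parse_tokens(response_text: str) -> list:
    """Collect the values of all lines that start with 'TOKEN:'."""
    prefix = "TOKEN:"
    tokens = []
    for line in response_text.splitlines():
        if line.startswith(prefix):
            tokens.append(line[len(prefix):].strip())
    return tokens


# Hypothetical response body; a real one depends on the configured authorities.
sample_response = (
    "AUTHORIZED:ad-authority\n"
    "TOKEN:ad-authority:S-1-5-21-1001\n"
    "TOKEN:ad-authority:S-1-1-0\n"
)

print(user_acls_url("http://localhost:8345/mcf-authority-service", "alice@example.com"))
print(parse_tokens(sample_response))
```

Opening the printed URL in a browser while ManifoldCF is running is usually the quickest way to confirm that the authority connection returns the tokens you expect.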
Indexing data to SOLR:
- Start the Solr instance and post XML document data to Solr. Each secured document carries, in its authorization fields, the user token that is allowed to access it.
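A minimal sketch of such a post is shown below. The document id, title, token value, field name `allow_token_document`, and the update URL (including the core name `collection1`) are all hypothetical; substitute the fields from your own schema and a token reported by your authority service.

```python
import urllib.request
import xml.etree.ElementTree as ET


def add_document_xml(doc_id: str, title: str, allow_tokens: list) -> str:
    """Build a Solr <add> message whose document lists the tokens allowed to see it."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def field(name: str, value: str) -> None:
        element = ET.SubElement(doc, "field", {"name": name})
        element.text = value

    field("id", doc_id)
    field("title", title)
    for token in allow_tokens:
        field("allow_token_document", token)  # assumed field name from schema.xml
    return ET.tostring(add, encoding="unicode")


def post_to_solr(xml_body: str, update_url: str) -> int:
    """POST the XML to Solr's update handler and return the HTTP status code."""
    request = urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


body = add_document_xml("doc1", "Secured report", ["ad-authority:S-1-5-21-1001"])
print(body)
# With Solr running, uncomment to send the document (URL is an assumption):
# post_to_solr(body, "http://localhost:8983/solr/collection1/update?commit=true")
```

A document posted without any `allow_token_document` values falls back to the schema default, which is what makes it visible to unauthenticated queries.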
Query data using SOLR Admin:
- If you query without providing a user token, Solr returns only documents whose token is the default “__nosecurity__” value. In the scenario above, Solr will therefore not return the secured document indexed earlier.
- If you query with the matching user tokens, Solr returns the secured document indexed earlier along with the unsecured results.
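The two cases can be compared by building both query URLs side by side. In the sketch below the select URL, the core name, and the token value are hypothetical, and the `UserTokens` parameter name and `{!manifoldCFSecurity}` filter are assumptions about how the query parser plugin is configured; if your request handler already appends the filter, the explicit `fq` is redundant.

```python
from urllib.parse import urlencode


def secured_query_url(select_url: str, query: str, user_tokens=()) -> str:
    """Build a select URL that passes zero or more user tokens to the security filter."""
    params = [
        ("q", query),
        ("wt", "json"),
        ("fq", "{!manifoldCFSecurity}"),  # assumed parser name from solrconfig.xml
    ]
    params += [("UserTokens", token) for token in user_tokens]
    return select_url + "?" + urlencode(params)


SELECT_URL = "http://localhost:8983/solr/collection1/select"  # hypothetical core

# No token: only documents left at the default value should come back.
print(secured_query_url(SELECT_URL, "*:*"))

# With a token: documents that allow this token should be returned as well.
print(secured_query_url(SELECT_URL, "*:*", ["token1"]))
```

Running both URLs against the same index and comparing the result counts is a simple way to confirm that the security filter is actually being applied.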