SOLR Security with ManifoldCF

This article explains how to implement SOLR “document level security” using the Manifold Connectors Framework (ManifoldCF). ManifoldCF is an open source framework for pulling content out of a repository and sending it to targets such as SOLR through a plug-in style, connector-based architecture. ManifoldCF includes connectors for numerous commercial and open source data sources, including Documentum, SharePoint, JDBC, and RSS. ManifoldCF also defines a security model for target repositories that permits them to enforce source-repository security policies.


The ManifoldCF security model is based loosely on the standard authorization concepts and hierarchies found in Microsoft’s Active Directory. ManifoldCF defines a concept of an access token. In the ManifoldCF security model, it is the job of an authority to provide a list of access tokens for a given searching user. Multiple authorities cooperate in that each one can add to the list of access tokens describing a given user’s security. The sections below cover how to set up ManifoldCF, how to use the ManifoldCF crawler, and how to configure the ManifoldCF plugin with SOLR.


  • Setup ManifoldCF
  • Configuration of ManifoldCF with SOLR


Setup ManifoldCF: -

Starting ManifoldCF brings up the required services and registers the desired connection types.

 

Overview:

Enter the login username and password for your system. By default, the username is “admin” and the password is “admin”.


Output Connections:

  • Create an output connection by clicking “List Output Connections”.
  • Enter a name and description, then open the Type tab, select the SOLR output connection type, and continue.

Select the single-server Solr type, since we are setting up on a single box.

Select the Server tab to configure SOLR.

Select the Schema tab to enter the primary key information of the existing Solr collection, then save.


Authority Groups

Create an authority group by clicking “List Authority Groups” and then “Add a new authority group”.


User Mapping Connections

Create a mapping connection by clicking “List User Mapping Connections” and then “Add a new connection”.

Select the regular expression mapper as the type and save. If everything is configured correctly, the crawler displays “connection working”.


Authority Connections

  • Create an authority connection by clicking the “List Authority Connections” link.
  • Create a new connection by clicking “Add new connection”.
  • Enter a name and description, then use the Type tab to select the authority type.
  • Select the authority group created earlier and save.

Configuration of ManifoldCF Plugin with SOLR: -

This section is a step-by-step guide to configuring the ManifoldCF plugin with Solr.

  • Copy $:\apache-manifoldcf-2.3\plugins\solr\solr-X.X\apache-manifoldcf-solr-X.X-plugin-2.2.JAR to the Solr core lib directory.

There are two ways to hook up security to Solr in this package. The first is using a Query Parser plugin. The second is using a Search Component. In both cases, the first step is to have ManifoldCF installed and running.

  • Then add fields to your Solr schema.xml file to hold document authorization information. Six fields are needed: an ‘allow’ field and a ‘deny’ field each for documents, parents, and shares.
  • The default value of “__nosecurity__” is required by this plugin, so do not forget to include it.
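
The six fields described above can be sketched with a small Python helper that prints the schema.xml entries. This is only an illustration: the field names (allow_token_document and so on) and the attribute choices (type, indexed, stored, multiValued) are assumptions based on the naming commonly shown for this plugin, so check them against the README shipped with your plugin version before copying them into schema.xml.

```python
import xml.etree.ElementTree as ET

# Default value the plugin expects on every security field.
NO_SECURITY = "__nosecurity__"


def security_field_definitions(levels=("document", "parent", "share")):
    """Return schema.xml <field> lines, one allow and one deny field per level.

    Field names and attributes are assumptions for illustration only.
    """
    definitions = []
    for access in ("allow", "deny"):
        for level in levels:
            field = ET.Element("field", {
                "name": f"{access}_token_{level}",
                "type": "string",
                "indexed": "true",
                "stored": "false",
                "multiValued": "true",
                "required": "false",
                "default": NO_SECURITY,
            })
            definitions.append(ET.tostring(field, encoding="unicode"))
    return definitions


if __name__ == "__main__":
    for line in security_field_definitions():
        print(line)
```

Running the script prints six field elements that can be pasted into the fields section of schema.xml after review.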


Using the Query Parser Plugin

To set up the query parser plugin, modify your solrconfig.xml to add the query parser.
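
As a rough sketch of what the query parser entry might look like, the Python helper below builds a queryParser element for solrconfig.xml. The parser name manifoldCFSecurity, the class name org.apache.solr.mcf.ManifoldCFQParserPlugin, and the AuthorityServiceBaseURL parameter are assumptions that should be verified against the README of the plugin JAR you copied. The base URL is taken from the authority service address used later in this article.

```python
import xml.etree.ElementTree as ET


def query_parser_element(name, plugin_class, authority_base_url):
    """Build a <queryParser> element as a string for solrconfig.xml.

    All three arguments are supplied by the caller; the values used in the
    example below are assumptions, not confirmed plugin settings.
    """
    parser = ET.Element("queryParser", {"name": name, "class": plugin_class})
    base_url = ET.SubElement(parser, "str", {"name": "AuthorityServiceBaseURL"})
    base_url.text = authority_base_url
    return ET.tostring(parser, encoding="unicode")


if __name__ == "__main__":
    print(query_parser_element(
        "manifoldCFSecurity",                           # assumed parser name
        "org.apache.solr.mcf.ManifoldCFQParserPlugin",  # assumed class name
        "http://localhost:8345/mcf-authority-service",
    ))
```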



MCF Authority Service:

Access Token: ManifoldCF defines a concept of an access token. An access token, to ManifoldCF, is a string which is meaningful only to a specific connector or connectors. This string describes the ability of a user to view (or not view) some set of documents. To see the access tokens for a user, use the following URL.

http://localhost:8345/mcf-authority-service/UserACLs?username=User1
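
A minimal Python sketch for working with this URL is shown below. It builds the UserACLs address for a user and extracts tokens from the response text. The response format is an assumption here: the sketch expects plain text with one entry per line, where token lines start with TOKEN:. The sample response in the code is hypothetical and not captured from a real server.

```python
from urllib.parse import urlencode


def user_acls_url(base_url, username):
    """Build the authority service URL that lists a user's access tokens."""
    return f"{base_url}/UserACLs?{urlencode({'username': username})}"


def parse_tokens(response_text):
    """Collect access tokens from the response.

    Assumes one entry per line and a "TOKEN:" prefix on token lines.
    """
    prefix = "TOKEN:"
    tokens = []
    for line in response_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            tokens.append(line[len(prefix):])
    return tokens


if __name__ == "__main__":
    print(user_acls_url("http://localhost:8345/mcf-authority-service", "User1"))
    # Hypothetical response body, for illustration only.
    sample = "AUTHORIZED:MyAuthority\nTOKEN:MyAuthority:token1\n"
    print(parse_tokens(sample))
```

To call a live server, the URL can be fetched with urllib.request.urlopen and the decoded body passed to parse_tokens.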


Indexing data to SOLR:

  • Start the Solr instance and post the document to Solr as XML data. The document carries the user token that is allowed to access it.
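
The XML for such a document can be sketched in Python as follows. The helper produces a Solr XML update message with the user token placed in a document-level allow field. The field names id, title, allow_token_document, and deny_token_document, and the token value token1, are assumptions for illustration, so use the primary key and security field names from your own schema.

```python
import xml.etree.ElementTree as ET


def build_add_document(doc_id, fields, allow_tokens=(), deny_tokens=()):
    """Build a Solr XML <add> message for one document with security tokens.

    Field names for the key and the token fields are assumptions.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def add_field(name, value):
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value

    add_field("id", doc_id)
    for name, value in fields.items():
        add_field(name, value)
    # Tokens listed here decide which users may (or may not) see the document.
    for token in allow_tokens:
        add_field("allow_token_document", token)
    for token in deny_tokens:
        add_field("deny_token_document", token)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_document("doc1", {"title": "Secured document"},
                             allow_tokens=["token1"]))
```

Fields that are left out, such as the deny fields in this example, fall back to the schema default of “__nosecurity__”.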

 



Query data using SOLR Admin:

  • If you query without providing a user token, Solr returns only the documents whose token fields hold the default value “__nosecurity__”. In the scenario above, Solr will not return the document indexed above.
  • If you query with the user's tokens, Solr returns all of those results along with the document above.
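
The two query cases above can be sketched as a Python helper that builds the select URL with a security filter. The core URL, the parser name manifoldCFSecurity, and the AuthenticatedUserName request parameter are assumptions. The parser name must match whatever name was registered in solrconfig.xml, and the parameter name should be checked against the plugin README.

```python
from urllib.parse import urlencode


def secured_query_url(solr_core_url, query, username,
                      parser_name="manifoldCFSecurity"):
    """Build a Solr select URL that applies the security query parser.

    The parser name and the AuthenticatedUserName parameter are assumptions.
    """
    params = {
        "q": query,
        "fq": "{!" + parser_name + "}",     # filter query handled by the plugin
        "AuthenticatedUserName": username,  # user whose tokens are looked up
    }
    return f"{solr_core_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical core URL, for illustration only.
    print(secured_query_url("http://localhost:8983/solr/collection1",
                            "*:*", "User1"))
```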

Courtesy:


URL for Ref

  • https://manifoldcf.apache.org/release/release-2.3/en_US/concepts.html
  • https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs