By: Susheel Kumar | August 3, 2015
Here I walk through a basic SolrCloud setup on 2 separate machines (2 Solr nodes, each on its own machine) with 1 Zookeeper instance, intended for development purposes.
I could hardly find an article that covers setting up a 2-node Solr cluster with 1 ZK node on separate machines (these can be VMs), so I am putting this together.
The default SolrCloud script (-e cloud), described at https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud, is a good starting point, but it is limited to a single machine with the Solr nodes running on different ports.
Below are the steps to set up Zookeeper and the Solr nodes.
Zookeeper Setup on Node 1 / Machine 1
- Download Zookeeper 3.4.6 and extract it on Node 1 / Machine 1 under a directory, say solrcloud.
- Let's first set up Zookeeper. Go to the <ZK_HOME>/conf directory.
- Make a copy of zoo_sample.cfg named zoo.cfg (or simply rename it: mv zoo_sample.cfg zoo.cfg).
- Edit zoo.cfg and set the dataDir parameter to the directory where you would like Zookeeper to store its data (for simplicity, I created a data directory under conf).
- Now start Zookeeper from <ZK_HOME> with the command:
- ./bin/zkServer.sh start
- That is all that needs to be done for ZK at this point.
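The configuration steps above could be scripted roughly as follows. This is only a sketch: configure_zk is a hypothetical helper name, the paths in the example are placeholders, and it assumes zoo_sample.cfg contains a dataDir= line (as the 3.4.6 sample does) and that the data directory path contains no | or & characters.

```shell
# Hypothetical helper (not part of the Zookeeper distribution):
# writes conf/zoo.cfg from conf/zoo_sample.cfg with a custom dataDir.
configure_zk() (
    zk_home=$1    # extracted Zookeeper directory, e.g. solrcloud/zookeeper-3.4.6
    data_dir=$2   # directory where Zookeeper should store its data

    mkdir -p "$data_dir" || exit 1
    # Copy the sample config, replacing only the dataDir line.
    sed "s|^dataDir=.*|dataDir=$data_dir|" \
        "$zk_home/conf/zoo_sample.cfg" > "$zk_home/conf/zoo.cfg"
)

# Example usage (placeholder paths):
#   ZK_HOME="$HOME/solrcloud/zookeeper-3.4.6"
#   configure_zk "$ZK_HOME" "$ZK_HOME/conf/data"
#   "$ZK_HOME/bin/zkServer.sh" start
#   "$ZK_HOME/bin/zkServer.sh" status   # confirms the server is running
```

The sample file is left untouched, so the step can be repeated if you want to change the data directory later.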
Solr Setup on Node 1 / Machine 1
- Download solr-5.x.x and extract it under the same solrcloud directory.
- Now let's create a Solr home directory for our setup. You can choose any location, but in the example below I am using solr-5.x.x/server/solr/node1/solr/.
- Copy the default zoo.cfg and solr.xml from solr-5.x.x/server/solr to solr-5.x.x/server/solr/node1/solr/.
- Now let's start Solr with the command below (basically, you want to start it in cloud mode, pointing at Zookeeper):
- ./bin/solr start -cloud -s solr-5.x.x/server/solr/node1/solr -p 8983 -z <Node1 IP>:2181 -m 2g
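Creating the Solr home directory and copying the two default files can be sketched as below. The helper name prepare_solr_home is hypothetical, and the directory names in the example are placeholders matching the layout described above.

```shell
# Hypothetical helper: creates a Solr home directory and copies the
# default solr.xml and zoo.cfg into it.
prepare_solr_home() (
    solr_dir=$1    # extracted Solr directory, e.g. solrcloud/solr-5.x.x
    solr_home=$2   # new Solr home, e.g. $solr_dir/server/solr/node1/solr

    mkdir -p "$solr_home" || exit 1
    cp "$solr_dir/server/solr/solr.xml" \
       "$solr_dir/server/solr/zoo.cfg" \
       "$solr_home/"
)

# Example usage (placeholder paths):
#   prepare_solr_home solr-5.x.x solr-5.x.x/server/solr/node1/solr
```

After this, the start command from the steps above can point its -s option at the new directory.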
Solr Setup on Node 2 / Machine 2
Follow the same steps as above on Node 2 / Machine 2, except:
- we don't need to download and set up Zookeeper, since it has already been set up on Node 1, and
- we need to point to <Node 1 IP>, not the IP of Node 2, when starting Solr in cloud mode. To be clear, run this command on Node 2:
- ./bin/solr start -cloud -s solr-5.x.x/server/solr/node1/solr -p 8983 -z <Node1 IP>:2181 -m 2g
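To make it harder to mix up the addresses, the start command can be generated from one small function that is used on both machines. This is a sketch only: solr_start_cmd is a hypothetical helper, and 10.0.0.1 in the example is a placeholder for the Node 1 IP.

```shell
# Hypothetical helper: prints the Solr start command for any node.
# The Zookeeper address is always Node 1, whichever machine runs the command.
solr_start_cmd() (
    zk_ip=$1           # IP of the machine running Zookeeper (Node 1)
    solr_home=$2       # Solr home directory on the local machine
    port=${3:-8983}    # Solr port, defaults to 8983
    heap=${4:-2g}      # JVM heap size, defaults to 2g

    printf './bin/solr start -cloud -s %s -p %s -z %s:2181 -m %s\n' \
        "$solr_home" "$port" "$zk_ip" "$heap"
)

# Example usage, identical on Node 1 and Node 2 (placeholder IP):
#   solr_start_cmd 10.0.0.1 solr-5.x.x/server/solr/node1/solr
#   ./bin/solr status    # after starting, shows the running Solr node
```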
Upload configs to Zookeeper
- Before we upload the configuration files to ZK, let's create a directory to hold them on Node 1.
- Assuming you are familiar with "configsets", keep your Solr conf files under the configsets directory for easier management. So copy your files to solr-5.x.x/server/solr/configsets/<renametoyourname_configs>/conf
- Since we want Zookeeper to manage our configuration, let's upload our configs to Zookeeper with the command below. (NOTE: don't confuse the zkcli.sh below with the zkCli.sh found under the ZK_HOME/bin directory. They are different scripts.)
- ./server/scripts/cloud-scripts/zkcli.sh -zkhost <Node1 IP>:2181 -cmd upconfig -confname renametoyourname_configs -confdir solr-5.x.x/server/solr/configsets/<renametoyourname_configs>/conf
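Before uploading, it can help to confirm that the conf directory really contains a solrconfig.xml, since a wrong -confdir path is an easy mistake to make. The check below is a minimal sketch; check_confdir is a hypothetical helper, and it only looks for solrconfig.xml, not for the schema or other files your config may need.

```shell
# Hypothetical helper: fails if the given conf directory has no solrconfig.xml.
check_confdir() (
    conf_dir=$1    # e.g. solr-5.x.x/server/solr/configsets/<name>/conf

    if [ ! -f "$conf_dir/solrconfig.xml" ]; then
        echo "missing $conf_dir/solrconfig.xml" >&2
        exit 1
    fi
)

# Example usage: run the upload only if the check passes
# (CONF_DIR and the config name are placeholders).
#   check_confdir "$CONF_DIR" &&
#     ./server/scripts/cloud-scripts/zkcli.sh -zkhost <Node1 IP>:2181 \
#         -cmd upconfig -confname renametoyourname_configs -confdir "$CONF_DIR"
```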
Creating a collection
Now the final step is to create a collection using the URL below:
http://<Node1 IP>:8983/solr/admin/collections?action=CREATE&name=<myCollection>&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=renametoyourname_configs (Remember to use the same config name here that you used when uploading the configs to Zookeeper.)
Open a browser and go to http://<Node1 IP>:8983/solr/#/~cloud, and you should see the cloud up and running.
You will see the data being distributed between the shards once you ingest data. For querying data on SolrCloud, I hope you can follow this article: https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
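The maxShardsPerNode=2 value follows from the arithmetic: 2 shards with 2 replicas each means 4 cores, and with 2 Solr nodes each node has to host 2 of them. The sketch below computes that value and builds the CREATE URL. Both function names are hypothetical, and the IP, collection name, and config name in the example are placeholders.

```shell
# Hypothetical helper: smallest maxShardsPerNode that lets
# numShards * replicationFactor cores fit on the given number of nodes.
min_max_shards_per_node() {
    # $1 = numShards, $2 = replicationFactor, $3 = number of Solr nodes
    echo $(( ($1 * $2 + $3 - 1) / $3 ))    # integer ceiling division
}

# Hypothetical helper: prints the Collections API CREATE URL.
create_collection_url() (
    host=$1; name=$2; config=$3; shards=$4; replicas=$5; nodes=$6

    per_node=$(min_max_shards_per_node "$shards" "$replicas" "$nodes")
    printf 'http://%s:8983/solr/admin/collections?action=CREATE&name=%s&numShards=%s&replicationFactor=%s&maxShardsPerNode=%s&collection.configName=%s\n' \
        "$host" "$name" "$shards" "$replicas" "$per_node" "$config"
)

# Example usage (placeholder values; quote the URL so the shell
# does not interpret the & characters):
#   curl "$(create_collection_url 10.0.0.1 myCollection renametoyourname_configs 2 2 2)"
```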
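The same checks can also be done from the command line. The sketch below builds two URLs: one for the Collections API CLUSTERSTATUS action, and one for a match-all query that returns only the document count. The function names are hypothetical, and the IP and collection name in the example are placeholders.

```shell
# Hypothetical helper: URL that reports the cluster state
# (collections, shards, replicas) as JSON.
cluster_status_url() {
    printf 'http://%s:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json\n' "$1"
}

# Hypothetical helper: URL for a match-all query against a collection;
# rows=0 returns only the total count, not the documents.
match_all_url() {
    printf 'http://%s:8983/solr/%s/select?q=*:*&rows=0&wt=json\n' "$1" "$2"
}

# Example usage (placeholder values):
#   curl "$(cluster_status_url 10.0.0.1)"
#   curl "$(match_all_url 10.0.0.1 myCollection)"
```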
Please feel free to contact me with any queries, questions, or corrections.