Like any company, we also have some legacy code. Our code was using Solr 3 and I was going to upgrade it to the latest version (6.1). The upgrade itself is not such a big deal: just fire up a new setup and convert the old schema to the new schema format, which only differs in its XML layout. I am not going through that, as you can easily get a sample schema from the latest version and compare it to your own. Once done, you can start the new Solr with your old schema and it will start giving errors!! But with patience and hard work you can resolve them one by one.
Anyway, the upgrade process is not such a big deal, but working with the new Solr is, especially if you want to use the cloud version, which uses zookeeper to manage the configs, shards, replication, leaders, etc. All you are likely to run into along the way is a deprecated or missing class, which you can download.
In my case I found this page very useful to find the deprecated classes of Solr 3.6.
Before I jump into Solr cloud 6.1, you may need to know some concepts:
- Collection: A single search index.
- Shard: A logical section of a single collection (also called Slice). Sometimes people will talk about “Shard” in a physical sense (a manifestation of a logical shard).
A shard is literally a part of your data. If you have 3 shards, then all your data (documents) is distributed across 3 parts. It also means that if one of the shards is missing, you are in trouble!!
- Replica: A physical manifestation of a logical Shard, implemented as a single Lucene index on a SolrCore.
A replica is a copy of a shard! So if you have a replication factor of 2, you will have 2 copies of each shard.
- Leader: One Replica of every Shard will be designated as a Leader to coordinate indexing for that Shard.
The leader is the master node in a shard. So if you have two replicas, the leader is the boss!
- SolrCore: Encapsulates a single physical index. One or more make up logical shards (or slices) which make up a collection.
- Node: A single instance of Solr. A single Solr instance can have multiple SolrCores that can be part of any number of collections.
- Cluster: All of the nodes you are using to host SolrCores.
Next, I will go through installing and using this whole setup.
OK! So when it comes to Solr cloud you actually don’t have to deal with Solr master and slave anymore. Instead you need to deal with Solr cloud! For the sake of this example we will design a very basic setup as follows:
- Zookeeper: There will be 3 zookeeper nodes with instance type t2.micro, as this is just a test. It is actually possible to run with 1 zookeeper (for testing only) or to run all 3 zookeepers on one instance, but I decided to go with 1 zookeeper per instance.
- Shards: There will be 2 shards which means our data will be divided in 2.
- Replications: There will be 2 replicas for each shard, which means we have 2 copies of the data in each shard.
- Solr Nodes: The number of shards times the number of replicas equals 4 (2 × 2), so we will have 4 t2.medium instances.
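The sizing arithmetic for this setup can be sketched in a few lines of shell. This is only an illustration, not something Solr ships; the function name required_cores is made up for this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): how many SolrCores a collection needs.
# Every shard is stored replicationFactor times, so the total is the product.
required_cores() {
    shards=$1
    replication_factor=$2
    echo $(( shards * replication_factor ))
}

# The setup in this post: 2 shards with a replication factor of 2.
required_cores 2 2
```

With one core per Solr node, that number is also the number of instances to launch.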
I will proceed to zookeeper first:
    wget https://www.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
    tar xvfz zookeeper-3.4.6.tar.gz
    mv zookeeper-3.4.6 /opt/zookeeper
    cd /opt/zookeeper/conf
    cp zoo_sample.cfg zoo.cfg
    mkdir /var/zookeeper/
Zookeeper is pretty simple: it only has a small config file (nothing scary!) and a little script to start it. Please update the config file with the contents of Appendix 1 and start it with zkServer.sh:
    vi /opt/zookeeper/conf/zoo.cfg
    echo 1 > /var/zookeeper/myid
    /opt/zookeeper/bin/zkServer.sh start-foreground
Please note that in my sample I just put the id as 1 in /var/zookeeper/myid, which matches the server.1 line in /opt/zookeeper/conf/zoo.cfg. On the other servers you will need to use the id from that server's own server.N line.
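If you don't want to type the id by hand on every server, you can look it up from zoo.cfg. A minimal sketch, assuming the server.N=host:port:port layout from Appendix 1; the function name zk_myid_for is made up for this post.

```shell
#!/bin/sh
# Hypothetical helper: print the id that belongs in myid for a given host,
# by finding the matching server.N line in zoo.cfg.
# Usage: zk_myid_for <path-to-zoo.cfg> <host-ip>
zk_myid_for() {
    cfg=$1
    host=$2
    awk -F= -v host="$host" '
        /^server\.[0-9]+=/ {
            split($1, key, ".")    # key[2] is the N in "server.N"
            split($2, addr, ":")   # addr[1] is the host part of host:port:port
            if (addr[1] == host) print key[2]
        }' "$cfg"
}

# Example for the second zookeeper server (run as root on that machine):
#   zk_myid_for /opt/zookeeper/conf/zoo.cfg 192.168.1.11 > /var/zookeeper/myid
```

The function prints nothing when the host is not listed, so check that myid is not empty before starting zookeeper.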
Once zookeeper runs successfully in the foreground, you can stop it and run it in the background:
    /opt/zookeeper/bin/zkServer.sh start
Then we proceed to Solr:
    wget http://www-eu.apache.org/dist/lucene/solr/6.1.0/solr-6.1.0.zip
    unzip solr-6.1.0.zip
    mv solr-6.1.0 /solr
Up to here there was nothing complicated, but at this point you may face some trouble. The main difference between Solr cloud and traditional Solr is that you DO NOT COPY THE CONFIGS into the Solr folder; you just upload them to zookeeper and it will take care of distribution.
So the process is like this:
- You create a collection (the number of shards and replicas is defined here)
- Later if you need to change the configs, you simply send them to zookeeper
At this moment I will start Solr first (without any collection yet). Just for the sake of avoiding long commands that might scare you, I use __ZK instead of the list of IP:Port pairs:
    __ZK="192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181"
    /solr/bin/solr start -c -z "$__ZK"
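If you would rather not type the zookeeper list by hand, it can be assembled from the host addresses. A small sketch; the helper name zk_connect_string is made up for this post, and it assumes every zookeeper listens on the same client port.

```shell
#!/bin/sh
# Hypothetical helper: join hosts into the host:port,host:port list
# that Solr expects for its zookeeper connection string.
# Usage: zk_connect_string <port> <host> [<host> ...]
zk_connect_string() {
    port=$1
    shift
    result=""
    for host in "$@"; do
        # Add a comma only when something is already in the list.
        result="${result:+$result,}${host}:${port}"
    done
    echo "$result"
}

__ZK=$(zk_connect_string 2181 192.168.1.10 192.168.1.11 192.168.1.12)
echo "$__ZK"
```

Note that a shell assignment must not have spaces around the = sign, otherwise the shell tries to run __ZK as a command.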
Then you can create a collection like this, assuming all Solr configs (schema.xml, solrconfig.xml, etc.) are inside the /root/solr-configs/abcde/ folder:
/solr/bin/solr create_collection -c abcde -shards 2 -replicationFactor 2 -d /root/solr-configs/abcde/
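Once the collection exists, one way to check that both shards and all four replicas came up is the CLUSTERSTATUS action of the Collections API. A minimal sketch, assuming Solr listens on its default port 8983 and that you run it on one of the Solr nodes; the helper name cluster_status_url is made up for this post.

```shell
#!/bin/sh
# Hypothetical helper: build the Collections API URL that reports the
# shards, replicas and leaders of one collection.
# Usage: cluster_status_url <solr-host> <collection>
cluster_status_url() {
    solr_host=$1
    collection=$2
    echo "http://${solr_host}:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=${collection}&wt=json"
}

# Print the URL for the collection created above.
cluster_status_url localhost abcde

# To actually query a running node:
#   curl -s "$(cluster_status_url localhost abcde)"
```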
In case you need to update the configs, you can delete the collection and create it again, or be smart and just update zookeeper as I told you before:
    __ZK="192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181"
    /solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -confname abcde -confdir /root/solr-configs/abcde/ -zkhost "$__ZK"
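Uploading the files only changes what is stored in zookeeper; the running cores keep their old config until the collection is reloaded. A minimal sketch of that reload through the Collections API RELOAD action, assuming the default port 8983; the helper name reload_url is made up for this post.

```shell
#!/bin/sh
# Hypothetical helper: build the Collections API URL that reloads a
# collection so its cores pick up the config now stored in zookeeper.
# Usage: reload_url <solr-host> <collection>
reload_url() {
    solr_host=$1
    collection=$2
    echo "http://${solr_host}:8983/solr/admin/collections?action=RELOAD&name=${collection}"
}

# Print the URL for the collection from this post.
reload_url localhost abcde

# To actually trigger the reload on a running node:
#   curl -s "$(reload_url localhost abcde)"
```

One request to any Solr node is enough, since the reload is applied to the whole collection.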
Appendix 1 (/opt/zookeeper/conf/zoo.cfg):
    tickTime=2000
    dataDir=/var/zookeeper
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=192.168.1.10:2888:3888
    server.2=192.168.1.11:2888:3888
    server.3=192.168.1.12:2888:3888