Administering and Monitoring SolrCloud

Rafał Kuć – Sematext Group, Inc.
@kucrafal @sematext sematext.com
I am…
Sematext consultant & engineer
Solr.pl co-founder
Father and husband 
SolrCloud Concepts
[Diagram: an application talking to four Solr servers. Two servers host Shard1 and Shard2; the other two host a replica of Shard1 and a replica of Shard2.]
Local SolrCloud Cluster
java -Dbootstrap_confdir=./solr/revolution/conf -Dcollection.configName=revolution -DzkRun -DnumShards=1 -jar start.jar

Runs embedded ZooKeeper
Bootstraps the collection with 1 shard
Starts Solr
Starting Solr Cluster
[Diagram: four Solr servers, none hosting a collection yet, all connected to the same three-node ZooKeeper ensemble.]

Each Solr server is started with the full list of ZooKeeper nodes (the host order differs from server to server on the slide):
-DzkHost=192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181
Uploading Collection Configuration
./zkcli.sh -cmd upconfig -zkhost 192.168.1.1:2181 -confdir ./conf/ -confname revolution

[Diagram: the collection configuration is uploaded to the three-node ZooKeeper ensemble used by Solr.]
Collections API
Create
Delete
Reload
Split
Create Alias
Delete Alias
Shard Creation/Deletion

http://wiki.apache.org/solr/SolrCloud
Collection Creation
curl 'http://solrhost:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=3&replicationFactor=4'

name
numShards
replicationFactor
maxShardsPerNode

createNodeSet
collection.configName
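
A minimal sketch of how the CREATE request above could be assembled from its parameters. The helper names `create_collection_url` and `total_cores` are hypothetical and nothing is sent over the network. The core count assumes that replicationFactor is the total number of copies of each shard.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, extra=None):
    """Build a Collections API CREATE URL (hypothetical helper, no request is sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    # Optional parameters from the slide, e.g. maxShardsPerNode or collection.configName.
    params.update(extra or {})
    return f"{base_url}/admin/collections?{urlencode(params)}"


def total_cores(num_shards, replication_factor):
    """Cores the cluster must host, assuming replicationFactor counts every copy of a shard."""
    return num_shards * replication_factor


url = create_collection_url("http://solrhost:8983/solr", "revolution", 3, 4)
print(url)
print(total_cores(3, 4))  # 12 cores to place across the nodes
```

With 3 shards and 4 copies of each, the cluster has to place 12 cores, which is where maxShardsPerNode and createNodeSet become relevant.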
Collection Split Example

Create a collection with two shards:
$ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1'

Then split shard1 of that collection:
$ curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'
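
To illustrate what a shard split means for the hash space, here is a simplified sketch. It assumes documents are routed by a signed 32-bit hash and that each shard owns a contiguous range of it. It shows the idea only and is not Solr's own partitioning code; `split_range` and `to_hex` are hypothetical helpers.

```python
def split_range(low, high):
    """Split the inclusive integer range [low, high] into two halves."""
    mid = low + (high - low) // 2
    return (low, mid), (mid + 1, high)


def to_hex(value):
    """Format a signed 32-bit integer as unsigned hex."""
    return format(value & 0xFFFFFFFF, "x")


MIN_HASH, MAX_HASH = -2**31, 2**31 - 1

# Two shards cover the whole hash space between them.
shard1, shard2 = split_range(MIN_HASH, MAX_HASH)

# Splitting shard1 yields two sub-shards, each owning half of its range.
shard1_0, shard1_1 = split_range(*shard1)

for label, (low, high) in [("shard1", shard1), ("shard1_0", shard1_0), ("shard1_1", shard1_1)]:
    print(label, f"{to_hex(low)}-{to_hex(high)}")
```

Under these assumptions shard1 covers 80000000-ffffffff, and the split produces 80000000-bfffffff and c0000000-ffffffff.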
Getting Deeper – CoreAdmin API
curl 'http://solrhost:8983/solr/admin/cores?action=CREATE&name=newcore&collection=revolution&shard=shard2'

collection
shard
numShards

collection.configName
Schema – the API
Reading (Solr 4.2)
Fields
Dynamic fields
Types
Copy fields
Name (4.3)
Version (4.3)
Unique Key (4.3)
Similarity (4.3)

Writing (Solr 4.4)
Adding new fields
Adding copy fields
Reading Your Schema
curl -XGET 'http://solrhost:8983/solr/rev/schema/fields/name'
{
  "responseHeader" : {
    "status" : 0,
    "QTime" : 5 },
  "field" : {
    "name" : "name",
    "type" : "text_general",
    "indexed" : true,
    "stored" : true }
}

Full reference: http://wiki.apache.org/solr/SchemaRESTAPI
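
A small sketch of how a script might consume the response shown above. The sample text is copied from the slide; `describe_field` is a hypothetical helper, and real responses may carry additional keys.

```python
import json

# Response body as shown on the slide.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 5},
  "field": {"name": "name", "type": "text_general", "indexed": true, "stored": true}
}
"""


def describe_field(response_text):
    """Return a one-line summary of a schema field response (hypothetical helper)."""
    body = json.loads(response_text)
    status = body["responseHeader"]["status"]
    if status != 0:
        raise RuntimeError(f"schema request failed with status {status}")
    field = body["field"]
    return (
        f'{field["name"]}: type={field["type"]}, '
        f'indexed={field["indexed"]}, stored={field["stored"]}'
    )


summary = describe_field(SAMPLE_RESPONSE)
print(summary)
```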
Dynamic Schema Modifications
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

curl -XPUT 'http://solrhost:8983/solr/rev/schema/fields/content' -d '{
  "type" : "text",
  "stored" : "false",
  "copyFields" : ["catchAll"]
}'

curl -XPOST 'http://solrhost:8983/solr/rev/schema/copyFields' -d '[
  {
    "source" : "name",
    "dest" : [ "text", "personal" ]
  }
]'
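
The same two schema modifications could be prepared from a script. The sketch below only builds the request objects and does not send them. `schema_request` is a hypothetical helper, and the JSON Content-type header is an assumption that does not appear on the slide.

```python
import json
from urllib.request import Request


def schema_request(base_url, path, payload, method):
    """Build (but do not send) a schema API request; hypothetical helper."""
    return Request(
        url=f"{base_url}/schema/{path}",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the endpoint expects a JSON content type.
        headers={"Content-type": "application/json"},
        method=method,
    )


BASE = "http://solrhost:8983/solr/rev"

# Add the "content" field, copied into "catchAll" (payload as on the slide).
add_field = schema_request(
    BASE, "fields/content",
    {"type": "text", "stored": "false", "copyFields": ["catchAll"]},
    "PUT",
)

# Add a copy field rule from "name" to "text" and "personal".
add_copy_field = schema_request(
    BASE, "copyFields",
    [{"source": "name", "dest": ["text", "personal"]}],
    "POST",
)

print(add_field.get_method(), add_field.full_url)
print(add_copy_field.get_method(), add_copy_field.full_url)
```

Passing either object to `urllib.request.urlopen` would send it; that step is left out so the sketch can be run without a cluster.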
The Right Directory
StandardDirectory
SimpleFSDirectory
NIOFSDirectory
MMapDirectory
NRTCachingDirectory
RAMDirectory

[Diagram: index segment files (_0.fdt, _0.fdx, _0.fnm, _0.nvd, _1.fdt, _1.fdx, _1.fnm, _1.nvd) accessed through the chosen directory implementation.]

<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory" />
Segment Merging
[Diagram: index segments labelled a through g, grouped into two merge levels (Level 0 and Level 1).]
Segment Merge Under Control
Merge policy
Merge scheduler

Merge factor
Merge policy configuration

https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
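
To make the effect of the merge factor concrete, here is a deliberately simplified, log-style model: whenever a level accumulates `merge_factor` segments, they are merged into one segment on the next level. This is an illustration only. Solr's default TieredMergePolicy chooses merges differently, and `add_segment` is a hypothetical helper.

```python
def add_segment(levels, merge_factor):
    """Add one freshly flushed segment at level 0 and cascade merges upward.

    levels[i] is the number of segments currently sitting on level i.
    """
    level = 0
    while True:
        if len(levels) <= level:
            levels.append(0)
        levels[level] += 1
        if levels[level] < merge_factor:
            return levels
        # merge_factor segments on this level are merged into one segment,
        # which is then added to the next level up.
        levels[level] = 0
        level += 1


levels = []
for _ in range(25):
    add_segment(levels, merge_factor=10)

print(levels)       # segments per level after 25 flushes
print(sum(levels))  # total number of segments a search has to visit
```

In this model a lower merge factor means fewer segments at search time but more merge work during indexing, which is the trade-off the merge settings control.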
Autocommit or Not?
Automatic data flush (hard commit)

Automatic index view refresh

<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>1000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
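
The trigger logic behind these settings can be modelled as "whichever threshold is reached first". The sketch below is a model for reasoning about the configuration, not Solr code. `commit_due` is a hypothetical helper, and the values mirror the configuration above.

```python
def commit_due(pending_docs, ms_since_first_pending, max_docs=None, max_time_ms=None):
    """Return True when an automatic commit would be triggered in this simplified model."""
    if pending_docs == 0:
        return False  # nothing to commit
    by_docs = max_docs is not None and pending_docs >= max_docs
    by_time = max_time_ms is not None and ms_since_first_pending >= max_time_ms
    return by_docs or by_time


# Hard commit: maxTime=15000 ms, maxDocs=1000.
hard = dict(max_docs=1000, max_time_ms=15000)
# Soft commit: maxTime=1000 ms only.
soft = dict(max_time_ms=1000)

# 200 documents added over the last 2 seconds:
print(commit_due(200, 2000, **hard))  # False: neither hard-commit threshold reached
print(commit_due(200, 2000, **soft))  # True: the soft commit refreshes the index view
```

With openSearcher set to false, the hard commit only flushes data to disk, so visibility of new documents is governed by the soft commit interval.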
Caches
Refreshed with IndexSearcher
Configurable

Different purposes
Different implementations

Solr Cache
Monitoring Importance
What to Pay Attention to?
Cluster State
Health
Shards and replica status
Shard placement
Failing nodes
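
A sketch of a basic cluster state check: walk the collections, shards, and replicas, and report every replica that is not active. The dictionary layout below is an assumed shape modelled on the cluster state kept in ZooKeeper, the sample data is invented for illustration, and `inactive_replicas` is a hypothetical helper.

```python
def inactive_replicas(cluster_state):
    """List (collection, shard, replica, state) for every replica that is not active."""
    problems = []
    for collection, cdata in cluster_state.items():
        for shard, sdata in cdata.get("shards", {}).items():
            for replica, rdata in sdata.get("replicas", {}).items():
                state = rdata.get("state")
                if state != "active":
                    problems.append((collection, shard, replica, state))
    return sorted(problems)


# Invented sample in the assumed layout.
SAMPLE_STATE = {
    "revolution": {
        "shards": {
            "shard1": {"replicas": {
                "core_node1": {"state": "active", "node_name": "192.168.1.10:8983_solr"},
                "core_node2": {"state": "down", "node_name": "192.168.1.11:8983_solr"},
            }},
            "shard2": {"replicas": {
                "core_node3": {"state": "active", "node_name": "192.168.1.12:8983_solr"},
                "core_node4": {"state": "recovering", "node_name": "192.168.1.13:8983_solr"},
            }},
        }
    }
}

problems = inactive_replicas(SAMPLE_STATE)
for collection, shard, replica, state in problems:
    print(f"{collection}/{shard}/{replica}: {state}")
```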
Indexing Related Metrics
Index throughput
Document distribution

I/O subsystem metrics
Merging
Search-Related Metrics
Count

Latency
Distribution among nodes

Anomalies and spikes
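
Averages tend to hide latency spikes, which is one reason to track percentiles as well. A minimal sketch using the nearest-rank method; the latency samples are invented and `percentile` is a hypothetical helper.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer from 1 to 100."""
    ordered = sorted(samples)
    # Ceiling division done in integers to avoid floating-point surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Invented query latencies in milliseconds, with one spike.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 14, 13]

mean = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)

print(f"mean={mean} p50={p50} p90={p90} p99={p99}")
```

Here the mean (37.0 ms) suggests uniformly slow queries, while the percentiles show that most queries take about 13 ms and a single outlier takes 250 ms.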
Monitoring Memory and GC
Heap details
Pool size
Pool utilization
Garbage collection count
Garbage collection time
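
Garbage collection count and time are typically exposed as cumulative counters, so a monitoring script compares two samples. The sketch below derives the number of collections and the share of wall-clock time spent in GC between samples. The sample values and dictionary keys are invented, and `gc_overhead` is a hypothetical helper.

```python
def gc_overhead(prev, curr):
    """Return (collections, percent of elapsed time spent in GC) between two samples."""
    elapsed_ms = curr["timestamp_ms"] - prev["timestamp_ms"]
    if elapsed_ms <= 0:
        raise ValueError("samples must be taken at increasing timestamps")
    collections = curr["gc_count"] - prev["gc_count"]
    gc_ms = curr["gc_time_ms"] - prev["gc_time_ms"]
    return collections, 100.0 * gc_ms / elapsed_ms


# Two invented samples of cumulative counters, taken one minute apart.
previous = {"timestamp_ms": 0, "gc_count": 100, "gc_time_ms": 5000}
current = {"timestamp_ms": 60000, "gc_count": 112, "gc_time_ms": 8000}

collections, overhead_pct = gc_overhead(previous, current)
print(f"{collections} collections, {overhead_pct:.1f}% of time in GC")
```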
Monitoring OS Related Metrics
CPU details
Load
I/O activity
Network usage
Solr Administration Panel
Solr & JMX
<jmx />
java -Dcom.sun.management.jmxremote -jar start.jar
SPM
Index statistics
Request # and latency
Caches and warmup
CPU
JVM Memory and OS Memory
Garbage collector
OS related statistics
SPM Dashboard
Other Monitoring Tools
Ganglia
http://ganglia.sourceforge.net/

New Relic
http://www.newrelic.com/

Opsview
http://www.opsview.com
Too much is too much
Too hot
Caches
We Are Hiring!
Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig working with and in open source?
We’re hiring worldwide!
http://sematext.com/about/jobs.html
Thank You!
Rafał Kuć
@kucrafal
rafal.kuc@sematext.com
Sematext
@sematext
http://sematext.com
http://blog.sematext.com
SPM discount code:

LR2013SPM20

@ Sematext booth ;)
