Set up MongoDB with sharding infrastructure

The landscape 

Sharding is the strategy MongoDB uses to horizontally scale the entire data store infrastructure and meet the demands of data growth. The infrastructure can be illustrated as in the image below:

(Diagram: MongoDB sharded cluster architecture)

Preparation

To simulate a production-like environment, a minimum of 7 physical/virtual servers is required; their roles are:

  • 3 for Mongo config servers
  • 1 for the Mongo query router (mongos)
  • 3 for a Mongo shard (replica set): 1 primary and 2 secondaries (or 1 secondary and 1 arbiter)

Note: all the machines above need to have mongod installed; for installation, refer to: Install MongoDB.
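
To make the later commands concrete, here is a hypothetical host layout for reference; the config server names (cfg1, cfg2, cfg3) and all IPs below are placeholders made up for illustration, while mongos, rs1-1 and rs1-2 match the hostnames that appear in the outputs later in this post. Resolving them via /etc/hosts (or DNS) on every machine avoids hard-coding IPs:

    # /etc/hosts (placeholder addresses)
    10.0.0.11  cfg1      # config server 1
    10.0.0.12  cfg2      # config server 2
    10.0.0.13  cfg3      # config server 3
    10.0.0.21  mongos    # query router, also hosts the arbiter in this post
    10.0.0.31  rs1-1     # replica set member (initial primary)
    10.0.0.32  rs1-2     # replica set member (secondary)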

Setup Steps

The condensed steps for setting up a sharded MongoDB cluster are summarized below; a complete tutorial from MongoDB's official website can be found at: Deploy a Sharded Cluster:

1. Set up the config servers

sudo mkdir -p /data/configdb  # dedicated data directory for the config server metadata
sudo mongod --configsvr --dbpath /data/configdb --port 27019  # start mongod in config server mode on the default config port
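
The command above needs to be run on each of the three config server machines. As a rough equivalent, the same options can live in an ini-style config file (the file path and log location below are assumptions, matching the v2.x config format used elsewhere in this post):

    # /etc/mongod-configsvr.conf (hypothetical path)
    configsvr = true
    dbpath = /data/configdb
    port = 27019
    fork = true
    logpath = /var/log/mongodb/configsvr.log

    # then start it with:
    mongod --config /etc/mongod-configsvr.conf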

2. Set up the Mongo query router (mongos)

mongos --configdb [config_server_ip/host]
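
With three config servers, all of them are passed to mongos as a comma-separated list. This sketch uses the placeholder hostnames cfg1/cfg2/cfg3 from the preparation section and the pre-3.x style of mirrored config servers:

    # run on the mongos machine; clients connect to the default port 27017
    mongos --configdb cfg1:27019,cfg2:27019,cfg3:27019 --fork --logpath /var/log/mongodb/mongos.log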

3. Set up shards 1~N

In a MongoDB production deployment, a "shard" should be a replica set. A typical "three member" replica set architecture is shown below:

(Diagram: primary + two secondaries, or primary + secondary + arbiter)

"Three member" essentially are three mongod processes with heartbeat and full replication between each other, they can run on one physical machine (for development/test use) or on separate machines (real production), this infrastructure provides "additional fault tolerance and high availability". 

Setup primary

  1. Modify /etc/mongodb.conf and ensure the following values are set correctly:

    port = 27017
    bind_ip = 0.0.0.0
    dbpath = /media/mongo_volume/db/
    fork = true
    replSet = rs0
  2. Start the mongod process:
    mongod --config /etc/mongodb.conf
  3. Connect to the local mongo shell and initiate the replica set (a consolidated example follows this list):
    rs.initiate() # or explicitly: rs.initiate({_id: "rs0", members: [{_id: 0, host: "rs1-1"}]})
  4. rs.conf() # verify the replica set configuration
  5. rs.add("{secondary_host}") # add the secondary once its mongod is running

Setup secondary

  1. Modify /etc/mongodb.conf and ensure the following values are set correctly:

    port = 27017
    bind_ip = 0.0.0.0
    dbpath = /media/mongo_volume/db/
    fork = true
    replSet = rs0
  2. Start the mongod process:
    mongod --config /etc/mongodb.conf
  3. Do not run rs.initiate() on the secondary; it joins the replica set when the primary runs rs.add("{secondary_host}").
  4. rs.conf() # run in the secondary's mongo shell to confirm it has received the replica set configuration (a replication check follows this list)
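
Once the secondary has been added from the primary (step 5 of the primary setup), a quick way to confirm replication is flowing is to check member states and replication lag from the primary's mongo shell; a small sketch (the output format varies by MongoDB version):

    mongo --host rs1-1
    > rs.status().members.map(function(m) { return m.name + " " + m.stateStr })   // expect one PRIMARY and one SECONDARY
    > rs.printSlaveReplicationInfo()                                              // shows how far behind the secondary is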

Setup arbiter

An arbiter is like a "guard process" and consumes no dedicated hardware resources; for our concrete case, I set up the arbiter on the mongos server.

  1. mkdir /var/lib/mongo/arb/
  2. mongod --port 30000 --dbpath /var/lib/mongo/arb/ --replSet rs0 --smallfiles
  3. Connect to the mongo shell of the primary and run:
    rs.addArb("{arbiter_host}:30000") 

Verification

rs0:PRIMARY> rs.conf()
{
        "_id" : "rs0",
        "version" : 3,
        "members" : [
                {
                        "_id" : 0,
                        "host" : "rs1-2"
                },
                {
                        "_id" : 1,
                        "host" : "rs1-1"
                },
                {
                        "_id" : 2,
                        "host" : "mongos",
                        "arbiterOnly" : true
                }
        ]
}
rs0:PRIMARY> rs.status()
{
        "set" : "rs0",
        "date" : ISODate("2013-12-17T08:56:41Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "rs1-2",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 299,
                        "optime" : Timestamp(1387270069, 1),
                        "optimeDate" : ISODate("2013-12-17T08:47:49Z"),
                        "lastHeartbeat" : ISODate("2013-12-17T08:56:40Z"),
                        "lastHeartbeatRecv" : ISODate("2013-12-17T08:56:39Z"),
                        "pingMs" : 0,
                        "syncingTo" : "rs1-1"
                },
                {
                        "_id" : 1,
                        "name" : "rs1-1",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 317,
                        "optime" : Timestamp(1387270069, 1),
                        "optimeDate" : ISODate("2013-12-17T08:47:49Z"),
                        "self" : true
                },
                {
                        "_id" : 2,
                        "name" : "mongos",
                        "health" : 0,
                        "state" : 8,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "lastHeartbeat" : ISODate("2013-12-17T08:56:24Z"),
                        "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
                        "pingMs" : 0
                }
        ],
        "ok" : 1
}

Troubleshooting

For mistakes made while experimenting, we can run the following commands on the Mongo config server to "reset" the sharding status:

> use config
> db.shards.drop() 
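
After dropping entries from the config database, the mongos may still hold cached routing metadata. One option, assuming the flushRouterConfig admin command is available in this MongoDB generation, is to clear that cache from a mongos shell (or simply restart the mongos process):

    mongo --host mongos
    > use admin
    > db.runCommand({ flushRouterConfig: 1 })   // tell mongos to reload sharding metadata from the config servers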

Add shards to the cluster

Caveats: in a real production environment, each replica set needs a minimum of 3 members. With only two nodes you do get replication of the data (i.e. they will copy each other), but you will not get automatic fail-over or high availability; hence we should never run production data with fewer than 3 nodes!

After the replica sets are all set up, we can add each one to the mongos as a "shard":

First we need to make sure the local mongod process is running on each shard; then we can use the mongo shell to connect to the mongos server and tell mongos to add each shard respectively:
mongo --host [mongos_server_ip/host]

> sh.addShard("[replSetName: e.g. rs0/[mongod_ip/port]") # e.g. sh.addShard("rs0/  For single mongod in development environment, sh.addShard("[shard_ip/host]:27017") # 

> sh.enableSharding("[database]")   # Enable sharding on a database
> sh.shardCollection("[database].[collection]", {shard_key: 1})   # Shard a collection with a shard key pattern

A concrete example of setting up a Mongo shard for "{Database name}.{Collection name}":
mongo --host mongos
sh.addShard("rs1-0")
sh.enableSharding("{Database name}")
sh.shardCollection("{Database name}.{Collection name}", {user_id:1})

Verify the sharding status by running "sh.status()" in the mongo shell:

mongos> sh.status()
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "version" : 3,
        "minCompatibleVersion" : 3,
        "currentVersion" : 4,
        "clusterId" : ObjectId("5264c5c753b45824c363f402")
}
  shards:
        "_id" : "shard0000""host" : "shard0" }
        "_id" : "shard0001""host" : "rs1-0" }
  databases:
        "_id" : "admin""partitioned" : false"primary" : "config" }
        "_id" : "test""partitioned" : false"primary" : "shard0000" }
        "_id" : "mongo_demo""partitioned" : false"primary" : "shard0000" }
        "_id" : "{Database name}""partitioned" : true"primary" : "shard0000" }
                {Database name}.EntendedProfiles
                        shard key: { "user_id" : 1 }
                        chunks:
                                shard0000       1
                        { "user_id" : { "$minKey" : 1 } } -->> { "user_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0)
                {Database name}.{Collection name}
                        shard key: { "user_id" : 1 }
                        chunks:
                                shard0000       1
                        { "user_id" : { "$minKey" : 1 } } -->> { "user_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0)

Additional notes 

Amazon EBS volume

If we are setting up shards on Amazon, we need to mount an Amazon EBS (Elastic Block Store) volume and use it to store the Mongo logs and DB data. Run the following commands on a shard instance:

sudo su
fdisk -l  # Examine all disk partitions and find the device we want to mount
mkfs -t ext4 /dev/xvdf  # Format the volume with an ext4 filesystem (/dev/xvdf could be /dev/sdb, /dev/sdh etc.)
mkdir /media/mongo_volume  # Make the directory where we will mount the volume and store MongoDB data
mount /dev/xvdf /media/mongo_volume
mkdir -p /media/mongo_volume/log /media/mongo_volume/db  # Create the log and db directories on the new volume

# Modify /etc/mongodb.conf, ensure Mongo uses the mounted volume to store logs and DB data
logpath=/media/mongo_volume/log/mongod.log
dbpath=/media/mongo_volume/db
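
To keep the volume mounted across reboots, one option is an /etc/fstab entry for it; this is a minimal sketch assuming the device stays named /dev/xvdf (EBS device naming can vary between instance types):

    # /etc/fstab
    /dev/xvdf  /media/mongo_volume  ext4  defaults,nofail  0  2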

Replica set recovery notes

There are periodic heartbeats between the replica set members and the mongos server; once a primary/secondary goes down, mongos will keep trying to reconnect:

Thu Jan  9 02:00:13.308 [ReplicaSetMonitorWatcher] trying reconnect to rs1-2
Thu Jan  9 02:00:13.311 [ReplicaSetMonitorWatcher] reconnect rs1-2 failed couldn't connect to server rs1-2
Thu Jan  9 02:00:13.315 [Balancer] distributed lock 'balancer/ip-10-244-81-61:27017:1389060087:1804289383' unlocked.
Thu Jan  9 02:00:15.156 [WriteBackListener-rs1-2] WriteBackListener exception : socket exception [CONNECT_ERROR] for rs1-2
...
Thu Jan  9 02:00:28.341 [ReplicaSetMonitorWatcher] trying reconnect to rs1-2
Thu Jan  9 02:00:28.344 [ReplicaSetMonitorWatcher] reconnect rs1-2 ok

There are also heartbeats between the members within a replica set; once a primary goes down, an election occurs automatically and one of the remaining members becomes primary. The old primary rejoins as a secondary once it recovers:

Thu Jan  9 02:09:34.825 [rsHealthPoll] replset info rs1-0 heartbeat failed, retrying
Thu Jan  9 02:09:34.826 [rsHealthPoll] replSet info rs1-0 is down (or slow to respond):
Thu Jan  9 02:09:34.826 [rsHealthPoll] replSet member rs1-0 is now in state DOWN
Thu Jan  9 02:09:35.746 [rsMgr] replSet info electSelf 1
Thu Jan  9 02:09:36.378 [rsMgr] replSet PRIMARY

 
