Sharding in MongoDB

Overview

MongoDB is a NoSQL document database that scales horizontally and stores data as JSON-like documents. Sharding is a method for distributing data across multiple machines in order to improve the performance and data availability of the system. In this article, we will learn how to use sharding in MongoDB.

What is Sharding?

When the amount of data in a database increases to the point where it begins to affect the performance of an application, database sharding is a valuable database design pattern to use.

Database sharding is the process of dividing a large database into several smaller databases and storing them on different machines. Sharding can improve performance, speed up query responses, and let the database grow without affecting the application's availability.

Sharding is often useful when:

The amount of data increases, making it unmanageable to keep it all in a single database.
The volume of read/write requests and the required response times make it simpler to divide the data and scale it across several dedicated servers.
All traffic is directed to a single database server, which degrades the application's performance and causes timeouts.

The ability to handle the storage and computing needs of storing and querying large volumes of data is the primary reason most organizations use a NoSQL database. Sharding is MongoDB's approach to handling large amounts of data. It is the process by which massive datasets are divided into smaller, manageable ones and kept across multiple MongoDB instances. This is done because running queries on big datasets can cause high CPU usage on the MongoDB server.

A single machine might not be able to store all of the data or provide good read and write throughput as the size of the data grows. Sharding in MongoDB solves this issue with horizontal scaling. With sharding, you can increase the number of machines to handle the demands of read and write operations as well as data growth.

Horizontal scaling, also called scale-out, means adding machines to share the data set and the load. It is helpful when no single machine can manage heavy modern workloads. With horizontal scaling, big data and demanding workloads can be handled with almost no upper limit.

The Architecture of Sharding

Sharding in MongoDB is implemented by setting up a cluster of MongoDB instances. The following figure shows how MongoDB sharding works in a cluster.

The three main components of a sharded cluster are as follows:

Shard
Config Servers
Query Routers

[Figure: shard-mongodb1, a MongoDB sharded cluster]

Shard: The most fundamental component of a sharded cluster is the shard, which holds a portion of the large dataset that needs to be divided. Each shard is deployed as a replica set to deliver high levels of data integrity and availability.
Config Servers: The config servers store the sharded cluster's metadata. This metadata records which shard holds which subset of the data, and it is used to direct user queries to the right place. The config servers are deployed as a replica set, typically with three members.
Query Routers: Query routers, which are mongos instances, communicate with client applications and send requests to the proper shard. The query router processes each operation, directs it to the appropriate shards, and then returns the results to the client. A client sends its queries to one query router. To spread the load of client requests, a sharded cluster usually runs more than one query router.
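
As a rough illustration of how a query router can use the config servers' metadata, the Python sketch below keeps a small chunk table and looks up which shard owns a given shard key value. The table contents and the names `CHUNK_BOUNDS`, `CHUNK_SHARDS`, and `route` are hypothetical; this models the idea only and is not how mongos is implemented.

```python
import bisect

# Hypothetical metadata: chunk i covers shard key values from
# CHUNK_BOUNDS[i] up to (but not including) CHUNK_BOUNDS[i + 1];
# the last chunk is open-ended. In a real cluster the config servers hold this.
CHUNK_BOUNDS = [0, 100, 200, 300]
CHUNK_SHARDS = ["shardA", "shardB", "shardC", "shardA"]


def route(key: int) -> str:
    """Return the shard that owns the chunk containing `key`."""
    if key < CHUNK_BOUNDS[0]:
        raise ValueError("key is below the lowest chunk bound")
    # Find the last chunk whose lower bound is <= key.
    index = bisect.bisect_right(CHUNK_BOUNDS, key) - 1
    return CHUNK_SHARDS[index]


if __name__ == "__main__":
    for key in (5, 100, 250, 999):
        print(key, "->", route(key))
```

Because the lookup only needs the chunk table, the router can forward a query that includes the shard key to a single shard instead of asking every shard.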

Advantages of Sharding in MongoDB

Sharding in MongoDB gives you almost limitless flexibility when scaling your database to manage increased loads. It achieves this by increasing read/write throughput and storage capacity. Let's take a closer look at each of those:

Data locality: Zone sharding makes it simple to build distributed databases for geographically dispersed applications, with policies that enforce data residency within particular regions. Each zone can contain one or more shards.
Increased storage capacity: You can expand total storage capacity by increasing the number of shards. Suppose a shard can store 4 TB of data. Your total storage would then grow by 4 TB for each additional shard. As a result, storage capacity is nearly unlimited.
Increased read/write throughput: The data set can be split among several shards to take advantage of parallelism. Let's assume that each shard can handle 1,000 operations per second. The throughput would then increase by 1,000 operations per second for every extra shard.
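
The arithmetic behind these two examples can be sketched in a few lines of Python. The function name `cluster_capacity` is made up for illustration, and the numbers assume ideal conditions: data and load spread perfectly evenly, with no overhead from routing or chunk migration.

```python
def cluster_capacity(shards, tb_per_shard=4.0, ops_per_shard=1000):
    """Idealized totals for a cluster: (storage in TB, operations per second)."""
    return shards * tb_per_shard, shards * ops_per_shard


if __name__ == "__main__":
    for n in (1, 2, 5):
        storage_tb, ops = cluster_capacity(n)
        print(f"{n} shard(s): {storage_tb:.0f} TB, {ops} ops/s")
```

Real clusters scale less cleanly than this, since a poorly chosen shard key can concentrate data or traffic on a few shards.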

MongoDB Sharding Data Distribution

Shard Key: MongoDB shards data at the collection level. Which collection(s) you shard is entirely up to you. MongoDB uses the shard key to distribute a collection's documents among shards. It divides the span of shard key values into non-overlapping ranges and groups the data into "chunks", one per range. MongoDB then tries to distribute those chunks evenly among the cluster's shards.

Shard keys are built from fields found in every document. The values in those fields, together with the shard ranges and the number of chunks, determine which shard a document will live on. The config server replica set stores and maintains this information.

The shard key directly affects the performance of the cluster, so choose it with care. A poorly chosen shard key can distribute chunks unevenly, which leads to performance or scaling problems. In MongoDB 5.0 and later, you can change your data distribution strategy afterwards by changing the shard key. To select the right shard key for your workload, consult the MongoDB documentation.

To keep chunks evenly distributed across the shards, a background process known as the balancer automatically migrates chunks between them.
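
The balancing idea can be illustrated with a toy Python model: keep moving one chunk from the fullest shard to the emptiest one until the counts are close. The function `balance`, the `threshold` parameter, and the shard names are assumptions made for this sketch; MongoDB's balancer applies its own thresholds and policies.

```python
def balance(chunks_per_shard, threshold=1):
    """Even out chunk counts; returns (final counts, list of migrations)."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    counts = dict(chunks_per_shard)  # work on a copy
    migrations = []
    while True:
        fullest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        if counts[fullest] - counts[emptiest] <= threshold:
            return counts, migrations
        # Migrate a single chunk, as the balancer does one step at a time.
        counts[fullest] -= 1
        counts[emptiest] += 1
        migrations.append((fullest, emptiest))


if __name__ == "__main__":
    final, moves = balance({"shardA": 8, "shardB": 1, "shardC": 0})
    print(final)
    print(len(moves), "migrations")
```

In this toy run, nine chunks that started mostly on one shard end up three per shard.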

Sharding Strategy:

MongoDB supports two sharding strategies for distributing data across sharded clusters:

Hashed Sharding: Hashed sharding computes a hash of the shard key field's value. Each chunk is then assigned a range of hashed shard key values. Even when a group of shard key values is "close," their hashed values are unlikely to land in the same chunk. Distributing data by hashed values therefore gives a more even distribution, particularly for data sets where the shard key changes monotonically. Hashed sharding, however, does not support efficient range-based queries.

[Figure: shard-mongodb2, hashed sharding]

Ranged Sharding: Ranged sharding in MongoDB splits data into ranges based on the values of the shard key. Each chunk is then assigned a range of shard key values. Shard keys with "close" values are therefore more likely to reside in the same chunk. As a result, operations can be directed to only the shards that hold the required data, which enables targeted operations.

[Figure: shard-mongodb3, ranged sharding]
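
To make the difference between the two strategies concrete, the following Python sketch places a run of monotonically increasing keys with each approach. It is a simplified model under stated assumptions: `ranged_shard` and `hashed_shard` are hypothetical helpers, the MD5 digest is only a stand-in for MongoDB's own hash function, and mapping straight to a shard number skips the intermediate step of chunks.

```python
import hashlib
from collections import Counter

SHARDS = 3


def ranged_shard(key, keys_per_range=1000):
    """Ranged placement: neighbouring keys share a range, hence a shard."""
    return (key // keys_per_range) % SHARDS


def hashed_shard(key):
    """Hashed placement: hash the key first, then map the hash to a shard."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % SHARDS


# Monotonically increasing keys, such as IDs of newly inserted documents.
recent_keys = range(5000, 5300)
ranged = Counter(ranged_shard(k) for k in recent_keys)
hashed = Counter(hashed_shard(k) for k in recent_keys)

if __name__ == "__main__":
    print("ranged:", dict(ranged))
    print("hashed:", dict(hashed))
```

With ranged placement all 300 recent keys land on one shard, which is good for range queries but concentrates new writes. With hashed placement the same keys scatter across the shards, which spreads writes but means a range query has to contact several shards.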

Implementation of Sharding in MongoDB

MongoDB Atlas, a Database-as-a-Service that makes sharded cluster implementation simple, is the most practical and cost-efficient way to deploy and manage a sharded cluster.

Note: MongoDB Atlas is advised because it is a great option for those who don't have the time or resources to manage all the infrastructure. You can also deploy your sharded cluster in minutes with a few clicks.

Config Server Setup

Each config server replica set can include up to 50 mongod processes, with the following restrictions: no arbiters and no delayed members. Running a sharded cluster on your local machine is not advised, because sharding requires multiple nodes or machines. For the best performance, use MongoDB Atlas or a cloud service such as Amazon EC2. Start each of these mongod processes with the --configsvr option.

Next, connect to just one of the replica set members.

Then run rs.initiate() on that member only.

Once the config server replica set is up and running, we can create the shards.
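
The config server steps above can be sketched in Python as plain data, without starting any process. The helper names, the replica set name `cfgRS`, the `example.net` hosts, and the data path are all hypothetical. The flags `--configsvr`, `--replSet`, `--port`, and `--dbpath` are standard mongod options, and the document is the kind of configuration passed to rs.initiate().

```python
def config_server_command(repl_set, port, dbpath):
    """Build the mongod command line for one config server member."""
    return ["mongod", "--configsvr", "--replSet", repl_set,
            "--port", str(port), "--dbpath", dbpath]


def config_initiate_document(repl_set, hosts):
    """Build the document to pass to rs.initiate() on a single member."""
    return {
        "_id": repl_set,
        "configsvr": True,  # marks this replica set as a config server set
        "members": [{"_id": i, "host": h} for i, h in enumerate(hosts)],
    }


# Hypothetical three-member layout; replace names, hosts, and paths.
HOSTS = ["cfg1.example.net:27019", "cfg2.example.net:27019",
         "cfg3.example.net:27019"]

if __name__ == "__main__":
    print(" ".join(config_server_command("cfgRS", 27019, "/data/configdb")))
    print(config_initiate_document("cfgRS", HOSTS))
```

The printed command would be run once on each config server host, and the printed document would be supplied to rs.initiate() from a shell connected to one of them.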

Set Up Shards

Each shard is a replica set in its own right. The procedure is the same as for the config servers, but with the --shardsvr option. Make sure to give each shard a unique replica set name.

Next, connect to just one member of the shard's replica set.

Run rs.initiate() on that single member. Make sure the --configsvr option is not used.
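
A comparable sketch for the shards follows, again as plain Python data. The shard replica set names, hosts, and paths are hypothetical; `--shardsvr` is the standard mongod option for shard members. Note that the rs.initiate() document for a shard carries no `configsvr` field.

```python
def shard_server_command(repl_set, port, dbpath):
    """Build the mongod command line for one member of a shard replica set."""
    return ["mongod", "--shardsvr", "--replSet", repl_set,
            "--port", str(port), "--dbpath", dbpath]


def shard_initiate_document(repl_set, hosts):
    """Build the rs.initiate() document for a shard (no configsvr field)."""
    return {
        "_id": repl_set,
        "members": [{"_id": i, "host": h} for i, h in enumerate(hosts)],
    }


# Hypothetical layout: two shards, each with its own replica set name.
SHARD_LAYOUT = {
    "shard1RS": ["s1a.example.net:27018", "s1b.example.net:27018"],
    "shard2RS": ["s2a.example.net:27018", "s2b.example.net:27018"],
}

if __name__ == "__main__":
    for name, hosts in SHARD_LAYOUT.items():
        print(" ".join(shard_server_command(name, 27018, f"/data/{name}")))
        print(shard_initiate_document(name, hosts))
```

Keeping one dictionary entry per shard makes it easy to check that no two shards share a replica set name.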

Starting the Mongos

Finally, set up mongos and point it at your config server replica set.

To prevent a bottleneck in a production setting, deploy more than one mongos. Typically, it's a good idea to launch at least three mongos servers.
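
Pointing mongos at the config servers comes down to one option, `--configdb`, whose value has the form replica set name, a slash, then a comma-separated host list. The Python sketch below only assembles that command line; the helper name and the `cfgRS` and `example.net` values are hypothetical.

```python
def mongos_command(config_repl_set, config_hosts, port=27017):
    """Build the mongos command line for one query router."""
    config_db = f"{config_repl_set}/{','.join(config_hosts)}"
    return ["mongos", "--configdb", config_db, "--port", str(port)]


if __name__ == "__main__":
    hosts = ["cfg1.example.net:27019", "cfg2.example.net:27019",
             "cfg3.example.net:27019"]
    print(" ".join(mongos_command("cfgRS", hosts)))
```

The same command, with the same `--configdb` value, would be used for every additional mongos you launch.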

Enable Sharding for a Database after Configuration

Connect to your mongos instance.

After that, add your shards to the cluster. Repeat this once for each shard.

Enable sharding on your database.
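
In the shell, these two steps correspond to the helpers sh.addShard() and sh.enableSharding(), which wrap the `addShard` and `enableSharding` database commands. The Python sketch below only builds those command documents so that it runs without a cluster; the function names, shard names, hosts, and the database name `shop` are hypothetical. With a driver, each document would be sent to the admin database through a mongos connection.

```python
def add_shard_command(repl_set, hosts):
    """Command document that registers one shard replica set."""
    return {"addShard": f"{repl_set}/{','.join(hosts)}"}


def enable_sharding_command(database):
    """Command document that enables sharding for one database."""
    return {"enableSharding": database}


if __name__ == "__main__":
    layout = {
        "shard1RS": ["s1a.example.net:27018"],
        "shard2RS": ["s2a.example.net:27018"],
    }
    # One addShard command per shard, then enable sharding on the database.
    for name, hosts in layout.items():
        print(add_shard_command(name, hosts))
    print(enable_sharding_command("shop"))
```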

Finally, use the sh.shardCollection() function to shard your collection. This can be accomplished in one of two ways. The first is hashed sharding, which distributes your data evenly across shards.

The second is range-based sharding, which distributes data according to the values of the shard key. For some data sets, this makes queries over ranges of data more efficient.
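
The two variants differ only in the key specification: the value "hashed" selects hashed sharding, and 1 selects range-based sharding on that field. The Python sketch below builds the underlying `shardCollection` command document for both cases without contacting a cluster. The function name, the namespace `shop.orders`, and the field `customerId` are hypothetical.

```python
def shard_collection_command(namespace, field, hashed):
    """Command document that shards `namespace` on `field`."""
    key_spec = "hashed" if hashed else 1
    return {"shardCollection": namespace, "key": {field: key_spec}}


if __name__ == "__main__":
    # Shell equivalent: sh.shardCollection("shop.orders", { customerId: "hashed" })
    print(shard_collection_command("shop.orders", "customerId", hashed=True))
    # Shell equivalent: sh.shardCollection("shop.orders", { customerId: 1 })
    print(shard_collection_command("shop.orders", "customerId", hashed=False))
```

A collection is sharded once, so in practice you would pick one of the two forms rather than run both.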

FAQs

Q: What is sharding in MongoDB? A: Database sharding is the process of dividing a large database into several smaller databases and storing them on different machines.

Q: What are the three main components of a sharded cluster in MongoDB? A: The three main components of a sharded cluster are:

Shard
Config Servers
Query Routers

Conclusion

The process of dividing a sizable database into numerous databases and storing them on various machines is known as database sharding.
The three main components of a sharded cluster are shards, config servers, and query routers.
The advantages of sharding in MongoDB are data locality, increased storage capacity, and increased read/write throughput.
There are two sharding strategies: hashed sharding and ranged sharding.