How do SQL and NoSQL Databases Scale?

Learn via video course
FREE
View all courses
DBMS Course - Master the Fundamentals and Advanced Concepts
DBMS Course - Master the Fundamentals and Advanced Concepts
by Srikanth Varma
1000
5
Start Learning
DBMS Course - Master the Fundamentals and Advanced Concepts
DBMS Course - Master the Fundamentals and Advanced Concepts
by Srikanth Varma
1000
5
Start Learning
Topics Covered

When data increases in the database, scaling becomes an important factor so as to how SQL and nosql database scale and data load. The techniques and principle for Non-SQL and SQL database for scaling is different. In databases, there are two types of scaling that are being used widely. Horizontal scaling and Vertical scaling.

Vertical Scaling:

This type of scaling mainly focuses on improving the processing power of the server by increasing the capacity of the hardware of computers like the CPU and the RAM. This is certainly a better option but there is a threshold for hardware scalability and it becomes expensive when scaled at a larger level in SQL.

vertical-scaling

Horizontal Scaling:

This is done by increasing the number of servers parallelly. It is the most preferred option while scaling the databases. Since the servers are distributed, more data can be stored.

horizontal-scaling

Scaling SQL Database

Sharding of Database:

Database sharding is a key component for scaling both SQL and Non-SQL databases. According to the name, the database is sliced up into multiple pieces which are also called shards. Each of the shards has a unique index that corresponds to the type of data that is to be stored. Check the example section to understand this concept.

Cluster Proxy

Now that the logic of sharding is applied to the SQL database, how would the query service know which shard to communicate for data? This is where the middleman, cluster proxy is used. The proxy is placed between the database and the query service, which acts like a load balancer. It keeps the track of shards along with their indices. Due to this, the load balancer exactly knows which database to shard.

cluster-proxy

Shard Proxy:

It is done for better scanning of the database. It resides between the load balancer and the shard. If the shard is large, then shard the proxy again for speeding up the queries. This process in which the sharding of proxy is done repeatedly is called hierarchical sharding. The queries can also be improved by frequently caching the user's data, monitoring the health of the database, metrics of data, etc.

shard-proxy

Availability of Data:

What happens when one instance of the shard goes down? The data is not available during that case. Well, in a real scenario, this should not be the case. The data should still be available when one instance goes down which means a backup should be available. This can be done using master-slave architecture. In this, the master shard is read/write whereas the slave shards are read-only to maintain the originality of the data. Whenever a write operation is executed, it is performed directly on the master shard whereas for performing a read operation, the queries can be directed to the slave shard in order to reduce the load on the master shard. In case the master shard is unavailable, the slave shard can replace the master shard.

Consistency of Data:

Before designing the database, the ACID property is taken into consideration. It means the Atomicity, Consistency, Isolation, and Durability of the data in a database. In a distributed system, consistency is the main issue. As per the master-slave architecture, some amount of time will be taken for replicating the data from master to slave. During this time duration, the states of both master and slave are different.

Scaling NoSQL Database

Cassandra Wide Column Database:

NoSQL databases are designed considering scalability in mind as compared to SQL data. In No-SQL, the scalability is done automatically which is invisible to the user.

Shards/nodes are equal:

Just like sharding is done on SQL databases, the No-SQL database already has shards that correspond to different indices. The key difference between both of them is that the No-SQL has equal shards as compared to master-slave architecture. The shards are also known as nodes. For sharing of information between the nodes, gossip protocol is used. It is given this name since the information is transmitted from one node to another.

Availability:

In No-SQL, since the shards are of equal length, the data can easily be replicated into another shard. In a distributed system, the replication of data is done asynchronously and consumes a lot of time. Hence, instead of waiting for all the shards to respond, only the shard that is needed should respond that that interval of time. Multiple copies of clusters are also stored across different data centers in order to avoid a single point of failure.

Example to Scale SQL and NoSQL Database

example-scale-sql-and-nosql-database

SQL

For instance consider the diagram, if the sharding of the database is done by using the parameter name then, shard A can store the name from A-F, and shard B will store the name from G-M, etc. Whenever the search query is executed in the database, the index quickly looks for the shard that should be searched for that name instead of looking for it in the entire database.

No-SQL

The nodes are able to communicate with each other as nodes know where the data is contained. The communication between each node is fixed. Therefore the time complexity of this is O(n^2) where n is the number of nodes. When a query is to be executed, an algorithm like robin-round is used for identifying the shortest distance from request to node. The data gets forwarded to another node if the current node doesn’t have it. For instance, node A has the data for user F. Node D, on the other hand, can communicate this with node A since it knows where the data is stored and will redirect the request of data to node A.

Conclusion

  • Vertical Scaling: This type of scaling mainly focuses on improving the processing power of the server by increasing the capacity of the hardware of computers like the CPU and RAM.
  • Horizontal Scaling: This is done by increasing the number of servers parallelly.
  • database sharding is a key component of how SQL and NoSQL database scale. According to the name, the database is sliced up into multiple pieces which are also called shards. Each of the shards has a unique index that corresponds to the type of data that is to be stored.

Learn More