What is Database Sharding?

Database sharding is a method for horizontal scaling of databases, where the data is split across multiple database instances, called as shards. Sharding sometimes called as data partitioning. Sharding helps to improve performance and scalability.

Advantages:

Scalability
High Availability
Improved Read Response Time
Better Write Throughput

Disadvantages:

Data Rebalancing
Costly Join Operations
Overhead during Schema Change
Referential Integrity can go on a toss

Type of Sharding:

Range based or horizontal:
- Ex: A-H, I-P etc.
- Good for static or near static data
- Data Rebalancing problem may come in future
Vertical (Not to confuse with vertical scaling):
- Adding domain/features wise shards
- Ex: User data in in shard1, Post in shard 2 etc.
- Good for having shard based policies
- Re-shard at a later point of time due to features enhancement may lead to migration and downtime (availability)
Key (or Hash) based:
- Hash function such as Modulo (mod) can be used to allocate shard, such as: h(shard key % number of shards)
- Ex: , h(1%3) = shard 1, h(2%3) = shard 2, h(4%3) = shard 1
- In case of new shards added, hash function may need to be changed and that can lead to migration and downtime (availability).
- Dynamic shards allocation can be handled using consistent hashing
Directory based:
- Use look up table to keep shard key and shard instance
- Can use hash or any other approach to identify shard instance
- Shard key and instance mapping needs to be stored in lookup table
- In case of new shards, their mapping can be added in a new lookup table and data mapping from old lookup can be migrated to new lookup table and old table can be deleted. This approach avoids the downtime due to migration.
- The downside of it is- additional latency due to lookup server and it being the single point of failure (can be addressed using master-slave approach).

When to use sharding?

Single database can’t handle the load
Benefits supersedes the added complexity
Team is well verse with features/skills
Ex1: Geo specific data can be added in individual shards such as Uber is keeping India specific data into one shard and USA data into another.
Ex2: Instagram/Linked keeping closely connected friends into one shard for faster data retrieval. In this case user id cab be the shard key.

Hope you liked the blog. Look forward to your comments/suggestion below.

What is Database Sharding?

Comments

Leave a comment Cancel reply

What is Database Sharding?

Share this:

Comments

Leave a comment Cancel reply