What is Database Sharding?

Database sharding is a method for horizontal scaling of databases, where the data is split across multiple database instances, called as shards. Sharding sometimes called as data partitioning. Sharding helps to improve performance and scalability.

Advantages:

  • Scalability
  • High Availability
  • Improved Read Response Time
  • Better Write Throughput

Disadvantages:

  • Data Rebalancing
  • Costly Join Operations
  • Overhead during Schema Change
  • Referential Integrity can go on a toss

Type of Sharding:

  • Range based or horizontal:
    • Ex: A-H, I-P etc.
    • Good for static or near static data
    • Data Rebalancing problem may come in future
  • Vertical (Not to confuse with vertical scaling):
    • Adding domain/features wise shards
    • Ex: User data in in shard1, Post in shard 2 etc.
    • Good for having shard based policies
    • Re-shard at a later point of time due to features enhancement may lead to migration and downtime (availability)
  • Key (or Hash) based:
    • Hash function such as Modulo (mod) can be used to allocate shard, such as: h(shard key % number of shards)
    • Ex: , h(1%3) = shard 1, h(2%3) = shard 2, h(4%3) = shard 1
    • In case of new shards added, hash function may need to be changed and that can lead to migration and downtime (availability).
    • Dynamic shards allocation can be handled using consistent hashing
  • Directory based:
    • Use look up table to keep shard key and shard instance
    • Can use hash or any other approach to identify shard instance
    • Shard key and instance mapping needs to be stored in lookup table
    • In case of new shards, their mapping can be added in a new lookup table and data mapping from old lookup can be migrated to new lookup table and old table can be deleted. This approach avoids the downtime due to migration.
    • The downside of it is- additional latency due to lookup server and it being the single point of failure (can be addressed using master-slave approach).

When to use sharding?

  • Single database can’t handle the load
  • Benefits supersedes the added complexity
  • Team is well verse with features/skills
  • Ex1: Geo specific data can be added in individual shards such as Uber is keeping India specific data into one shard and USA data into another.
  • Ex2: Instagram/Linked keeping closely connected friends into one shard for faster data retrieval. In this case user id cab be the shard key.

Hope you liked the blog. Look forward to your comments/suggestion below.


Posted

in

by

Comments

Leave a comment