
Partitioning

Summary

Data partitioning is a technique to break up a big database (DB) into many smaller parts. It is the process of splitting up a DB/table across multiple machines to improve the manageability, performance, availability, and load balancing of an application.

Partitioning Methods

Data Sharding (Horizontal Partitioning) - In this scheme, we put different rows of a table into different tables or servers. When the rows are split by ranges of a key's values, this is also called range-based partitioning.
Vertical Partitioning - In this scheme, we divide our data by feature, storing the tables related to each feature on their own server.
Directory Based Partitioning - A loosely coupled approach to work around issues mentioned in the above schemes is to create a lookup service which knows your current partitioning scheme and abstracts it away from the DB access code.
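As a rough illustration of the directory-based approach, the sketch below keeps the partitioning scheme inside a small lookup object so that calling code only ever asks "where does this key live?". The class name `PartitionDirectory`, the server names, and the choice of key ranges as the underlying scheme are all assumptions made for the example, not part of the notes above.

```python
import bisect


class PartitionDirectory:
    """Hypothetical lookup service: maps a numeric key to a server name."""

    def __init__(self, ranges):
        # ranges: iterable of (exclusive_upper_bound, server_name) pairs.
        ranges = sorted(ranges)
        self._bounds = [bound for bound, _ in ranges]
        self._servers = [server for _, server in ranges]

    def locate(self, key):
        # Find the first range whose upper bound is greater than the key.
        index = bisect.bisect_right(self._bounds, key)
        if index == len(self._servers):
            raise KeyError(f"no partition covers key {key}")
        return self._servers[index]

    def move(self, old_server, new_server):
        # Repoint a range to another server; callers of locate() are unaffected.
        self._servers = [
            new_server if server == old_server else server
            for server in self._servers
        ]


directory = PartitionDirectory([(1000, "db-a"), (2000, "db-b")])
print(directory.locate(42))    # key below 1000
print(directory.locate(1500))  # key in [1000, 2000)
```

Because the DB access code only calls `locate`, the mapping can be changed (here through `move`) without touching that code, which is the loose coupling the directory is meant to provide.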

Partitioning Criteria

Key or Hash-based partitioning: Under this scheme, we apply a hash function to some key attribute of the entity we are storing; the result is the partition number. For example, suppose we have 100 DB servers and our ID is a numeric value that is incremented by one each time a new record is inserted. The hash function could then be ID % 100, which gives us the server number where we store and read that record. The drawback is that this fixes the number of servers: adding or removing a server changes the result for most IDs, so most records have to be moved. Consistent hashing is a better way to handle this.
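List partitioning: Each partition is assigned a list of values, and a record goes to the partition whose list contains its key. For example, data can be grouped by region: APAC, EMEA, etc.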
Round-robin partitioning: This is a very simple strategy that ensures uniform data distribution. With n partitions, the i-th tuple is assigned to partition (i mod n).
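Composite partitioning: Combine any of the above schemes, for example list partitioning first and then hash-based partitioning within each list.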
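The criteria above can each be written as a small function from a record to a partition number. This is a minimal sketch: the function names, the `REGION_TO_PARTITION` table, and the particular composite combination (list first, then hash) are assumptions chosen for illustration.

```python
# Hypothetical region table for list partitioning.
REGION_TO_PARTITION = {"APAC": 0, "EMEA": 1}


def hash_partition(record_id: int, num_servers: int) -> int:
    # The ID % 100 idea from the text, generalized to any server count.
    return record_id % num_servers


def list_partition(region: str) -> int:
    # Each category maps to a fixed partition.
    return REGION_TO_PARTITION[region]


def round_robin_partition(i: int, n: int) -> int:
    # The i-th tuple goes to partition (i mod n).
    return i % n


def composite_partition(region: str, record_id: int, servers_per_region: int):
    # One possible composite: pick a region group, then hash within it.
    return (list_partition(region), hash_partition(record_id, servers_per_region))


print(hash_partition(12345, 100))                    # 45
print([round_robin_partition(i, 3) for i in range(6)])  # [0, 1, 2, 0, 1, 2]
print(composite_partition("EMEA", 12345, 10))        # (1, 5)
```

Note that hash and round-robin partitioning use the same arithmetic here; the difference is the input. Hash partitioning depends on the record's key, so the record can be found again from its ID, while round-robin depends only on arrival order.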

Common Problems of Data Partitioning

Joins and Denormalization
Rebalancing
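To make the rebalancing problem concrete, the sketch below compares how many keys change servers when an 11th server is added, first under the ID % N scheme and then under a simple consistent-hash ring. The ring class, the use of SHA-256, the 100 virtual nodes per server, and the server names are all assumptions for the example; real systems differ in these details.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    # Deterministic hash (Python's built-in hash() is salted per process).
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, servers, vnodes=100):
        self._ring = sorted(
            (stable_hash(f"{server}#{v}"), server)
            for server in servers
            for v in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def locate(self, key: str) -> str:
        # Walk clockwise to the first ring point after the key's hash.
        index = bisect.bisect_right(self._points, stable_hash(key))
        return self._ring[index % len(self._ring)][1]


def moved_fraction(before, after, keys):
    # Share of keys whose assigned server differs between two schemes.
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [str(i) for i in range(10_000)]
servers = [f"db-{i}" for i in range(10)]

# Modulo scheme: going from 10 to 11 servers.
modulo_moved = moved_fraction(lambda k: int(k) % 10, lambda k: int(k) % 11, keys)

# Consistent hashing: same change in server count.
ring_before = ConsistentHashRing(servers)
ring_after = ConsistentHashRing(servers + ["db-10"])
ring_moved = moved_fraction(ring_before.locate, ring_after.locate, keys)

print(f"modulo: {modulo_moved:.1%} of keys moved")
print(f"ring:   {ring_moved:.1%} of keys moved")
```

With the modulo scheme about 91% of these keys land on a different server, whereas on the ring only the keys taken over by the new server move, which is about one in eleven in expectation.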
