Partitioning
Summary
Data partitioning is a technique to break up a big database (DB) into many smaller parts. It is the process of splitting up a DB/table across multiple machines to improve the manageability, performance, availability, and load balancing of an application.
Partitioning Methods
- Data Sharding (horizontal partitioning) - In this scheme, we put different rows into different tables. It is often described as range-based partitioning, since a common approach is to store different ranges of data in separate tables.
- Vertical Partitioning - In this scheme, we split our data by feature, storing the tables related to a specific feature on their own server.
- Directory Based Partitioning - A loosely coupled approach to work around issues mentioned in the above schemes is to create a lookup service which knows your current partitioning scheme and abstracts it away from the DB access code.
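A minimal sketch of the directory-based idea, assuming a range-to-shard mapping held in memory. The class name `ShardDirectory`, the boundaries, and the shard names `db-a`, `db-b`, `db-c` are hypothetical and only illustrate the pattern; a real lookup service would typically be a separate component.

```python
from bisect import bisect_right


class ShardDirectory:
    """Hypothetical lookup service: maps a key to the shard that holds it."""

    def __init__(self, boundaries, shards):
        # boundaries[i] is the first key owned by shards[i + 1];
        # keys below boundaries[0] belong to shards[0].
        if len(shards) != len(boundaries) + 1:
            raise ValueError("need exactly one more shard than boundaries")
        self._boundaries = list(boundaries)
        self._shards = list(shards)

    def shard_for(self, key):
        # DB access code calls only this method and never sees the scheme.
        return self._shards[bisect_right(self._boundaries, key)]


# Assumed layout: keys < 1000 on db-a, 1000..1999 on db-b, the rest on db-c.
directory = ShardDirectory(boundaries=[1000, 2000], shards=["db-a", "db-b", "db-c"])
```

Because callers only use `shard_for`, the mapping can be swapped for a different scheme without changing the DB access code.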
Partitioning Criteria
- Key or Hash-based partitioning: Under this scheme, we apply a hash function to some key attributes of the entity we are storing; that yields the partition number. For example, suppose we have 100 DB servers and our ID is a numeric value that is incremented by one each time a new record is inserted. The hash function could then be ID % 100, which gives us the server number where we can store/read that record. The drawback is that adding a server changes the hash function, which forces data to be redistributed. Consistent hashing is a better way to handle this.
- List partitioning: Each partition is assigned a list of values, and a record goes to the partition whose list contains its key, e.g. a region such as APAC or EMEA.
- Round-robin partitioning: This is a very simple strategy that ensures uniform data distribution. With n partitions, the i-th tuple is assigned to partition (i mod n).
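- Composite partitioning: Combine any of the above schemes, e.g. apply list partitioning first and then hash-based partitioning within each list.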
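The criteria above can be sketched as small functions. This is an illustration only: the function names, the `REGION_TO_PARTITION` table, and the `servers_per_region` parameter are assumptions, and the composite function shows just one possible combination (list first, then hash).

```python
# Hypothetical region-to-partition table for list partitioning.
REGION_TO_PARTITION = {"APAC": 0, "EMEA": 1}


def hash_partition(record_id: int, num_servers: int = 100) -> int:
    """Key/hash-based: ID % 100 picks one of 100 servers."""
    return record_id % num_servers


def list_partition(region: str) -> int:
    """List: look the record's category up in a fixed table."""
    return REGION_TO_PARTITION[region]


def round_robin_partition(i: int, n: int) -> int:
    """Round-robin: the i-th tuple goes to partition (i mod n)."""
    return i % n


def composite_partition(region: str, record_id: int, servers_per_region: int = 10):
    """Composite (one possible combination): list first, then hash within it."""
    return list_partition(region), record_id % servers_per_region
```

For example, `hash_partition(1234)` returns 34, so the record with ID 1234 lives on server 34.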
Common Problems of Data Partitioning
- Joins and Denormalization
- Rebalancing
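To make the rebalancing problem concrete, the sketch below counts how many of 10,000 sequential IDs change server when a 101st server is added, first with ID % N and then with a simple consistent-hash ring. The ring implementation, the `db-<n>` server names, the use of MD5, and the `replicas` count are all assumptions chosen for illustration, not a production design.

```python
import hashlib
from bisect import bisect_right


def stable_hash(value: str) -> int:
    # MD5 is used only as a stable, well-spread hash, not for security.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical ring with several virtual points per server."""

    def __init__(self, servers, replicas=50):
        self._ring = sorted(
            (stable_hash(f"{server}#{r}"), server)
            for server in servers
            for r in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key) -> str:
        # First ring point after the key's hash, wrapping around at the end.
        idx = bisect_right(self._points, stable_hash(str(key))) % len(self._ring)
        return self._ring[idx][1]


keys = range(10_000)

# Modulo scheme: a key moves whenever ID % 100 differs from ID % 101.
moved_modulo = sum(1 for k in keys if k % 100 != k % 101)

# Consistent hashing: a key moves only if the new server takes over its arc.
old_ring = ConsistentHashRing([f"db-{i}" for i in range(100)])
new_ring = ConsistentHashRing([f"db-{i}" for i in range(101)])
moved_ring = sum(1 for k in keys if old_ring.server_for(k) != new_ring.server_for(k))
```

With the modulo scheme, 9,900 of the 10,000 IDs land on a different server. On the ring, only keys claimed by the new server move, which should be a small fraction of the total.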