Data Base Sharding In django using MySQL
Hi I have been trying to improve the db performance and had done some basic research regarding having a db partition and db sharding and also having 2 dbs one for write and other for read .
However i found out that the db sharding is the best way out of all as the mapping provided by sharding is dynamic that is one of the requirement to put it bluntly i have provided the 2 cases below
Case 1:- we need to get all the transaction of a user (which is huge) Case 2:- we need all the data for a particular time interval for all the user (which is again huge)
Because of the above scenerios I'm looking to implement db sharding
Note:- I have already segregated some db into multiple databases already and they sit on different machines so i want it to be applied to all those multiple databases
What I'm Looking for :
- Any link that could be helpful
- Any snippet code that could be helpful
Django==3.2.13 MySql == 5.7
Let me define some terms so that were are "on the same page":
Replication or Clustering -- Multiple servers having identical datasets. They are kept in sync by automatically transferring all writes from one server to the others. One main use is for scaling reads; it allows many more clients to connect simultaneously.
PARTITION -- This splits one table into several, based on date or something else. This is done in a single instance of MySQL. There are many myths about performance. The main valid use is for purging old data in a huge dataset.
Sharding -- This involves splitting up a dataset across multiple servers. A typical case is splitting by user_id (or some other column in the data). The use case is to scale writes. (On pure MySQL, the developer has to develop a lot of code to implement Sharding. There are add-ons, especially in MariaDB, that help.)
Your use case
Your "2 dbs one for write and other for read" sounds like Replication with 2 servers. It may not give you as much benefit as you hope for.
You are talking about
SELECTs that return millions of rows. None of the above inherently benefits such, even if you have several simultaneous connections doing such.
Please provide some numbers -- RAM size, setting of
innodb_buffer_pool_size, and dataset size (in GB) of the big
SELECTs. With those numbers, I can discuss "caching" and I/O and performance. Performing multiple queries on the same dataset may benefit from caching on a single server.
Replication and Sharding cannot share the caching; Partitioning has essentially no impact. That is, I will try to dissuade you from embarking on a technique that won't help and could hurt.
Please further describe your task; maybe one of the techniques will help.
P.S., Replication, Partitioning, and Sharding are mostly orthogonal. That is any combination of them can be put together. (But rarely is.)