today I want to talk about dividing your data.
People use databases to have a quick access to their data, but with time, the database grows, and it takes longer to extract the information. Data operations become a bottleneck.
Figure 1: Example of a database
What’s the best way to partition data?
There are mainly three ways to partition your data.
This type of partition divides the table vertically, which means that the structure of the main table changes in the new ones. An ideal scenario for this type of partition is when you don’t need all the information about the customer in your query. Let’s say if you only need orders from the current year, you could split it into two databases, one would hold customer information and current purchases, and the other would hold data about purchases from previous years. See Figure 2 for visual representation of a different scenario, where the user doesn’t need to see the short description of a product.
Figure 2: Example of vertical partitioning
Another example where vertical partitioning is a great option is when you have different types of data in your database, such as names, dates, and pictures. You could keep the string values in SQL DB, and pictures in an Azure Blob. This would save you money because SQL is expensive and Blob storage is one of the cheapest storage options.
Horizontal partitioning – also known as sharding
Here, you partition a table by rows. For example, if you have a large database of customers, you could divide it into four new tables: A – G, H – N, O – U, V – Z. However, this might not be the best optimization for you because, for instance, you have twice as many customers from the range O – U than from A – G. When partitioning your data, you need to assess the number of rows in the new tables, so each table has the same number of customers and will grow by a similar number of new customers in the future.
You might also partition the dataset based on the recent customers, for example, the clients that are not being active at your store are stored in one DB. In addition, the active customer database might be split into more tables, to get the results faster.
The structure of the original table stays the same in the new tables, i.e., we have the same number of columns. See Figure 3 for visual representation of the partition.
Figure 3: Example of horizontal partitioning
This division combines vertical and horizontal partitioning. If you have a large dataset where you keep different types of data, you could horizontally partition the customer information and vertically divide the database into string values based on your criteria in a SQL DB, and pictures could be stored in Blob storage. See Figure 4 for visual representation of the partition.
Figure 4: Example of hybrid partitioning
So, to pick the best option for you, you need to analyze the type and the structure of the data you have.
As an alternative to data partitioning, you could use Big Data that would do the sharding for you, but BD has other restrictions that you need to be aware of, but this is out of scope of this article.
Since my database initially doesn’t have many entries, I don’t need to partition the data horizontally. However, I will use vertical partitioning to put product pictures in the Azure Blob storage. The rest of the product information will be in SQL DB.
“Partitioning.” Microsoft TechNet. N.p., n.d. Web. 26 Mar. 2017.
Dragon119. “Data partitioning guidance.” Guidance | Microsoft Docs. N.p., n.d. Web. 26 Mar. 2017.