A Practical Guide to Table Partitioning in Azure SQL Database

As this post is more about the practical stuff so assuming that you are familiar with partitioning’s basic concepts and let’s quickly jump to the sequence of steps and some key points –

First step is to create a Partition Function that defines the boundaries of each partition. In the following example, partitions are defined for months and for four years starting with 2020 and until end of year 2023

CREATE PARTITION FUNCTION PF_MonthlyPartition (INT)
AS RANGE RIGHT FOR VALUES 
(
	  202001, 202002, 202003, 202004, 202005, 202006, 202007, 202008, 202009, 202010, 202011, 202012
	, 202101, 202102, 202103, 202104, 202105, 202106, 202107, 202108, 202109, 202110, 202111, 202112
	, 202201, 202202, 202203, 202204, 202205, 202206, 202207, 202208, 202209, 202210, 202211, 202212
	, 202301, 202302, 202303, 202304, 202305, 202306, 202307, 202308, 202309, 202310, 202311, 202312
)
GO

There are three take aways from above code –

  1. Partitioning parameter is of INT type; you can have it as Date, DateTime too
  2. RANGE RIGHT indicates where the boundary specified needs to be included – in the left partition or the partition right to the value
  3. Third and the most important one – 48 partitions are defined here but note that these are partition boundaries and in actual there would be 49 partitions. First partition will store the values falling on left side of the first boundary and the last one will be going upto infinite

Next step is to define the Partition Scheme based on the Partition Function which tells the storage area/Filegroup where partitions needs to be stored –

CREATE PARTITION SCHEME PS_MonthWise
AS PARTITION PF_MonthlyPartition
ALL TO ([PRIMARY])

In Azure SQL Database, there is only one file group so it is mentioned as ALL to PRIMARY here. However, an ideal way is to map different partitions on a different file group but this is also fine in our case.

After having the Partition Function and Partition Scheme ready we can create the table as per the partition function specified. Let’s take a look at the example below –

DROP TABLE IF EXISTS dbo.PF_Test
GO
CREATE TABLE dbo.PF_Test
(
	  ID		INT				IDENTITY(1,1)
	, txt		VARCHAR(100)
	, PF_Date	datetime2(7)
	, PF_Month	AS CAST(CONCAT(Year(PF_Date), RIGHT('00' + CAST(Month(PF_Date) AS VARCHAR), 2)) AS INT) PERSISTED
	, CONSTRAINT PK  Primary Key CLUSTERED (ID, PF_Month) 
) ON PS_MonthWise(PF_Month)
GO

Take aways from the above code are –

  1. As we created our Partition Function based on INT parameter so here we have created a computed column based on the PF_Date which is basically the column to drive the data segregation based on month…and this computed column is generated in the same format as defined in the Partition Function – YYYYMM
  2. Computed column is made as PERSISTED here
  3. Most important pre-requisite about the partitioning column is that this needs to be part of Clustered Index so having an IDENTITY column here, we added our computed column PF_Month also to have a composite CLUSTERED INDEX on them together. This makes some sense too as partition is about the organization/grouping of related data together and CLUSTERED INDEX also guides the order in which data is stored so both of them needs to be in sync
  4. Finally, on the last line, it is mentioned where the table needs to be created…just like we specify the file group using ON clause, Partitioning Scheme is mentioned here which indirectly indicates the filegroup behind the partitions along with the column of table which is used to partition the data

This is all in place now and you can start adding some dummy data to see this in action –

INSERT INTO dbo.PF_Test (txt, PF_Date) VALUES('A', '2020-07-22')
INSERT INTO dbo.PF_Test (txt, PF_Date) VALUES('A', '2024-07-22')

You can verify which partition these records went into by running the following statement –

SELECT p.object_id, o.name, p.partition_number, p.rows, * 
FROM sys.partitions p
INNER JOIN sys.objects o
	ON p.object_id = o.object_id
INNER JOIN sys.indexes i
	ON p.object_id = i.object_id
	AND p.index_id = i.index_id
WHERE p.object_id = OBJECT_ID('dbo.PF_Test')
ORDER BY p.partition_number

You can see that first record went into partition number 8 based on its value and the boundary defined by the partitioning function –

You can query catalog views to look at the details of different partitioning approaches applied –

SELECT * FROM sys.partition_functions

SELECT * FROM sys.partition_schemes

SELECT * FROM sys.partition_parameters -- specifies the parameter type which was INT in our case

SELECT * FROM sys.partition_range_values

One of the best use case of partitioning in data world is to quickly move data from one table to another instantly specially when the data volumes are high.

Partitions can be switched from one table to another by running the following statement. However, there are some requisites for this to run successfully –

  1. Both the tables needs to have the same schema including the indexes
  2. Both tables needs to have same number of partitions
  3. Switching a partition requires the target partition to be empty
-- e.g. we created another table with the same definition
-- and on the same partition scheme

DROP TABLE IF EXISTS PF_Test_Ar
GO
CREATE TABLE PF_Test_Ar
(
	  ID		INT				IDENTITY(1,1)
	, txt		VARCHAR(100)
	, PF_Date	datetime2(7)
	, PF_Month	AS CAST(CONCAT(Year(PF_Date), RIGHT('00' + CAST(Month(PF_Date) AS VARCHAR), 2)) AS INT) PERSISTED
	, CONSTRAINT PK_PF_Test_Ar  Primary Key CLUSTERED (ID, PF_Month) 
) ON PS_MonthWise(PF_Month)
GO

-- and then we switched the partition 8 from original source table to this one which had one row

ALTER TABLE PF_Test Switch Partition 8 TO PF_Test_Ar Partition 8

-- You can now query both the tables; you will see the data corresponding to partition 8 has been moved to the new table

SELECT * FROM PF_Test
SELECT * FROM PF_Test_Ar
Advertisements