Wednesday, February 28, 2024

PostgreSQL Partitioning Overview

 

                      PostgreSQL Partitioning Overview

 

Partitioning has made a lot of improvements in the Postgres database.

 

Overview


Quick introduction to partitioning and timeline of adding partitioning features to PG before we get into the enhancements done in PG-13. Partitioning is way of splitting or dividing a large table into smaller pieces, the table can be divided using the List, Range or Hash partitioning techniques offered by PG.

The parent partition table doesn’t store any data, the data is stored in the partitions defined when creating the partitioned table. While the partitions can be accessed directly, the queries are typically directed at the logical parent relation and the tuples are routed to the correct partition for inserts and update and in-case of a read query the desired partitions are scanned for executing the client query.

Here are some key partitioning feature made it to PG over the years…




 

Here I explain how to take advantage of this feature effectively. PostgreSQL partitioning features:


Partition pruning 

Partition-wise join

Partition-wise aggregation

Partition pruning

Partition pruning is a feature that narrows down the partitions to be accessed by SQLs, which in effect means you can exclude those that do not need to be scanned.


Partitioning always create based on a specific column as a key column.

For example, in the figure below, the sale_date column is set as a partition key.

By using the partition key in WHERE clause, the partitions to be accessed are narrowed down. Since the search is performed only for the targeted partitions, efficient scan is possible.

Note the following about partition pruning:

-  They are enabled by default.

-  They can be enabled/disabled by setting enable_partition_pruning in postgresql.conf or by using SET.

 

Partition pruning example:

Let`s see the effect of partition pruning by comparing the scan performance for partitioned and non-partitioned tables.

 

The sales table in the sample below employs a range partition, with sale_date as the partition key. You can see that the table is divided into 3 partitions, while nonpartition_sales is not a partition table.

 

--Partitioned table

mydb=# \d+ sales Table "public.sales" Column | Type |…

-----------+------------+

id | integer | p_name | text | amount | integer | sale_date | date |

Partition key: RANGE (sale_date)

Partitions: sales_2019_Q4 FOR VALUES FROM ('2019-10-01') TO ('2020-01-01'), sales_2020_Q1 FOR VALUES FROM ('2020-01-01') TO ('2020-04-01'), sales_2020_Q2 FOR VALUES FROM ('2020-04-01') TO ('2020-07-01')

 


mydb=# SELECT COUNT(*) FROM sales; count

3000000

(1 row)

--Non-partitioned table mydb=# \d+ nonpartition_sales

Table "public.nonpartition_sales" Column |                Type   |…

-----------+------------+

id | integer | p_name | text | amount | integer | sale_date | date |

 


mydb=# SELECT COUNT(*) FROM nonpartition_sales; count


3000000

(1 row)

 

--Comparing estimated execution times

Let`s check the execution plan in the non-partitioned table of a SELECT statement using the partition key (sale_date).

 

mydb=# EXPLAIN ANALYZE SELECT * FROM nonpartition_sales mydb-# WHERE sale_date < '2020-04-01' AND id=1;


QUERY PLAN

Gather

Workers Planned: 2

Workers Launched: 2

-> Parallel Seq Scan on nonpartition_sales A

Filter: ((sale_date < '2020-04-01'::date) AND (id = 1)) Rows Removed by Filter: 986797

Planning Time: 0.249 ms

Execution Time: 3156.240 ms                                <<<<**** (8 rows)

--sequential scan is executed in the entire table

--Estimated execution time for non-partitioned table

Now let`s see how the plan changes when executing the same SQL for the partitioned table. mydb=# EXPLAIN ANALYZE SELECT * FROM sales


WHERE sale_date < '2020-04-01' AND id=1; QUERY PLAN

Gather

Workers Planned: 2

Workers Launched: 2

-> Parallel Append

-> Parallel Seq Scan on sales_2019_Q4 A

Filter: ((sale_date < '2020-04-01'::date) AND (id = 1)) Rows Removed by Filter: 292428

-> Parallel Seq Scan on sales_2020_Q1 A

Filter: ((sale_date < '2020-04-01'::date) AND (id = 1)) Rows Removed by Filter: 539199

Planning Time: 0.642 ms

Execution Time: 423.267 ms                                <<<<**** (12 rows)

--sequential scan is performed in the relevant partitions

--Estimated execution time for partitioned table

 

As we can see above, the execution time for the partitioned table is almost 7.5x faster (3156.240 ms | 423.267ms).

 

***Partition-wise join***

 

Partitionwise join currently applies only when the join conditions include all the partition keys,


which must be of the same data type and have one-to-one matching sets of child partitions. Because partitionwise join planning can use significantly more CPU time and memory during planning, the default is off .

Partition-wise join is a feature that joins partitions during a JOIN operation on partitioned tables.

 

Combining partitions that have the same range and values eliminates unnecessary JOIN processing, thus improving performance.

 

Suppouse that partitioning create these ranges separately: values 1 to 1000 as part1, 1001 to 2000 as part2, and 2001 to 3000 as part3.

 

Our JOIN condition is emp.id = emp_info.id, so the emp_1 and the emp_info_1 partitions are joined, since they meet the join condition

(each id column takes values from 1 to 1000).

 

However, the emp_1 and the emp_info_2 partitions do not join, since they have been partitioned with different values.

Other partitions created with the same partitioning range values are also joined, as illustrated below.

Note the following about partition-wise joins:

-They are disabled by default.

This is because a lot of CPU and memory may be consumed when creating execution plans. Verification is required whether they will be effective in your environment before using them.

 

-They can be enabled/disabled by setting enable_partitionwise_join in postgresql.conf or by using SET.

 

-Their performance may be even amplified by using parallel query - we will discuss this later in this article.

 

***Partition-wise join example***

Let`s compare the estimated times in execution plans where partition-wise joins are disabled and enabled.

 

The emp and emp_info tables in this example uses hash partitioning, with the id column set as the partition key.

Both tables are divided into 3 partitions under the same conditions.

 

***Table emp***

mydb=# \d+ emp Table "public.emp"

Column | Type |

--------+---------+

id | integer | name | text | dept | integer |

Partition key: HASH (id) Index:

"emp_pkey" PRIMARY KEY, btree (id)

Partition: emp_0 FOR VALUES WITH (modulus 3, remainder 0), emp_1 FOR VALUES WITH (modulus 3, remainder 1), emp_2 FOR VALUES WITH (modulus 3, remainder 2)

mydb=# SELECT COUNT (*) FROM emp;


count


3000000

(1 row)

 

***Table emp_info***

mydb=# \d+ emp_info Table "public.emp_info"

Column | Type |

--------+---------+-

id | integer | addr | text | rating | integer |

Partition key: HASH (id) Index:

"emp_pkey" PRIMARY KEY, btree (id)

Partition: emp_info_0 FOR VALUES WITH (modulus 3, remainder 0), emp_info_1 FOR VALUES WITH (modulus 3, remainder 1), emp_info_2 FOR VALUES WITH (modulus 3, remainder 2)

 


mydb=# SELECT COUNT (*) FROM emp_info; count

3000000

(1 row)

 

***Comparing estimated execution times***

Check that the partition-wise joins are disabled.

 


mydb=# SHOW enable_partitionwise_join; enable_partitionwise_join

off

(1 row)

 

Check the execution plan of a JOIN of the partitioned tables above, using id as the join key. The scanning results of partitions in the table are obtained separately, and the join process is performed at the end.

mydb=# EXPLAIN ANALYZE

mydb-# SELECT emp.id, emp.name, emp_info.rating mydb-# FROM emp LEFT OUTER JOIN emp_info mydb-#               ON emp.id = emp_info.id A


mydb-# WHERE emp_info.rating = 10; QUERY PLAN

Gather

Workers Planned: 2

Workers Launched: 2

-> Parallel Hash Join

Hash Cond: (emp_1.id = emp_info_1.id)

-> Parallel Append B

-> Parallel Seq Scan on emp_1

-> Parallel Seq Scan on emp_0

-> Parallel Seq Scan on emp_2

-> Parallel Hash

-> Parallel Append C

-> Parallel Seq Scan on emp_info_1 Filter: (rating = 10)

Rows Removed by Filter: 326749

-> Parallel Seq Scan on emp_info_0 …,


Filter: (rating = 10)

Rows Removed by Filter: 980068

-> Parallel Seq Scan on emp_info_2 Filter: (rating = 10)

Rows Removed by Filter: 489780 Planning Time: 0.538 ms

Execution Time: 11080.809 ms                                     <<<<**** (23 rows)

-Join key

-Sequential scan for partitions in table emp

-Sequential scan for partitions in table emp_info

-Estimated execution time

Next, we enable partition-wise joins.

 

mydb=# SET enable_partitionwise_join TO on; SET

 


mydb=# SHOW enable_partitionwise_join; enable_partitionwise_join

on

(1 row)

Lastly, we check again the execution plan for the same SQL.

Note below that each partition pair is joined by a "Nested Loop" ,

and the results are summarised at the end.

mydb=# EXPLAIN ANALYZE

mydb-# SELECT emp.id, emp.name, emp_info.rating mydb-# FROM emp LEFT OUTER JOIN emp_info mydb-#               ON emp.id = emp_info.id A


mydb-# WHERE emp_info.rating = 10; QUERY PLAN

Gather

Workers Planned: 2

Workers Launched: 2

-> Parallel Append

-> Nested Loop

-> Parallel Seq Scan on emp_info_1 B Filter: (rating = 10)

Rows Removed by Filter: 980248

-> Index Scan using emp_1_pkey on emp 1 Index Cond: (id = emp_info_1.id)

-> Nested Loop

-> Parallel Seq Scan on emp_info_0 Filter: (rating = 10)

Rows Removed by Filter: 980068

-> Index Scan using emp_0_pkey on emp0 Index Cond: (id = emp_info_0.id)

-> Nested Loop

-> Parallel Seq Scan on emp_info_2 Filter: (rating = 10)

Rows Removed by Filter: 326520

-> Index Scan using emp_2_pkey on emp_2 Index Cond: (id = emp_info_2.id)


Planning Time: 1.290 ms

Execution Time: 1136.806 ms                               <<<<<***** (25 rows)

-  Join key

-  Sequential scan for partition in table emp_info

-  Index scan for partition in table emp

-  Estimated execution time

 

As we can see above, the execution time using partition-wise join is almost 10x faster (11080. 809 ms | 1136.806 ms).

 

The Nested Loop makes a big difference in speed because the data range to scan during the join processing of partitions is reduced.

The execution plan changes depending on whether partition-wise join is enabled.

With this feature, you can optimise JOIN by filtering the scan target in the most efficient way.

 

****Partition-wise aggregation****

In partition-wise aggregation, aggregation is done for each partition in a partition table and the results are integrated at the end.

Processing time can be shortened by performing aggregation for each partition. Similarly to partition-wise joins, note the following about partition-wise aggregations:

-They are disabled by default.

This is because a lot of CPU and memory may be consumed when creating execution plans. Verification is required whether they will be effective in your environment before using them.

-They can be enabled/disabled by setting enable_partitionwise_aggregate in postgresql.conf or by using SET.

-Their performance may be even amplified by using parallel query. Continue reading for an explanation of parallel query.

 

***Partition-wise aggregation example***

Again, let`s compare the estimated times in execution plans for the same SQL statement, where partition-wise aggregation is disabled against after it is enabled.

 

The sales table in the example uses range partitioning, with sale_date set as the partition key. The table is divided into three partitions.

--Table sales

 

mydb=# \d+ sales Table "public.sales" Column | Type |

-----------+---------+

id | integer | p_name | text | amount | integer | sale_date | date |

Partition key: RANGE (sale_date) Index:

"sales_id_idx" btree (id)

Partitions: sales_2019_Q4 FOR VALUES FROM ('2019-10-01') TO ('2020-01-01'),


sales_2020_Q1 FOR VALUES FROM ('2020-01-01') TO ('2020-04-01'), sales_2020_Q2 FOR VALUES FROM ('2020-04-01') TO ('2020-07-01')


mydb=# SELECT COUNT (*) FROM sales; count

3000000

(1 row)

***Comparing estimated execution times*** Check that partition-wise aggregation is disabled.


mydb=# SHOW enable partitionwise_aggregate; enable_partitionwise_aggregate

off

(1 row)

Check the execution plan of an SQL performing an aggregation on the partitioned table. mydb=# EXPLAIN ANALYZE

SELECT p_name, sum(amount) sales_total FROM sales WHERE p_name = 'prod_A' GROUP BY p_name;


QUERY PLAN

GroupAggregate

Group Key: sales_2019_Q4.p_name

-> Append

-> Bitmap Heap Scan on sales_2019_Q4 A … Recheck Cond: (p_name = 'prod_A'::text) Heap Blocks: exact=5483

-> Bitmap Index Scan on sales_2019_Q4_p_name_idx Index Cond: (p_name = 'prod_A'::text)

-> Bitmap Heap Scan on sales_2020_Q1 A … Recheck Cond: (p_name = 'prod_A'::text) Heap Blocks: exact=6716

-> Bitmap Index Scan on sales_2020_Q1_p_name_idx Index Cond: (p_name = 'prod_A'::text)

-> Bitmap Heap Scan on sales_2020_Q2 A … Recheck Cond: (p_name = 'prod_A'::text) Heap Blocks: exact=6120

-> Bitmap Index Scan on sales_2020_Q2_p_name_idx Index Cond: (p_name = 'prod_A'::text)

Planning Time: 0.345 ms

Execution Time: 503.067 ms                                <<<<<**** (20 rows)

-  Heap scan for partition

-  Estimated execution time

Next, we enable partition-wise aggregation.

 

mydb=# SET enable_partitionwise_aggregate TO on; SET

 


mydb= SHOW enable partitionwise aggregate; enable_partitionwise_aggregate

on

(1 row)


 

Then we check the execution plan`s estimated time again.

mydb=# EXPLAIN ANALYZE

SELECT p_name, sum(amount) sales_total FROM sales WHERE p_name = 'prod_A' GROUP BY p_name;


QUERY PLAN

Finalize GroupAggregate

Group Key: sales_2019_Q4.p_name

-> Append

-> Partial GroupAggregate

Group Key: sales_2019_Q4.p_name A

-> Bitmap Heap Scan on sales_2019_Q4 Recheck Cond: (p_name = 'prod_A'::text) Heap Blocks: exact=5483

-> Bitmap Index Scan on sales_2019_Q4_p_name_idx Index Cond: (p_name = 'prod_A'::text)

-> Partial GroupAggregate

Group Key: sales_2020_Q1.p_name A

-> Bitmap Heap Scan on sales_2020_Q1_p_name Recheck Cond: (p_name = 'prod_A'::text)

Heap Blocks: exact=6716

-> Bitmap Index Scan on sales_2020_Q1_p_name_idx Index Cond: (p_name = 'prod_A'::text)

-> Partial GroupAggregate

Group Key: sales_2020_Q2.p_name A

-> Bitmap Heap Scan on sales_2020_Q2 Recheck Cond: (p_name ='prod_A'::text) Heap Blocks: exact=6120

-> Bitmap Index Scan on sales_2020_Q2_p_name_idx Index Cond: (p_name = 'prod_A'::text)

Planning Time: 0.522 ms

Execution Time: 394.091 ms <<<<<***** (26 rows)

-Heap scan on partition

-Estimated execution time

 

As we can see above, the execution time using partition-wise aggregation is almost 30% faster (503.067ms | 394.091ms)

this is because the number of target rows is reduced by executing aggregation in each partition with fewer rows to scan, the overall speed is increased.

 

***Parallel query for partition table***

Parallel query is a feature that executes a single SQL in parallel using multiple processes. Performance is improved because the processing is distributed to multiple CPUs.

 

Parallel queries can be executed on partition tables, allowing performance benefits of partition- wise joins

and partition-wise aggregations to be enhanced even further.

 

This is due to the fact that processing is carried out in parallel for each partition, with the results then combined at the end.


In the example below where we query a partitioned table, we see that a sequential scan is performed on each partition in parallel.

 


mydb=# EXPLAIN SELECT COUNT(*) FROM sales; QUERY PLAN

Finalize Aggregate

-> Gather

Workers Planned: 2

-> Partial Aggregate

-> Parallel Append

-> Parallel Seq Scan on sales_2020_Q2

-> Parallel Seq Scan on sales_2020_Q1

-> Parallel Seq Scan on sales_2019_Q4

(8 rows)

Parallel queries in partitioned tables are enabled by default.

They can be enabled/disabled by setting enable_parallel_append in postgresql.conf.

 

No comments:

Post a Comment

In Oracle How to shrink Securefile LOB tables

As per Oracle support, if the table uses securefile LOB the traditional shrink method ALTER TABLE MODIFY LOB (XMLRECORD) (SHRINK SPACE) will...