- IBM DB2 9.7 Advanced Administration Cookbook
- Adrian Neagu Robert Pelletier
- 596字
- 2021-08-20 15:33:23
The table distribution key and its role in a multipartitioned environment
The table distribution key is another element that can influence the distribution of table rows across existent partitions defined in a database partition group. Distribution keys can be created explicitly using DISTRIBUTION BY HASH
(columns) directive inside table definition or they can be defined implicitly by DB2 using a series of column rules detailed in this recipe. Usually distribution column might be designed and used to improve query performance.
Getting ready
This recipe is strongly correlated with the preceding one; here, we will also use before and after pictures of how the data is distributed across database partitions.
How to do it...
For some of the queries, we will use Command Editor, because it offers better visibility.
- To find the distribution key columns for the
COMM
table, issue the following command:[db2instp@nodedb21 ~]$ db2 "select name from sysibm.syscolumns where tbname='COMM' and partkeyseq !=0" NAME CKEY 1 record(s) selected. [db2instp@nodedb21 ~]$
Since we did not specify any distribution key at table creation, the primary key is used as the distribution key.
- To find the distribution of rows on partitions, by row numbers, before performing redistribution of table
COMM
, issue the following command (for better visibility run this statement in Command Center | Tools | Command Editor):select dbpartitionnum(CKEY) as PARTITION_NUMBER, count(CKEY) as NUMBER_OF_ROWS from comm group by dbpartitionnum(CKEY) order by dbpartitionnum(CKEY)
- After redistribution, issue the last statement again. Now we can see that rows are distributed slightly uniformly across partitions:

How it works...
Table rows partiticpating in distributed keys are assigned to partitions using a hashing algorithm that calculates their distribution across partitions. If you do not specify a distribution key at table creation, it will be created implicitly, as follows:
- If the table has a primary key, it will be used as a distribution key
- If the table does not have a primary key, the first column, whose data type is not
LOB, LONG VARCHAR, LONG VARGRAPHIC
, orXML
, will be used as a partition key - If the table columns are not in this category, it means that a partition key cannot be defined, and the table can only be created in a table space that has been created on a single-partition database partition group
Distribution keys should be created to improve query performance in the following conditions:
- Columns are frequently used in joins
- Columns are frequently used in group by clause
- Columns defined as primary keys or unique must be included in distribution keys
- Create distribution keys for column used mostly in equality conditions
Practically the key columns are hashed and divided in 4095 or 32768 buckets depending on the distribution map used. Every map entry indicates where a specific row on which partition might be found.
There's more...
You cannot change the distribution key by altering the table, unless the table is spread across only one partition. To change the partition key for tables spanned on multiple partitions you should follow these steps:
- Export or copy table data in a staging table.
- Drop and recreate the table, with the new distribution key defined.
- Import or insert table data.
Distribution keys play a major role also in a method of table distribution called table collocation. Table collocation is another performance tuning technique found in multipartitioned database environments. A table is considered collocated when the following conditions are met:
- Tables are placed in the same table space and are in the same partition groups.
- The distribution keys have the same columns and same data types and are partition compatible.
- 自然語言處理實戰:預訓練模型應用及其產品化
- Learning Apex Programming
- 零基礎學MQL:基于EA的自動化交易編程
- Access 2010數據庫應用技術(第2版)
- HTML5+CSS3 Web前端開發技術(第2版)
- JavaScript應用開發實踐指南
- 大學計算機基礎實驗指導
- Mastering Concurrency Programming with Java 9(Second Edition)
- 大數據時代的企業升級之道(全3冊)
- Mastering Apache Camel
- LabVIEW數據采集
- Continuous Delivery and DevOps:A Quickstart Guide Second Edition
- Developer,Advocate!
- Getting Started with Hazelcast
- Storm Real-Time Processing Cookbook