These notes are for partitioning in MySQL for version 5.6.

Introduction

A nice thing about SQL is that it is declarative. You can tell SQL what you want and it will figure out how you get it. This makes life a lot easier. But at a certain threshold, you want to optimise how the underlying data is actually stored on your machine. That is where partitioning comes in. It allows you to dictate how the data is stored, based on something called a partition function.

Let’s say you have a lot of employee data in one gigantic table. Let’s say you want a list of all employees who joined in 2016. Right now you need to scan the entire table and find out this information. Wouldn’t it be nice if everyone who joined in 2016 be in one place, and all the 2015 guys in another? You could of course manually create all these tables, or you could let MySQL do it on its own in the background. That’s basically how partitioning works. You operate on one table and let MySQL figure out where that data goes behind the scenes.

Before I begin let me point out that Partitioning is not a silver bullet solution for performance issues. It is just another way of optimizing queries. In order to decide whether you need partitioning or not, and what kind of partitioning you need, you should first list down all the queries that will run against the table and go from there. Use explain to figure out your queries.

MySQL has different ways in which you could partition your data. But before we can do that, you need to check if your database installation supports partitioning.

Check for Support

Run any of the following queries and check if partitioning is active.

One more thing. There are two kinds of partitioning. The first is horizontal partitioning, where different rows are assigned to different physical partitions. This is what we’re talking about in this post. The second is vertical partitioning, in which different columns are assigned to different physical partitions. MySQL does not support vertical partitioning as of now.

Types of Partitioning

MySQL supports several types of partitioning:

Range Partitioning

When you partition a table by using range partitioning, the values are placed in a particular partition as long as the value falls within the given range for that partition. Ranges should be contiguous but not overlapping, and are defined using the VALUES LESS THAN operator. For example:

CREATE TABLE range_test (
     `id` INTEGER NOT NULL,
     `some_date` DATE NOT NULL
)
PARTITION BY RANGE(id) (
     PARTITION p0 VALUES LESS THAN (5),
     PARTITION p1 VALUES LESS THAN (10),
     PARTITION p2 VALUES LESS THAN (15),
     PARTITION p3 VALUES LESS THAN (20)
);

Each partition must be defined in order, from lowest to highest. The value you specify is not included in the range, meaning if we insert the value 5 in the above table, it will go in the second partition.

Let’s insert the values from 1 to 20 in this table. The table now looks like this:

+----+------------+
| id | some_date  |
+----+------------+
|  1 | 2016-05-20 |
|  2 | 2016-05-20 |
|  3 | 2016-05-20 |
|  4 | 2016-05-20 |
|  5 | 2016-05-20 |
|  6 | 2016-05-20 |
|  7 | 2016-05-20 |
|  8 | 2016-05-20 |
|  9 | 2016-05-20 |
| 10 | 2016-05-20 |
| 11 | 2016-05-20 |
| 12 | 2016-05-20 |
| 13 | 2016-05-20 |
| 14 | 2016-05-20 |
| 15 | 2016-05-20 |
| 16 | 2016-05-20 |
| 17 | 2016-05-20 |
| 18 | 2016-05-20 |
| 19 | 2016-05-20 |
+----+------------+

Note that when you try to insert the value 20, it will give you the following error: ERROR 1526 (HY000): Table has no partition for value 20. That’s because the cut off values defined in the partition expression will go into the next partition, meaning the greatest value that can go in this table is 19, and there is no partition defined for 20.

To see which partitions are being used, execute the following query: SELECT table_schema, table_name, partition_name, partition_method, partition_expression, table_rows FROM information_schema.partitions WHERE table_schema = "partition_test"; You should see something like this:

+----------------+------------+----------------+------------------+----------------------+------------+
| table_schema   | table_name | partition_name | partition_method | partition_expression | table_rows |
+----------------+------------+----------------+------------------+----------------------+------------+
| partition_test | range_test | p0             | RANGE            | id                   |          4 |
| partition_test | range_test | p1             | RANGE            | id                   |          5 |
| partition_test | range_test | p2             | RANGE            | id                   |          5 |
| partition_test | range_test | p3             | RANGE            | id                   |          5 |
+----------------+------------+----------------+------------------+----------------------+------------+

Here you can see the information for your partitioned table.

If you want a partition that will contain all values defined after 19, then you can defined a partition like so: PARTITION p3 VALUES LESS THAN MAXVALUE

We can also define partitions based on the DATE column. It might be something like this:

CREATE TABLE range_test_2 (
     `id` INTEGER NOT NULL,
     `some_date` DATE NOT NULL
)
PARTITION BY RANGE(YEAR(some_date)) (
     PARTITION p0 VALUES LESS THAN (2000),
     PARTITION p1 VALUES LESS THAN (2005),
     PARTITION p2 VALUES LESS THAN (2010),
     PARTITION p3 VALUES LESS THAN (2015),
     PARTITION p4 VALUES LESS THAN MAXVALUE
);

You can use a lot of date functions to define partitions, like: UNIX_TIMESTAMP(), TO_DAYS(), TO_SECONDS(), WEEKDAY(), DAYOFYEAR(), MONTH(), etc. You can use pretty much any date function that returns an integer or null.

There are certain advantages of using this kind of partitioning:

If the value of the partitioning expression evaluates to NULL, the row will be inserted in the lowest partition.

List Partitioning

Partitioning a table using list partitioning means saving data into partitions based on a predefined list of values. For example:

CREATE TABLE list_test (
     `id` INTEGER NOT NULL,
     `some_date` DATE NOT NULL
)
PARTITION BY LIST(id) (
     PARTITION p0 VALUES IN (NULL, 1, 2, 3, 4, 5),
     PARTITION p1 VALUES IN (6, 7, 8, 9, 10)
);

Unlike range, there is no MAXVALUE catch-all for list partitions. All partitions must be explicitly defined. Also, if you want to keep data containing NULL in any partition, it should be explicitly defined in the list of values.

Range Columns Partitioning

This type of partitioning is a variation of Range partitioning, which has certain advantages over normal Range partitioning. It allows partitioning on multiple columns and non-integer columns. It does not, however, support expressions, and only allows list of columns. When using multiple columns, the range of all columns is checked. You can also specify limiting conditions in multiple definitions, as long as they are strictly increasing. Like so:

CREATE TABLE range_columns_string_test (
     `id` INTEGER NOT NULL,
     `first_name` VARCHAR(30),
     `last_name` VARCHAR(30)

)
PARTITION BY RANGE COLUMNS(first_name, last_name) (
     PARTITION p0 VALUES LESS THAN ('a', 'h'),
     PARTITION p1 VALUES LESS THAN ('h', 'p'),
     PARTITION p2 VALUES LESS THAN ('p', MAXVALUE)

);

Different character sets and collations have different sorting orders, which will affect how rows are distributed in different partitions.

We can also use DATE field with range column partitioning:

CREATE TABLE range_columns_date_test (
     `id` INTEGER NOT NULL,
     `some_date` DATE NOT NULL
)
PARTITION BY RANGE COLUMNS(some_date) (
     PARTITION p0 VALUES LESS THAN ('2000-01-01'),
     PARTITION p1 VALUES LESS THAN ('2005-01-01'),
     PARTITION p2 VALUES LESS THAN ('2010-01-01')
);

List Columns Partitioning

This is an extension of the list partitioning scheme, which allows you to use non-integer columns in partition function, and allows you to use multiple columns in the partition definition. For example:

CREATE TABLE list_columns_test (
     `id` INTEGER NOT NULL,
     `some_text` VARCHAR(30)
)
PARTITION BY LIST COLUMNS (some_text) (
     PARTITION p0 VALUES IN ('a', 'b', 'c', 'd'),
     PARTITION p1 VALUES IN ('e', 'f', 'g', 'h')
);

Hash Partitioning

With Range or List partitioning, the partitions need to be explicitly defined. With Hash partitioning, you can specify an integer partition function and the number of partitions to be created, and MySQL will take care of the rest. The partition function can be an expression that returns an integer, or be the name of the column which is an integer type. The default number of partitions is 1. For example:

CREATE TABLE hash_test (
     `id` INTEGER NOT NULL,
     `some_text` VARCHAR(30),
     `some_id` INTEGER NOT NULL
)
PARTITION BY HASH(some_id)
PARTITIONS 10;

Another example could be:

CREATE TABLE hash_test_employees (
     `id` INTEGER NOT NULL,
     `first_name` VARCHAR(30),
     `last_name` VARCHAR(30),
     `joining_date` DATE NOT NULL
)
PARTITION BY HASH(YEAR(joining_date))
PARTITIONS 5;

Good partition functions are those whose values vary proportionately with every change in the value of the partition column, i.e., the graph of the column value and the column expression must be a straight line traced by y = cx. To determine which partition the row goes into, it evaluates the following expression:

Partition number = MOD(expression, number of partitions)

Linear Hash Partitioning

This is a variation of the Hash partitioning. The difference is, that the Hash partitioning employs a modulus function to determine the partition number, but Linear Hash employs a linear power of 2 algorithm. The algorithm is:

The disadvantage is that the data is less likely to be evenly distributed as compared to Hash partitioning.

Key Partitioning

Key partitioning is like Hash partitioning, the difference being that in this partitioning scheme, the partition function is supplied by MySQL, which is either MD5 or PASSWORD. Non-integer columns can also be used in the partition expression. If no column is specified, the primary key column is used.

CREATE TABLE key_test (
     `id` INTEGER NOT NULL,
     `some_text` VARCHAR(30) NOT NULL
)
PARTITION BY KEY(some_text)
PARTITIONS 5;

The DATE, TIME, and DATETIME columns can also be used in the partitioning expression.

Linear Key Partitioning

Much like Linear Hash partitioning, Linear Key partitioning relies on a power of 2 function instead of a modulus function to distribute data among partitions.

If you’ve experimented with a few partition types and have file-per-table enabled, head on over to your MySQL data directory and see if you can spot your partition files.

Things to remember

Partitioning Information

To view partitioning information:

Sub partitioning/Composite Partitioning

Sub partitioning allows you to further divide a given table into more partitions.

CREATE TABLE subpartition_test (
     `id` INTEGER NOT NULL,
     `joining_date` DATE,
     `employee_id` INTEGER
)
PARTITION BY RANGE(YEAR(joining_date))
SUBPARTITION BY HASH (employee_id) (
     PARTITION p0 VALUES LESS THAN (1990) (
          SUBPARTITION s0,
          SUBPARTITION s1
     ),
     PARTITION p1 VALUES LESS THAN (2000) (
          SUBPARTITION s2,
          SUBPARTITION s3
     )
);

Now let’s add some data to the above table to it looks like this:

+----+--------------+-------------+
| id | joining_date | employee_id |
+----+--------------+-------------+
|  1 | 1980-01-01   |           1 |
|  1 | 1981-01-01   |           2 |
|  1 | 1982-01-01   |           3 |
|  1 | 1983-01-01   |           4 |
|  1 | 1984-01-01   |           5 |
|  1 | 1985-01-01   |           6 |
|  1 | 1986-01-01   |           7 |
|  1 | 1987-01-01   |           8 |
|  1 | 1988-01-01   |           9 |
|  1 | 1989-01-01   |          10 |
|  1 | 1990-01-01   |          11 |
|  1 | 1991-01-01   |          12 |
|  1 | 1992-01-01   |          13 |
|  1 | 1993-01-01   |          14 |
|  1 | 1994-01-01   |          15 |
|  1 | 1995-01-01   |          16 |
|  1 | 1996-01-01   |          17 |
|  1 | 1997-01-01   |          18 |
|  1 | 1998-01-01   |          19 |
|  1 | 1999-01-01   |          20 |
+----+--------------+-------------+

Now let’s see where all that data ended up:

SELECT table_schema, table_name, partition_name, partition_method, partition_expression, subpartition_name, subpartition_method, subpartition_expression, table_rows FROM information_schema.partitions WHERE table_schema = "partition_test" AND table_name = "subpartition_test";

+----------------+-------------------+----------------+------------------+----------------------+-------------------+---------------------+-------------------------+------------+
| table_schema   | table_name        | partition_name | partition_method | partition_expression | subpartition_name | subpartition_method | subpartition_expression | table_rows |
+----------------+-------------------+----------------+------------------+----------------------+-------------------+---------------------+-------------------------+------------+
| partition_test | subpartition_test | p0             | RANGE            | YEAR(joining_date)   | s0                | HASH                | employee_id             |          5 |
| partition_test | subpartition_test | p0             | RANGE            | YEAR(joining_date)   | s1                | HASH                | employee_id             |          5 |
| partition_test | subpartition_test | p1             | RANGE            | YEAR(joining_date)   | s2                | HASH                | employee_id             |          5 |
| partition_test | subpartition_test | p1             | RANGE            | YEAR(joining_date)   | s3                | HASH                | employee_id             |          5 |
+----------------+-------------------+----------------+------------------+----------------------+-------------------+---------------------+-------------------------+------------+

Some things to note:

It is also possible to define data and index directories for individual partitions.

CREATE TABLE subpartition_directories_test (
     `id` INTEGER NOT NULL,
     `joining_date` DATE,
     `employee_id` INTEGER
)
PARTITION BY RANGE(YEAR(joining_date))
SUBPARTITION BY HASH (employee_id) (
     PARTITION p0 VALUES LESS THAN (1990) (
          SUBPARTITION s0
               DATA DIRECTORY = '/tmp/disk1/data'
               INDEX DIRECTORY = '/tmp/disk1/index',
          SUBPARTITION s1
               DATA DIRECTORY = '/tmp/disk2/data'
               INDEX DIRECTORY = '/tmp/disk2/index'

     ),
     PARTITION p1 VALUES LESS THAN (2000) (
          SUBPARTITION s2
               DATA DIRECTORY = '/tmp/disk3/data'
               INDEX DIRECTORY = '/tmp/disk3/index',
          SUBPARTITION s3
               DATA DIRECTORY = '/tmp/disk4/data'
               INDEX DIRECTORY = '/tmp/disk4/index'
     )
);

Managing Partitions

You can use several commands to manage partitions:

Maintenance of Partitions

You can use several commands to maintain partitions:

Partition Pruning

Partition pruning is what partitioning is really about. It involves getting the MySQL query optimiser to only search those partitions where it is going to find data. For example, if you’ve created partitions by year, then the trick is to get MySQL to search only the partition for the year that you need. You can find out what partition a query will be using by sticking EXPLAIN PARTITIONS in front of your query. Pruning can be applied to SELECTs, UPDATEs, and DELETEs.

Partition Selection

Partition selection is another aspect of partitioning. It’s like pruning, expect here you get to specify which partitions to search in the query itself. For example:

SELECT * FROM table_name PARTITION (p0);

You can use partition selection with SELECTs, INSERTs, DELETEs, UPDATEs, and a few more.

Resources