EN RU

How to use PostgreSQL partitions with Django

Partitions in PostgreSQL(and MySQL too :)) are very useful when we have to store large data dividing them by attribute for future works. In my case I have logs of working on projects, which are stored in partition for each project.

Using partitions in django project is not very difficult, but there is one issue with “returning id” for insert statement in database. Function, which will insert a record to right table, must return trigger, but not id of last record as django expects.

Let's make a support of partitions step by step:

step 1: create models

1
2
3
4
5
6
7
class Project(models.Model):
    title = models.CharField(max_length=150)

class ProjectLog(models.Model):
    project = models.ForeignKey(Project)
    user = models.ForeignKey(User)
    content = models.CharField(max_length=500)

step 2: create migration to put partition logic to database. We will create two functions: one for creating separate table for each project and second for inserting record to right partition table

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
def make_partitions_operants(apps, schema_editor):
    Project = apps.get_model('project', 'Project')
    ProjectLog = apps.get_model('project', 'ProjectLog')

    sql = '''
        create or replace function insert_into_projectlog() returns trigger as $$
        begin
          execute 'insert into %(projectlog_db_table)s_' || NEW.project_id || ' values(($1).*)' using NEW;

          return NULL;
        end $$ language 'plpgsql';

        create or replace function create_projectlog_partition() returns trigger as $$
        begin
          execute 'create table %(projectlog_db_table)s_' || NEW.id || ' (
            check (project_id = ' || NEW.id || ')
          ) inherits (%(projectlog_db_table)s);';

          return NEW;
        end $$ language 'plpgsql';

        create trigger trigger_create_projectlog_partition
          after insert on %(project_db_table)s
          for each row execute procedure create_projectlog_partition();

        create trigger trigger_insert_into_projectlog
          before insert on %(projectlog_db_table)s
          for each row execute procedure insert_into_projectlog();
    ''' % {
        'projectlog_db_table': ProjectLog._meta.db_table,
        'project_db_table': Project._meta.db_table,
        'rowtype': '%%ROWTYPE',
    }

    schema_editor.execute(sql)


class Migration(migrations.Migration):

    dependencies = [
        ('project', '0001_initial'),
    ]

    operations = [
        migrations.RunPython(make_partitions_operants),
    ]

I have used native Django 1.7 migrations

step 3: we need to overwrite save method of ProjectLog model

1
2
3
4
5
6
7
8
9
def save(self, *args, **kwargs):
        from django.db.backends.postgresql_psycopg2.base import DatabaseFeatures

        if not self.pk:
            DatabaseFeatures.can_return_id_from_insert = False

        super(ProjectLog, self).save(*args, **kwargs)

        DatabaseFeatures.can_return_id_from_insert = True

Here we overwrite can_return_id_from_insert to prevent using “returning id” statement in query to database.

And that’s all what we need.

Each time when we create a new project, a new partition for ProjectLog table will be created in database.

Example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
test=# \dt
              List of relations
 Schema |        Name        | Type  | Owner
--------+--------------------+-------+-------
 public | project_project    | table | quard
 public | project_projectlog | table | quard
(2 rows)

test=# INSERT INTO project_project(title) VALUES('blog');
INSERT 0 1
test=# \dt
                   List of relations
 Schema |         Name         | Type  |     Owner
--------+----------------------+-------+----------------
 public | project_project      | table | quard
 public | project_projectlog   | table | quard
 public | project_projectlog_1 | table | vadymzakovinko
(3 rows)

test=# INSERT INTO project_project(title) VALUES('education');
INSERT 0 1
test=# \dt
                   List of relations
 Schema |         Name         | Type  |     Owner
--------+----------------------+-------+----------------
 public | project_project      | table | quard
 public | project_projectlog   | table | quard
 public | project_projectlog_1 | table | vadymzakovinko
 public | project_projectlog_2 | table | vadymzakovinko
(4 rows)

test=# INSERT INTO project_projectlog(user_id, project_id, content) VALUES(1, 1, 'new article');
INSERT 0 0
test=# SELECT * FROM project_projectlog;
 id | user_id | project_id |   content
----+---------+------------+-------------
  1 |       1 |          1 | new article
(1 row)

test=# SELECT * FROM project_projectlog_1;
 id | user_id | project_id |   content
----+---------+------------+-------------
  1 |       1 |          1 | new article
(1 row)

test=# SELECT * FROM project_projectlog_2;
 id | user_id | project_id | content
----+---------+------------+---------
(0 rows)

test=# INSERT INTO project_projectlog(user_id, project_id, content) VALUES(1, 2, 'django 1.7 migrations');
INSERT 0 0
test=# SELECT * FROM project_projectlog_2;
 id | user_id | project_id |        content
----+---------+------------+-----------------------
  2 |       1 |          2 | django 1.7 migrations
(1 row)

test=# SELECT * FROM project_projectlog_1;
 id | user_id | project_id |   content
----+---------+------------+-------------
  1 |       1 |          1 | new article
(1 row)

test=# SELECT * FROM project_projectlog;
 id | user_id | project_id |        content
----+---------+------------+-----------------------
  1 |       1 |          1 | new article
  2 |       1 |          2 | django 1.7 migrations
(2 rows)
    2014-10-04 10:00:30      django postgresql partition