enough water makes all rocks round

Automatic Pickle Serialization and Deserialization with PostgreSQL

Building on the earlier success of creating a pickle data type in PostgreSQL, I’ve come up with this small and comfortable way of automatically serializing and de-serializing python pickles with PostgreSQL and psycopg2.

First, a quick recap of the PG data-type-creation-by-cloning method. As the database super user, issue a command like this to create a copy of the text type named pickle:

INSERT INTO pg_catalog.pg_type 
    SELECT 'pickle' AS typname, 
           typnamespace, 
           typowner, 
           typlen, 
           typbyval, 
           typtype, 
           typisdefined, 
           typdelim, 
           typrelid, 
           typelem, 
           typinput, 
           typoutput, 
           typreceive, 
           typsend, 
           typanalyze, 
           typalign, 
           typstorage, 
           typnotnull, 
           typbasetype, 
           typtypmod, 
           typndims, 
           typdefaultbin, 
           typdefault 
    FROM pg_catalog.pg_type 
    WHERE typname = 'text';

We have to create a new type so that we have something unique and specific to register with the extension. Now create a test table using the new type like this:

CREATE TABLE employee (
    employee SERIAL PRIMARY KEY,
    fullname TEXT,
    photo PICKLE
);

Both of these commands are in the attachment pickletest.sql.

We also need a python module and class for pickling. Put a file named emp.py somewhere in your python search path, and fill it like so:

#!/usr/bin/env python
from pgpickle import conform_method, quoted_pickle_method

class EmployeePhoto:
    def __init__(self, content_type, content):
        self.content_type = content_type
        self.content = content

    __conform__ = conform_method()
    getquoted = quoted_pickle_method()

That’s it! Two simple statements and EmployeePhoto instances are ready for automatic pickling on INSERT and UPDATE.

A couple of notes about this EmployeePhoto class are in order. First, the __conform__ and getquoted methods are used by psycopg2 to adapt instances as they’re inserted into the database. Second, I’ve coded them here as closures in the pgpickle module (see below) because I think the approach is simpler than messing with class inheritance — having them as separate constructs allows for mix and match as appropriate, and as the GoF admonishes “favor object composition over inheritance.” There’s nothing stopping you from restating these two methods as a mix-in class, but I’ll take my closures where I can get them.

Also in your python path, create the pgpickle.py module, and make its contents match this:

#!/usr/bin/env python
from cPickle import dumps, loads
from base64 import b64encode, b64decode
import psycopg2
import psycopg2.extensions as extensions
import psycopg2._psycopg as _psycopg2


def conform_method(match=extensions.ISQLQuote):
    def inner(self, proto):
        if proto == match:
            return self
    return inner


def quoted_pickle_method(protocol=-1):
    def inner(self):
        return "'%s'" % b64encode(dumps(self, protocol))
    return inner


def load_pickle(value, cursor):
    if value is not None:
        return loads(b64decode(value))


def register_data_type(connection, pg_type_name, type_name, loader):
    sql = "SELECT %s::regtype::oid"
    cursor = connection.cursor()
    cursor.execute(sql, (pg_type_name, ))
    type_oids = cursor.fetchone()
    type_oid = type_oids[0]

    if type_oid not in extensions.string_types:
        new_type = extensions.new_type(type_oids, type_name, loader)
        extensions.register_type(new_type)
    return type_oid

Last but not least, we need a third python script to bring this all together. This would normally be your application script, but we’re just testing here, so let’s call this one addemp.py. Start it off like this:

#!/usr/bin/env python
import urllib2
import psycopg2

import emp
import pgpickle
Then add a main script check and a few lines to fetch an image from the net and create an EmployeePhoto instance:
if __name__ == '__main__':
    img_url = 'http://static.flickr.com/31/65132743_68b35b799d_m.jpg'
    img = urllib2.urlopen(img_url).read()
    cartman_photo = emp.EmployeePhoto('image/jpeg', img)

Then create a connection, register the pickle type with it, and create a cursor. Adjust the connect call to match your database hostname, username, database name and password:

    con = psycopg2.connect('dbname=yourdb host=localhost user=yourname password=none')
    pgpickle.register_data_type(con, 'pickle', 'pickle', pgpickle.load_pickle)
    cur = con.cursor()

Here’s where the work starts to pay off. We run a normal insert statement, passing the EmployeePhoto instance as a parameter. No extra fiddling:

    sql = "INSERT INTO employee (fullname, photo) VALUES (%s, %s)"
    cur.execute(sql, ('Eric Cartman', cartman_photo))
    print cur.statusmessage

And finally, let’s pull out the new record and compare it and the instance we get against what we put in:

    sql = "SELECT employee, fullname, photo FROM employee WHERE fullname = %s"
    cur.execute(sql, ('Eric Cartman', ))
    cartman = cur.fetchone()
    photo = cartman[2]

    print cartman
    assert photo.content_type == cartman_photo.content_type
    assert photo.content == cartman_photo.content

Here’s how I ran this file and what it produced:

# python addemp.py
INSERT 37309 1
(23, 'Eric Cartman', )

Notice that it took me only 22 attempts to get this recipe correct! Take that, you statically-typed Enterprisy fiends! :)

You should be able to use this technique as-is in your own applications, and of course you’re welcome to adapt it to fit your needs. Either way, I hope you enjoy!

Files:
* [pickletype.sql](/files/pickletype.sql)
* [emp.py](/files/emp.py)
* [pgpickle.py](/files/pgpickle.py)
* [addemp.py](/files/addemp.py)

Advertisements

Comments are closed.