Automatic Pickle Serialization and Deserialization with PostgreSQL
Building on the earlier success of creating a pickle data type in PostgreSQL, I’ve come up with this small and comfortable way of automatically serializing and de-serializing python pickles with PostgreSQL and psycopg2.
First, a quick recap of the PG data-type-creation-by-cloning method. As the database super user, issue a command like this to create a copy of the text type named pickle:
INSERT INTO pg_catalog.pg_type SELECT 'pickle' AS typname, typnamespace, typowner, typlen, typbyval, typtype, typisdefined, typdelim, typrelid, typelem, typinput, typoutput, typreceive, typsend, typanalyze, typalign, typstorage, typnotnull, typbasetype, typtypmod, typndims, typdefaultbin, typdefault FROM pg_catalog.pg_type WHERE typname = 'text';
We have to create a new type so that we have something unique and specific to register with the extension. Now create a test table using the new type like this:
CREATE TABLE employee ( employee SERIAL PRIMARY KEY, fullname TEXT, photo PICKLE );
Both of these commands are in the attachment pickletest.sql.
We also need a python module and class for pickling. Put a file named emp.py somewhere in your python search path, and fill it like so:
#!/usr/bin/env python from pgpickle import conform_method, quoted_pickle_method class EmployeePhoto: def __init__(self, content_type, content): self.content_type = content_type self.content = content __conform__ = conform_method() getquoted = quoted_pickle_method()
That’s it! Two simple statements and EmployeePhoto instances are ready for automatic pickling on INSERT and UPDATE.
A couple of notes about this EmployeePhoto class are in order. First, the __conform__ and getquoted methods are used by psycopg2 to adapt instances as they’re inserted into the database. Second, I’ve coded them here as closures in the pgpickle module (see below) because I think the approach is simpler than messing with class inheritance — having them as separate constructs allows for mix and match as appropriate, and as the GoF admonishes “favor object composition over inheritance.” There’s nothing stopping you from restating these two methods as a mix-in class, but I’ll take my closures where I can get them.
Also in your python path, create the pgpickle.py module, and make its contents match this:
#!/usr/bin/env python from cPickle import dumps, loads from base64 import b64encode, b64decode import psycopg2 import psycopg2.extensions as extensions import psycopg2._psycopg as _psycopg2 def conform_method(match=extensions.ISQLQuote): def inner(self, proto): if proto == match: return self return inner def quoted_pickle_method(protocol=-1): def inner(self): return "'%s'" % b64encode(dumps(self, protocol)) return inner def load_pickle(value, cursor): if value is not None: return loads(b64decode(value)) def register_data_type(connection, pg_type_name, type_name, loader): sql = "SELECT %s::regtype::oid" cursor = connection.cursor() cursor.execute(sql, (pg_type_name, )) type_oids = cursor.fetchone() type_oid = type_oids[0] if type_oid not in extensions.string_types: new_type = extensions.new_type(type_oids, type_name, loader) extensions.register_type(new_type) return type_oid
Last but not least, we need a third python script to bring this all together. This would normally be your application script, but we’re just testing here, so let’s call this one addemp.py. Start it off like this:
#!/usr/bin/env python import urllib2 import psycopg2 import emp import pgpickle Then add a main script check and a few lines to fetch an image from the net and create an EmployeePhoto instance: if __name__ == '__main__': img_url = 'http://static.flickr.com/31/65132743_68b35b799d_m.jpg' img = urllib2.urlopen(img_url).read() cartman_photo = emp.EmployeePhoto('image/jpeg', img)
Then create a connection, register the pickle type with it, and create a cursor. Adjust the connect call to match your database hostname, username, database name and password:
con = psycopg2.connect('dbname=yourdb host=localhost user=yourname password=none') pgpickle.register_data_type(con, 'pickle', 'pickle', pgpickle.load_pickle) cur = con.cursor()
Here’s where the work starts to pay off. We run a normal insert statement, passing the EmployeePhoto instance as a parameter. No extra fiddling:
sql = "INSERT INTO employee (fullname, photo) VALUES (%s, %s)" cur.execute(sql, ('Eric Cartman', cartman_photo)) print cur.statusmessage
And finally, let’s pull out the new record and compare it and the instance we get against what we put in:
sql = "SELECT employee, fullname, photo FROM employee WHERE fullname = %s" cur.execute(sql, ('Eric Cartman', )) cartman = cur.fetchone() photo = cartman[2] print cartman assert photo.content_type == cartman_photo.content_type assert photo.content == cartman_photo.content
Here’s how I ran this file and what it produced:
# python addemp.py INSERT 37309 1 (23, 'Eric Cartman', )
Notice that it took me only 22 attempts to get this recipe correct! Take that, you statically-typed Enterprisy fiends! :)
You should be able to use this technique as-is in your own applications, and of course you’re welcome to adapt it to fit your needs. Either way, I hope you enjoy!
Files:
* [pickletype.sql](/files/pickletype.sql)
* [emp.py](/files/emp.py)
* [pgpickle.py](/files/pgpickle.py)
* [addemp.py](/files/addemp.py)