A Tale of PostgreSQL Types and Python
Thinking of integrating user-defined types in PostgreSQL and psycopg2? Here’s the path I walked, hopefully you can avoid the same pitfalls.
I was after a PostgreSQL data type for storing Python pickles. This seemed simple enough because the database facility for aliasing a type already exists. So the first thing I tried was a CREATE DOMAIN statement in PG, like so:
CREATE DOMAIN pickle AS bytea;
This worked within PG but not within the psycopg2 extension. It seems that psycopg2 reports the underlying data type, not the domain. I’ve since opened a ticket in their Trac. I hope this will get fixed in psycopg2, because this is still the most reasonable technique.
The next thing I tried was the CREATE TYPE statement (using the second form to create a new base type). This was a pretty obvious thing to do, so I thought, as the input/output/send/receive functions are already defined. Not obvious enough! Seems there’s no way to reference the BYTEA functions because they’re in the PG executable, not within a shared library.
Undeterred, I turned to pyscopg2 for it’s ability to map composite types into python callables. Here’s the composite type definition in PG, using the first form of CREATE TYPE:
CREATE TYPE pickle AS data BYTEA;
After defining and registering a Python callable to handle the data, everything worked. Here’s an example of the Python code:
from cPickle import dumps, loads from base64 import b64encode, b64decode import psycopg2, psycopg2.extensions class PgPickle: def __init__(self, payload, protocol=-1): self.payload = payload self.protocol = protocol def __conform__(self, proto): if proto == psycopg2.extensions.ISQLQuote: return self def getquoted(self): return "ROW('%s')" % b64encode(dumps(self.payload, self.protocol)) def load(value, cursor): if value is not None: return loads(b64decode(value[1:-1])) load = staticmethod(load) def register_pickle_type(cursor): sql = """ SELECT 'pickle'::regtype::oid """ cursor.execute(sql) type_oids = cursor.fetchone() pg_pickle = psycopg2.new_type(type_oids, "pickle", PgPickle.load) psycopg2.register_type(pg_pickle) return type_oids[0]
This didn’t work with the BYTEA type, so I changed it to TEXT instead of dealing with the decoding (call me lazy). While that was a minor problem, the larger problem was that the solution wasn’t satisfying or natural.
Also, please note that the above code isn’t tested; it’s merely a monkey-c+monkey-v from a test script I had lying around. YMMV.
Still undeterred, I decided to bite the bullet, as they say, and write a custom PG type in C. “Can’t be that hard,” I told myself. And it wasn’t. After rooting through the PG source and the contributed extensions, I came up with this module, pgpickle.c:
#include "postgres.h" #include "utils/builtins.h" #include "lib/stringinfo.h" typedef bytea pickle; PG_FUNCTION_INFO_V1(pickle_in); PG_FUNCTION_INFO_V1(pickle_out); PG_FUNCTION_INFO_V1(pickle_receive); PG_FUNCTION_INFO_V1(pickle_send); Datum pickle_in(PG_FUNCTION_ARGS) { // takes string, returns pickle char *str = PG_GETARG_CSTRING(0); pickle *result = DatumGetByteaP(DirectFunctionCall1(byteain, CStringGetDatum(str))); PG_RETURN_BYTEA_P(result); } Datum pickle_out(PG_FUNCTION_ARGS) { // takes pickle, returns cstring char *result; result = DatumGetCString(DirectFunctionCall1(byteaout, PG_GETARG_DATUM(0))); PG_RETURN_CSTRING(result); }
And it’s associated SQL script, pgpickle.sql:
DROP TYPE pickle CASCADE; DROP FUNCTION pickle_in(cstring); DROP FUNCTION pickle_out(pickle); -- force a reload if needed LOAD 'pgpickle.so'; CREATE FUNCTION pickle_in(cstring) RETURNS pickle AS 'pgpickle.so' LANGUAGE C IMMUTABLE STRICT; CREATE FUNCTION pickle_out(pickle) RETURNS cstring AS 'pgpickle.so' LANGUAGE C IMMUTABLE STRICT; CREATE TYPE pickle ( INTERNALLENGTH = 4, EXTERNALLENGTH = variable, INPUT = pickle_in, OUTPUT = pickle_out ); COMMENT ON FUNCTION pickle_in(cstring) IS 'function to convert string to pickle'; COMMENT ON FUNCTION pickle_out(pickle) IS 'function to convert pickle to string'; COMMENT ON TYPE pickle IS 'nifty bytea synonym';
Now, I haven’t programmed C seriously since school, and while the above code seemed to work, I didn’t want to commit to testing it and maintaining it.
So I went back to the Python code. For a day. I can’t ever leave something be (OCD? In my house, we call it Orange County Department ’cause the kids are down with it, too, and we don’t want to give them a complex. Not yet.)
The last approach I tried, the one I’m still using, is a simple and powerful INSERT statement:
INSERT INTO pg_catalog.pg_type ( typname, typnamespace, typowner, typlen, typbyval, typtype, typisdefined, typdelim, typrelid, typelem, typinput, typoutput, typreceive, typsend, typanalyze, typalign, typstorage, typnotnull, typbasetype, typtypmod, typndims, typdefaultbin, typdefault ) VALUES ( 'pickle', 11, 1, -1, 'f', 'b', 't', ',', 0, 0, 'byteain', 'byteaout', 'bytearecv', 'byteasend', 0, 'i', 'x', 'f', 0, -1, 0, '', '' );
There’s probably a better way to formulate the statement, but I was so elated at it’s correct operation that I backed away from the keyboard slowly. And did a little jig of joy.
This last approach creates the type correctly in PG, and with a little magic in Python, maps the type correctly to client code. Here’s the final Python version:
import psycopg2._psycopg as dbapiext def register_clone_type(connection, source_name, target_name): sql = """ SELECT %s::regtype::oid """ cursor = connection.cursor() string_types = dbapiext.string_types cursor.execute(sql, (source_name, )) source_oid = cursor.fetchone()[0] cursor.execute(sql, (target_name, )) target_oid = cursor.fetchone()[0] if target_oid not in string_types: string_types[target_oid] = string_types[source_oid] return target_oid register_clone_type(myconnection, 'bytea', 'pickle')