enough water makes all rocks round

A Tale of PostgreSQL Types and Python

Thinking of integrating user-defined types in PostgreSQL and psycopg2? Here’s the path I walked, hopefully you can avoid the same pitfalls.

I was after a PostgreSQL data type for storing Python pickles. This seemed simple enough because the database facility for aliasing a type already exists. So the first thing I tried was a CREATE DOMAIN statement in PG, like so:

CREATE DOMAIN pickle AS bytea;

This worked within PG but not within the psycopg2 extension. It seems that psycopg2 reports the underlying data type, not the domain. I’ve since opened a ticket in their Trac. I hope this will get fixed in psycopg2, because this is still the most reasonable technique.

The next thing I tried was the CREATE TYPE statement (using the second form to create a new base type). This was a pretty obvious thing to do, so I thought, as the input/output/send/receive functions are already defined. Not obvious enough! Seems there’s no way to reference the BYTEA functions because they’re in the PG executable, not within a shared library.

Undeterred, I turned to pyscopg2 for it’s ability to map composite types into python callables. Here’s the composite type definition in PG, using the first form of CREATE TYPE:

CREATE TYPE pickle AS data BYTEA;

After defining and registering a Python callable to handle the data, everything worked. Here’s an example of the Python code:

from cPickle import dumps, loads
from base64 import b64encode, b64decode
import psycopg2, psycopg2.extensions

class PgPickle:
    def __init__(self, payload, protocol=-1):
        self.payload = payload
        self.protocol = protocol

    def __conform__(self, proto):
        if proto == psycopg2.extensions.ISQLQuote:
            return self

    def getquoted(self):
        return "ROW('%s')" % b64encode(dumps(self.payload, self.protocol))

    def load(value, cursor):
        if value is not None:
            return loads(b64decode(value[1:-1]))
    load = staticmethod(load)


def register_pickle_type(cursor):
    sql = """
    SELECT 'pickle'::regtype::oid
    """
    cursor.execute(sql)
    type_oids = cursor.fetchone()
    pg_pickle = psycopg2.new_type(type_oids, "pickle", PgPickle.load)
    psycopg2.register_type(pg_pickle)
    return type_oids[0]

This didn’t work with the BYTEA type, so I changed it to TEXT instead of dealing with the decoding (call me lazy). While that was a minor problem, the larger problem was that the solution wasn’t satisfying or natural.
Also, please note that the above code isn’t tested; it’s merely a monkey-c+monkey-v from a test script I had lying around. YMMV.

Still undeterred, I decided to bite the bullet, as they say, and write a custom PG type in C. “Can’t be that hard,” I told myself. And it wasn’t. After rooting through the PG source and the contributed extensions, I came up with this module, pgpickle.c:

#include "postgres.h"
#include "utils/builtins.h"
#include "lib/stringinfo.h"

typedef bytea pickle;

PG_FUNCTION_INFO_V1(pickle_in);
PG_FUNCTION_INFO_V1(pickle_out);
PG_FUNCTION_INFO_V1(pickle_receive);
PG_FUNCTION_INFO_V1(pickle_send);

Datum
pickle_in(PG_FUNCTION_ARGS)
{
  // takes string, returns pickle
  char *str = PG_GETARG_CSTRING(0);
  pickle *result = DatumGetByteaP(DirectFunctionCall1(byteain, CStringGetDatum(str)));
  PG_RETURN_BYTEA_P(result);
}

Datum
pickle_out(PG_FUNCTION_ARGS)
{
  // takes pickle, returns cstring
  char *result;
  result = DatumGetCString(DirectFunctionCall1(byteaout, PG_GETARG_DATUM(0)));
  PG_RETURN_CSTRING(result);
}

And it’s associated SQL script, pgpickle.sql:

DROP TYPE pickle CASCADE;
DROP FUNCTION pickle_in(cstring);
DROP FUNCTION pickle_out(pickle);

-- force a reload if needed
LOAD 'pgpickle.so';

CREATE FUNCTION pickle_in(cstring) RETURNS pickle
AS 'pgpickle.so'
LANGUAGE C IMMUTABLE STRICT;

CREATE FUNCTION pickle_out(pickle) RETURNS cstring
AS 'pgpickle.so'
LANGUAGE C IMMUTABLE STRICT;

CREATE TYPE pickle (
        INTERNALLENGTH = 4,
        EXTERNALLENGTH = variable,
        INPUT = pickle_in,
        OUTPUT = pickle_out
);

COMMENT ON FUNCTION pickle_in(cstring)
    IS 'function to convert string to pickle';
COMMENT ON FUNCTION pickle_out(pickle)
    IS 'function to convert pickle to string';
COMMENT ON TYPE pickle
    IS 'nifty bytea synonym';

Now, I haven’t programmed C seriously since school, and while the above code seemed to work, I didn’t want to commit to testing it and maintaining it.
So I went back to the Python code. For a day. I can’t ever leave something be (OCD? In my house, we call it Orange County Department ’cause the kids are down with it, too, and we don’t want to give them a complex. Not yet.)
The last approach I tried, the one I’m still using, is a simple and powerful INSERT statement:

INSERT INTO pg_catalog.pg_type (
    typname, typnamespace, typowner, typlen,
    typbyval, typtype, typisdefined, typdelim,
    typrelid, typelem, typinput, typoutput,
    typreceive, typsend, typanalyze, typalign,
    typstorage, typnotnull, typbasetype, typtypmod,
    typndims, typdefaultbin, typdefault
)
VALUES (
   'pickle', 11, 1, -1,
   'f', 'b', 't', ',',
   0, 0, 'byteain', 'byteaout',
   'bytearecv', 'byteasend', 0, 'i',
   'x', 'f', 0, -1,
   0, '', ''
);

There’s probably a better way to formulate the statement, but I was so elated at it’s correct operation that I backed away from the keyboard slowly. And did a little jig of joy.
This last approach creates the type correctly in PG, and with a little magic in Python, maps the type correctly to client code. Here’s the final Python version:

import psycopg2._psycopg as dbapiext

def register_clone_type(connection, source_name, target_name):
    sql = """
    SELECT %s::regtype::oid
    """
    cursor = connection.cursor()
    string_types = dbapiext.string_types

    cursor.execute(sql, (source_name, ))
    source_oid = cursor.fetchone()[0]

    cursor.execute(sql, (target_name, ))
    target_oid = cursor.fetchone()[0]

    if target_oid not in string_types:
        string_types[target_oid] = string_types[source_oid]
    return target_oid

register_clone_type(myconnection, 'bytea', 'pickle')
Advertisement

Comments are closed.