PostgreSQL - Binary Large Objects
Inhaltsverzeichnis
Introduction
In PostgreSQL there exist several ways to manage Binary Large Objects (LOB, BLOB):
- Data type BYTEA
- Large object (LO) facility 'pg_largeobject' (a system table)
- Data type DATALINK (experimental)
PostgreSQL also supports a storage system called "TOAST" that automatically stores values larger than a single database page (typically 8 KB) into a secondary storage area per table. This defines the size limit of of any column/field to 1 GB.
BYTEA
The BYTEA data type allows storage of binary strings.
- It stores a LOB within the table, respectively using TOAST.
- It is thus limited to 1 GB
- The storage is octal and allows non printable characters (in contrast to character strings which don't, but allow locale settings).
- The input/output format is HEX (as of PostgreSQL 9.0).
Notes:
- BYTEA comes close to the SQL standard binary string type 'BLOB'. The functions and operators provided by BYTEA are mostly the same, while HEX input format of BYTEA is different.
- BYTEA is slower for lengths >20 MB than the LO facility (it has no random accees).
See PostgreSQL doc Binary Data Types.
Large object (LO) facility
Large objects (LO) can also placed in a single system table called 'pg_largeobject' which has to be accessed via identifiers of data type OID.
- There is a read/write API which offers client (= modules written in C) and server-side (= SQL) functions.
- The main related SQL functions are: lo_creat(), lo_create(), lo_unlink(), lo_import(), and lo_export(). lo_import() and lo_export() need permissions of the database's owning user (i.e. superuser).
- LO are broken into "chunks" and stored in btree-indexed rows.
- LO allows values up to 2 GB in size, whereas TOASTed fields (like BYTEA) can be at most 1 GB.
- LO entries can be randomly modified using a read/write API that is more efficient than performing such operations using TOAST (and e.g BYTEA).
Note:
- When PostgreSQL doc. mentions 'lo' (LO = Large Object) it typically refers to this facility.
- In contrast to e.g. BYTEA - LO is not a data type on its own but a table, a 'facility'.
IMPORTANT NOTE when using JDBC BLOB (or @Lob annotation in Hibernate):
- Since PostgreSQL considers a LO entry as an object on it's own, pg_largeobjects grows infinitely unless a separate cleaning is made.
- To prevent this, typically a trigger needs to be added which deletes entries in pg_largeobject as descriebd in module 'lo'.
- See additional PostgreSQL modules 'lo' and 'vaccumlo' in the PostgreSQL docs.
See PostgreSQL doc 'Large Objects' and JDBC data type BLOB: [1].
Example: Typical usage of the server-side (= SQL) functions (from the Postgres docs):
CREATE TABLE image ( name text, raster oid ); SELECT lo_creat(-1); -- returns OID of new, empty large object. SELECT lo_create(43213); -- attempts to create large object with OID 43213. SELECT lo_unlink(173454); -- deletes large object with OID 173454. INSERT INTO image (name, raster) VALUES ('beautiful image', lo_import('/etc/motd')); INSERT INTO image (name, raster) -- same as above, but specify OID to use. VALUES ('beautiful image', lo_import('/etc/motd', 68583)); SELECT lo_export(image.raster, '/tmp/motd') FROM image -- need superuser permission. WHERE name = 'beautiful image';
DATALINK
The DATALINK type stores file URLs in database columns and applies constraints on it. It's defined in Standard SQL 'SQL/MED' but not yet used widely.
- Maintains a link to a specific file in external storage.
- Database system takes over control over external files (rename, delete, permissions are done with SQL).
- Files size is unlimited, respectively limited by external storage. No need to store file contents in database system.
DATALINK parameters:
- NO LINK CONTROL Datalink value need not reference an existing file/URL.
- FILE LINK CONTROL Datalink value must reference an existing file/URL.
- INTEGRITY ALL Referenced files can only be renamed or deleted through SQL.
- INTEGRITY SELECTIVE Referenced files can be renamed or deleted through SQL or directly.
- INTEGRITY NONE (implied for NO LINK CONTROL)
- ON UNLINK DELETE File is deleted from file system when deleted from database.
- ON UNLINK RESTORE File’s original permissions are restored when deleted from database.
- ON UNLINK NONE No change in file permissions when file reference is deleted from database.
- RECOVERY YES PITR applies to referenced files.
- RECOVERY NO PITR does not apply to referenced files.
Example:
CREATE TABLE image ( id integer, name text, picture DATALINK [NO LINK CONTROL] ); INSERT INTO persons VALUES ( 1, 'Jon Doe', DLVALUE('file://some/where/1.jpg') );
Weblinks:
- DATALINK wiki page
- Presentation of Peter Eisentraut, PGCon 2009: [2]
- MSE-Seminar Thesis (2011) from Florian Schwendener about "SQL/MED and More - Management of External Data in PostgreSQL and Microsoft SQL Server": [3]