PostgreSQL - Binary Large Objects

Aus Geoinformation HSR
Version vom 8. Januar 2012, 01:06 Uhr von Stefan (Diskussion | Beiträge) (Die Seite wurde neu angelegt: „In PostgreSQL there exist three ways to manage Binary Large Objects (LOB, BLOB): # Data type BYTEA # LO facility 'pg_largeobject' (a system table) # Data typ…“)

(Unterschied) ← Nächstältere Version | Aktuelle Version (Unterschied) | Nächstjüngere Version → (Unterschied)
Wechseln zu: Navigation, Suche

In PostgreSQL there exist three ways to manage Binary Large Objects (LOB, BLOB):

  1. Data type BYTEA
  2. LO facility 'pg_largeobject' (a system table)
  3. Data type DATALINK (experimental)

PostgreSQL also supports a storage system called "TOAST" that automatically stores values larger than a single database page (typically 8 KB) into a secondary storage area per table. This defines the size limit of of any column/field to 1 GB.

BYTEA

The BYTEA data type allows storage of binary strings.

  • It stores a LOB within the table, respectively using TOAST.
  • It is thus limited to 1 GB
  • The storage is octal and allows non printable characters (in contrast to character strings which don't, but allow locale settings).
  • The input/output format is HEX (as of PostgreSQL 9.0).

Notes:

  • BYTEA comes close to the SQL standard binary string type 'BLOB'. The functions and operators provided by BYTEA are mostly the same, while HEX input format of BYTEA is different.
  • BYTEA is slower for lengths >20 MB than the LO facility (it has no random accees).

See PostgreSQL doc Binary Data Types.

Large objects (LO)

Large objects (LO) can also placed in a single system table called 'pg_largeobject' which has to be accessed via identifiers of data type OID.

  • There is a read/write API which offers client (= modules written in C) and server-side (= SQL) functions.
  • The main related SQL functions are: lo_creat(), lo_create(), lo_unlink(), lo_import(), and lo_export(). lo_import() and lo_export() need permissions of the database's owning user (i.e. superuser).
  • LO are broken into "chunks" and stored in btree-indexed rows.
  • LO allows values up to 2 GB in size, whereas TOASTed fields (like BYTEA) can be at most 1 GB.
  • LO entries can be randomly modified using a read/write API that is more efficient than performing such operations using TOAST (and e.g BYTEA).

Note:

  • When PostgreSQL doc. mentions 'lo' (LO = Large Object) it typically refers to this facility.
  • In contrast to e.g. BYTEA - LO is not a data type on its own but a table, a 'facility'.

IMPORTANT NOTE when using JDBC BLOB (or @Lob annotation in Hibernate):

  • Since PostgreSQL considers a LO entry as an object on it's own, pg_largeobjects grows infinitely unless a separate cleaning is made.
  • To prevent this, typically a trigger needs to be added which deletes entries in pg_largeobject as descriebd in module 'lo'.
  • See additional PostgreSQL modules 'lo' and 'vaccumlo' in the PostgreSQL docs.

See PostgreSQL doc 'Large Objects' and JDBC data type BLOB: [1].

Example: Typical usage of the server-side (= SQL) functions (from the Postgres docs):

 CREATE TABLE image (
   name            text,
   raster          oid
 );
 
 SELECT lo_creat(-1);       -- returns OID of new, empty large object.
 
 SELECT lo_create(43213);   -- attempts to create large object with OID 43213.
 
 SELECT lo_unlink(173454);  -- deletes large object with OID 173454.
 
 INSERT INTO image (name, raster)
   VALUES ('beautiful image', lo_import('/etc/motd'));
 
 INSERT INTO image (name, raster)  -- same as above, but specify OID to use.
   VALUES ('beautiful image', lo_import('/etc/motd', 68583));
 
 SELECT lo_export(image.raster, '/tmp/motd') FROM image  -- need superuser permission.
   WHERE name = 'beautiful image';

DATALINK

The DATALINK type maintains a link to a specific file in external storage.

Example:

 CREATE TABLE mydata (
    myid Integer, 
    myimage DATALINK(40) 
    FILE LINK CONTROL INTEGRITY ALL
 );

Weblinks:

  • DATALINK wiki page
  • MSE-Seminar Thesis (2011) from Florian Schwendener about SQL/MED and More - Management of External Data in PostgreSQL and Microsoft SQL Server: [2]