PostgreSQL - Binary Large Objects: Unterschied zwischen den Versionen

Aus Geoinformation HSR
Wechseln zu: Navigation, Suche
(Introduction)
K (Large object (LO) facility)
Zeile 44: Zeile 44:
  
 
IMPORTANT NOTE when using JDBC BLOB (or @Lob annotation in Hibernate):  
 
IMPORTANT NOTE when using JDBC BLOB (or @Lob annotation in Hibernate):  
* Since PostgreSQL considers a LO entry as an object on it's own, pg_largeobjects grows infinitely unless a separate cleaning is made.  
+
* Since PostgreSQL considers a LO entry as an object on it's own, deleting or upating rows in the user table does'nt delete or delete entries in pg_largeobjects. pg_largeobjects therefore grows infinitely unless a separate cleaning is made.  
 
* To prevent this, typically a trigger needs to be added which deletes entries in pg_largeobject as descriebd in module 'lo'.  
 
* To prevent this, typically a trigger needs to be added which deletes entries in pg_largeobject as descriebd in module 'lo'.  
 
* See additional PostgreSQL modules [http://www.postgresql.org/docs/9.1/static/lo.html 'lo'] and [http://www.postgresql.org/docs/9.1/static/vacuumlo.html 'vaccumlo'] in the PostgreSQL docs.
 
* See additional PostgreSQL modules [http://www.postgresql.org/docs/9.1/static/lo.html 'lo'] and [http://www.postgresql.org/docs/9.1/static/vacuumlo.html 'vaccumlo'] in the PostgreSQL docs.

Version vom 8. Januar 2012, 18:41 Uhr

Introduction

In PostgreSQL there exist several ways to manage Binary Large Objects (LOB, BLOB):

  1. Basic binary data type BYTEA
  2. Basic character data type TEXT
  3. Large object (LO) facility 'pg_largeobject'
  4. Data type DATALINK (experimental)

PostgreSQL also supports a storage system called "TOAST" that automatically stores values larger than a single database page (typically 8 KB) into a secondary storage area per table. This defines the size limit of of any column/field to 1 GB.

Regarding data type text it's just here for completeness. This is a variable character type with unlimited length (up to 1 GB). One can also store encoded binary codes in it.

Here are some design and implementation questions regarding large objects:

  • Is the LO (conceptually) part of the object/entity - or just associated with it?
  • How is the LO accessed: as encoded string, as file or stream-based?
  • If not stored within the table, who maintains the integrity of the LO?

BYTEA

The BYTEA data type allows storage of binary strings.

  • It stores a LOB within the table, respectively using TOAST.
  • It is thus limited to 1 GB
  • The storage is octal and allows non printable characters (in contrast to character strings which don't, but allow locale settings).
  • The input/output format is HEX (as of PostgreSQL 9.0).

Notes:

  • BYTEA comes close to the SQL standard binary string type 'BLOB'. The functions and operators provided by BYTEA are mostly the same, while HEX input format of BYTEA is different.
  • BYTEA is slower for lengths >20 MB than the LO facility (it has no random accees).

See PostgreSQL doc Binary Data Types.

Large object (LO) facility

Large objects (LO) can also placed in a single system table called 'pg_largeobject' which has to be accessed via identifiers of data type OID.

  • There is a read/write API which offers client (= modules written in C) and server-side (= SQL) functions.
  • The main related SQL functions are: lo_creat(), lo_create(), lo_unlink(), lo_import(), and lo_export(). lo_import() and lo_export() need permissions of the database's owning user (i.e. superuser).
  • LO are broken into "chunks" and stored in btree-indexed rows.
  • LO allows values up to 2 GB in size, whereas TOASTed fields (like BYTEA) can be at most 1 GB.
  • LO entries can be randomly modified using a read/write API that is more efficient than performing such operations using TOAST (and e.g BYTEA).

Note:

  • When PostgreSQL doc. mentions 'lo' (LO = Large Object) it typically refers to this facility.
  • In contrast to e.g. BYTEA - LO is not a data type on its own but a table, a 'facility'.

IMPORTANT NOTE when using JDBC BLOB (or @Lob annotation in Hibernate):

  • Since PostgreSQL considers a LO entry as an object on it's own, deleting or upating rows in the user table does'nt delete or delete entries in pg_largeobjects. pg_largeobjects therefore grows infinitely unless a separate cleaning is made.
  • To prevent this, typically a trigger needs to be added which deletes entries in pg_largeobject as descriebd in module 'lo'.
  • See additional PostgreSQL modules 'lo' and 'vaccumlo' in the PostgreSQL docs.

See PostgreSQL doc 'Large Objects' and JDBC data type BLOB: [1].

Example: Typical usage in SQL (based on Postgres docs):

 CREATE TABLE image (
   id integer,
   name text,
   picture oid
 );
 
 SELECT lo_creat(-1);       -- returns OID of new, empty large object.
 
 SELECT lo_create(43213);   -- attempts to create large object with OID 43213.
 
 SELECT lo_unlink(173454);  -- deletes large object with OID 173454.
 
 INSERT INTO image (id, name, picture)
   VALUES (1, 'beautiful image', lo_import('/etc/motd'));
 
 INSERT INTO image (id, name, picture)  -- same as above, but specify OID to use.
   VALUES (1, 'beautiful image', lo_import('/etc/motd', 68583));
 
 SELECT lo_export(image.raster, '/tmp/motd') FROM image  -- need superuser permission.
   WHERE name = 'beautiful image';

DATALINK

The DATALINK type stores file URLs in database columns and applies constraints on it. It's defined in Standard SQL 'SQL/MED' but not yet used widely.

  • Maintains a link to a specific file in external storage.
  • Database system takes over control over external files (rename, delete, permissions are done with SQL) if defined so.
  • File size is unlimited, respectively limited by external storage. No need to store file contents in database system.

DATALINK parameters:

  • NO LINK CONTROL Datalink value need not reference an existing file/URL.
  • FILE LINK CONTROL Datalink value must reference an existing file/URL.
  • INTEGRITY ALL Referenced files can only be renamed or deleted through SQL.
  • INTEGRITY SELECTIVE Referenced files can be renamed or deleted through SQL or directly.
  • INTEGRITY NONE (implied for NO LINK CONTROL)
  • ON UNLINK DELETE File is deleted from file system when deleted from database.
  • ON UNLINK RESTORE File’s original permissions are restored when deleted from database.
  • ON UNLINK NONE No change in file permissions when file reference is deleted from database.
  • RECOVERY YES PITR applies to referenced files.
  • RECOVERY NO PITR does not apply to referenced files.

Status and Installation:

  • unclear.

Example:

 CREATE TABLE image (
   id integer,
   name text,
   picture DATALINK [NO LINK CONTROL]
 );
 
 INSERT INTO persons VALUES (
   1,
   'Jon Doe',
   DLVALUE('file://some/where/1.jpg')
 );

Weblinks:

  • DATALINK wiki page
  • Presentation of Peter Eisentraut, PGCon 2009: [2]
  • MSE-Seminar Thesis (2011) from Florian Schwendener about "SQL/MED and More - Management of External Data in PostgreSQL and Microsoft SQL Server": [3]