PostgreSQL - Binary Large Objects: Unterschied zwischen den Versionen

Aus Geoinformation HSR
Wechseln zu: Navigation, Suche
K
K (Introduction)
 
(14 dazwischenliegende Versionen desselben Benutzers werden nicht angezeigt)
Zeile 2: Zeile 2:
  
 
In [[PostgreSQL]] there exist several ways to manage Binary Large Objects (LOB, BLOB):  
 
In [[PostgreSQL]] there exist several ways to manage Binary Large Objects (LOB, BLOB):  
# Data type BYTEA
+
# Basic binary data type BYTEA
# Large object (LO) facility 'pg_largeobject' (a system table)
+
# Basic character data type TEXT
# Data type DATALINK (experimental)
+
# Large object (LO) facility 'pg_largeobject'
 +
# Data type DATALINK (that is just spec., no implementation)
  
PostgreSQL also supports a storage system called "TOAST" that automatically stores values larger than a single database page (typically 8 KB) into a secondary storage area per table. This defines the size limit of of any column/field to 1 GB.
+
PostgreSQL also supports a storage system called "TOAST" (The Oversized-Attribute Storage Technique) that automatically stores values larger than a single database page (typically 8 KB) into a secondary storage area per table. This defines the size limit of of any column/field to 1 GB.
 +
 
 +
Here are some design and implementation questions regarding large objects:
 +
* Is the LO (conceptually) part of the object/entity - or just associated with it?
 +
* How is the LO accessed: as encoded string, as file or stream-based?
 +
* If not stored within the table, who maintains the integrity of the LO?
 +
 
 +
== Importing/Exporting External Binary Files ==
 +
 
 +
When using BYTEA and TEXT (and XML) the common way to import/export external files is inside client (e.g. Java/Python) or serverside programms (e.g. PL/Python). It's difficult to handle external files with psql.
 +
 
 +
See Stackexchange: [http://dba.stackexchange.com/questions/1742/how-to-insert-file-data-into-a-postgresql-bytea-column]
 +
[http://dba.stackexchange.com/questions/8172/sql-to-read-xml-from-file-into-postgresql-database].
  
 
== BYTEA ==
 
== BYTEA ==
Zeile 13: Zeile 26:
 
* It stores a LOB within the table, respectively using TOAST.  
 
* It stores a LOB within the table, respectively using TOAST.  
 
* It is thus limited to 1 GB
 
* It is thus limited to 1 GB
* The storage is octal and allows non printable characters (in contrast to character strings which don't, but allow locale settings).
+
* The storage is octal and allows non printable characters (in contrast to character strings which don't).
 
* The input/output format is HEX (as of PostgreSQL 9.0).
 
* The input/output format is HEX (as of PostgreSQL 9.0).
  
Zeile 21: Zeile 34:
  
 
See PostgreSQL doc [http://www.postgresql.org/docs/9.1/static/datatype-binary.html Binary Data Types].
 
See PostgreSQL doc [http://www.postgresql.org/docs/9.1/static/datatype-binary.html Binary Data Types].
 +
 +
== TEXT ==
 +
 +
Basic data type ''text'' it's just here for completeness. This is a variable character type with unlimited length (up to 1 GB). character types allow locale settings.
 +
 +
It's not a byte string but one could still use it when the binary string is preprocessed and encoded into printable form (e.g. base64 or hex).
  
 
== Large object (LO) facility ==
 
== Large object (LO) facility ==
Zeile 35: Zeile 54:
 
* In contrast to e.g. BYTEA - LO is not a data type on its own but a table, a 'facility'.  
 
* In contrast to e.g. BYTEA - LO is not a data type on its own but a table, a 'facility'.  
  
IMPORTANT NOTE when using JDBC BLOB (or @Lob annotation in Hibernate):  
+
'''IMPORTANT NOTE''' when using JDBC BLOB (or @Lob annotation in Hibernate):  
* Since PostgreSQL considers a LO entry as an object on it's own, pg_largeobjects grows infinitely unless a separate cleaning is made.  
+
* Since PostgreSQL considers a LO entry as an object on it's own, deleting or upating rows in the user table does'nt delete or delete entries in pg_largeobjects. pg_largeobjects therefore grows infinitely unless a separate cleaning is made (see this [https://forum.hibernate.org/viewtopic.php?p=2382442 error report in Hibernate forum]).  
 
* To prevent this, typically a trigger needs to be added which deletes entries in pg_largeobject as descriebd in module 'lo'.  
 
* To prevent this, typically a trigger needs to be added which deletes entries in pg_largeobject as descriebd in module 'lo'.  
 
* See additional PostgreSQL modules [http://www.postgresql.org/docs/9.1/static/lo.html 'lo'] and [http://www.postgresql.org/docs/9.1/static/vacuumlo.html 'vaccumlo'] in the PostgreSQL docs.
 
* See additional PostgreSQL modules [http://www.postgresql.org/docs/9.1/static/lo.html 'lo'] and [http://www.postgresql.org/docs/9.1/static/vacuumlo.html 'vaccumlo'] in the PostgreSQL docs.
Zeile 42: Zeile 61:
 
See PostgreSQL doc [http://www.postgresql.org/docs/9.1/static/largeobjects.html 'Large Objects'] and JDBC data type BLOB: [http://jdbc.postgresql.org/documentation/head/binary-data.html].
 
See PostgreSQL doc [http://www.postgresql.org/docs/9.1/static/largeobjects.html 'Large Objects'] and JDBC data type BLOB: [http://jdbc.postgresql.org/documentation/head/binary-data.html].
  
Example: Typical usage of the server-side (= SQL) functions (from the Postgres docs):
+
Example: Typical usage in SQL (based on Postgres docs):
  
 
   CREATE TABLE image (
 
   CREATE TABLE image (
     name           text,
+
    id integer,
     raster          oid
+
     name text,
 +
     picture oid
 
   );
 
   );
 
    
 
    
Zeile 55: Zeile 75:
 
   SELECT lo_unlink(173454);  -- deletes large object with OID 173454.
 
   SELECT lo_unlink(173454);  -- deletes large object with OID 173454.
 
    
 
    
   INSERT INTO image (name, raster)
+
   INSERT INTO image (id, name, picture)
     VALUES ('beautiful image', lo_import('/etc/motd'));
+
     VALUES (1, 'beautiful image', lo_import('/etc/motd'));
 
    
 
    
   INSERT INTO image (name, raster)  -- same as above, but specify OID to use.
+
   INSERT INTO image (id, name, picture)  -- same as above, but specify OID to use.
     VALUES ('beautiful image', lo_import('/etc/motd', 68583));
+
     VALUES (1, 'beautiful image', lo_import('/etc/motd', 68583));
 
    
 
    
 
   SELECT lo_export(image.raster, '/tmp/motd') FROM image  -- need superuser permission.
 
   SELECT lo_export(image.raster, '/tmp/motd') FROM image  -- need superuser permission.
Zeile 66: Zeile 86:
 
== DATALINK ==
 
== DATALINK ==
  
The DATALINK type maintains a link to a specific file in external storage.
+
NOTE: There is currently no implementation in PostgreSQL for that. It's just a specification defined in Standard SQL 'SQL/MED'.
  
Example:  
+
The DATALINK type stores file URLs in database columns and applies constraints on it. 
   CREATE TABLE mydata (
+
* Maintains a link to a specific file in external storage.
    myid Integer,  
+
* Database system takes over control over external files (rename, delete, permissions are done with SQL) if defined so.
    myimage DATALINK(40)  
+
* File size is unlimited, respectively limited by external storage. No need to store file contents in database system.
    FILE LINK CONTROL INTEGRITY ALL
+
 
 +
DATALINK parameters:
 +
* NO LINK CONTROL Datalink value need not reference an existing file/URL.
 +
* FILE LINK CONTROL Datalink value must reference an existing file/URL.
 +
* INTEGRITY ALL Referenced files can only be renamed or deleted through SQL.
 +
* INTEGRITY SELECTIVE Referenced files can be renamed or deleted through SQL or directly.
 +
* INTEGRITY NONE (implied for NO LINK CONTROL)
 +
* ON UNLINK DELETE File is deleted from file system when deleted from database.
 +
* ON UNLINK RESTORE File’s original permissions are restored when deleted from database.
 +
* ON UNLINK NONE No change in file permissions when file reference is deleted from database.
 +
* RECOVERY YES PITR applies to referenced files.
 +
* RECOVERY NO PITR does not apply to referenced files.
 +
 
 +
Status and Installation:
 +
* unclear.
 +
 
 +
Example:
 +
   CREATE TABLE image (
 +
    id integer,
 +
    name text,
 +
    picture DATALINK [NO LINK CONTROL]
 +
  );
 +
 
 +
  INSERT INTO persons VALUES (
 +
    1,
 +
    'Jon Doe',
 +
    DLVALUE('file://some/where/1.jpg')
 
   );
 
   );
  
 
Weblinks:
 
Weblinks:
 
* [http://wiki.postgresql.org/wiki/DATALINK DATALINK wiki page]
 
* [http://wiki.postgresql.org/wiki/DATALINK DATALINK wiki page]
 +
* Presentation of Peter Eisentraut, PGCon 2009: [http://www.pgcon.org/2009/schedule/attachments/133_pgcon2009-sqlmed.pdf]
 
* MSE-Seminar Thesis (2011) from Florian Schwendener about ''"SQL/MED and More - Management of External Data in PostgreSQL and Microsoft SQL Server"'': [http://wiki.hsr.ch/Datenbanken/SeminarDatenbanksystemeHS11]
 
* MSE-Seminar Thesis (2011) from Florian Schwendener about ''"SQL/MED and More - Management of External Data in PostgreSQL and Microsoft SQL Server"'': [http://wiki.hsr.ch/Datenbanken/SeminarDatenbanksystemeHS11]

Aktuelle Version vom 3. Dezember 2012, 01:40 Uhr

Introduction

In PostgreSQL there exist several ways to manage Binary Large Objects (LOB, BLOB):

  1. Basic binary data type BYTEA
  2. Basic character data type TEXT
  3. Large object (LO) facility 'pg_largeobject'
  4. Data type DATALINK (that is just spec., no implementation)

PostgreSQL also supports a storage system called "TOAST" (The Oversized-Attribute Storage Technique) that automatically stores values larger than a single database page (typically 8 KB) into a secondary storage area per table. This defines the size limit of of any column/field to 1 GB.

Here are some design and implementation questions regarding large objects:

  • Is the LO (conceptually) part of the object/entity - or just associated with it?
  • How is the LO accessed: as encoded string, as file or stream-based?
  • If not stored within the table, who maintains the integrity of the LO?

Importing/Exporting External Binary Files

When using BYTEA and TEXT (and XML) the common way to import/export external files is inside client (e.g. Java/Python) or serverside programms (e.g. PL/Python). It's difficult to handle external files with psql.

See Stackexchange: [1] [2].

BYTEA

The BYTEA data type allows storage of binary strings.

  • It stores a LOB within the table, respectively using TOAST.
  • It is thus limited to 1 GB
  • The storage is octal and allows non printable characters (in contrast to character strings which don't).
  • The input/output format is HEX (as of PostgreSQL 9.0).

Notes:

  • BYTEA comes close to the SQL standard binary string type 'BLOB'. The functions and operators provided by BYTEA are mostly the same, while HEX input format of BYTEA is different.
  • BYTEA is slower for lengths >20 MB than the LO facility (it has no random accees).

See PostgreSQL doc Binary Data Types.

TEXT

Basic data type text it's just here for completeness. This is a variable character type with unlimited length (up to 1 GB). character types allow locale settings.

It's not a byte string but one could still use it when the binary string is preprocessed and encoded into printable form (e.g. base64 or hex).

Large object (LO) facility

Large objects (LO) can also placed in a single system table called 'pg_largeobject' which has to be accessed via identifiers of data type OID.

  • There is a read/write API which offers client (= modules written in C) and server-side (= SQL) functions.
  • The main related SQL functions are: lo_creat(), lo_create(), lo_unlink(), lo_import(), and lo_export(). lo_import() and lo_export() need permissions of the database's owning user (i.e. superuser).
  • LO are broken into "chunks" and stored in btree-indexed rows.
  • LO allows values up to 2 GB in size, whereas TOASTed fields (like BYTEA) can be at most 1 GB.
  • LO entries can be randomly modified using a read/write API that is more efficient than performing such operations using TOAST (and e.g BYTEA).

Note:

  • When PostgreSQL doc. mentions 'lo' (LO = Large Object) it typically refers to this facility.
  • In contrast to e.g. BYTEA - LO is not a data type on its own but a table, a 'facility'.

IMPORTANT NOTE when using JDBC BLOB (or @Lob annotation in Hibernate):

  • Since PostgreSQL considers a LO entry as an object on it's own, deleting or upating rows in the user table does'nt delete or delete entries in pg_largeobjects. pg_largeobjects therefore grows infinitely unless a separate cleaning is made (see this error report in Hibernate forum).
  • To prevent this, typically a trigger needs to be added which deletes entries in pg_largeobject as descriebd in module 'lo'.
  • See additional PostgreSQL modules 'lo' and 'vaccumlo' in the PostgreSQL docs.

See PostgreSQL doc 'Large Objects' and JDBC data type BLOB: [3].

Example: Typical usage in SQL (based on Postgres docs):

 CREATE TABLE image (
   id integer,
   name text,
   picture oid
 );
 
 SELECT lo_creat(-1);       -- returns OID of new, empty large object.
 
 SELECT lo_create(43213);   -- attempts to create large object with OID 43213.
 
 SELECT lo_unlink(173454);  -- deletes large object with OID 173454.
 
 INSERT INTO image (id, name, picture)
   VALUES (1, 'beautiful image', lo_import('/etc/motd'));
 
 INSERT INTO image (id, name, picture)  -- same as above, but specify OID to use.
   VALUES (1, 'beautiful image', lo_import('/etc/motd', 68583));
 
 SELECT lo_export(image.raster, '/tmp/motd') FROM image  -- need superuser permission.
   WHERE name = 'beautiful image';

DATALINK

NOTE: There is currently no implementation in PostgreSQL for that. It's just a specification defined in Standard SQL 'SQL/MED'.

The DATALINK type stores file URLs in database columns and applies constraints on it.

  • Maintains a link to a specific file in external storage.
  • Database system takes over control over external files (rename, delete, permissions are done with SQL) if defined so.
  • File size is unlimited, respectively limited by external storage. No need to store file contents in database system.

DATALINK parameters:

  • NO LINK CONTROL Datalink value need not reference an existing file/URL.
  • FILE LINK CONTROL Datalink value must reference an existing file/URL.
  • INTEGRITY ALL Referenced files can only be renamed or deleted through SQL.
  • INTEGRITY SELECTIVE Referenced files can be renamed or deleted through SQL or directly.
  • INTEGRITY NONE (implied for NO LINK CONTROL)
  • ON UNLINK DELETE File is deleted from file system when deleted from database.
  • ON UNLINK RESTORE File’s original permissions are restored when deleted from database.
  • ON UNLINK NONE No change in file permissions when file reference is deleted from database.
  • RECOVERY YES PITR applies to referenced files.
  • RECOVERY NO PITR does not apply to referenced files.

Status and Installation:

  • unclear.

Example:

 CREATE TABLE image (
   id integer,
   name text,
   picture DATALINK [NO LINK CONTROL]
 );
 
 INSERT INTO persons VALUES (
   1,
   'Jon Doe',
   DLVALUE('file://some/where/1.jpg')
 );

Weblinks:

  • DATALINK wiki page
  • Presentation of Peter Eisentraut, PGCon 2009: [4]
  • MSE-Seminar Thesis (2011) from Florian Schwendener about "SQL/MED and More - Management of External Data in PostgreSQL and Microsoft SQL Server": [5]