PostgreSQL - Tipps und Tricks

From Geoinformation HSR
Current version as of 9 November 2020, 10:34

Stefan's Notes...

Passwords

  • Set/change the password for a database user:
 % psql -d unsere_db -c "ALTER USER unsere_db WITH PASSWORD 'neues_passwort';"
  • TIP: PostgreSQL programs don't like it when the password is missing!
  • TIP: Setting the password of the superuser postgres to the password 'postgres' requires connecting to template1, not postgres:
 % psql -d template1 -c "ALTER USER postgres WITH PASSWORD 'postgres';"
  • psql and probably all other PostgreSQL tools (dropdb, createdb, pg_dump, etc.) need a user and a password. The user is passed with the -U parameter; the password, by design, cannot be passed on the command line. You can, however, set environment variables:
 SET PGUSER=postgres
 SET PGPASSWORD=<meinpasswort>
  • Start the psql client with umlaut support (Windows):
 > cmd /c chcp 1252
 > psql -d gisdb -U postgres
 postgres=# SET CLIENT_ENCODING TO 'LATIN1';
 postgres=# <do whatever you must do in psql...>
 > cmd /c chcp 850

Create "Read-only User"

Generate 'SELECT' grant commands from a catalog query:

 SELECT 'GRANT SELECT ON ' || relname || ' TO xxx;'
 FROM pg_class JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
 WHERE nspname = 'public' AND relkind IN ('r', 'v')

Good to know:

  • If you only grant CONNECT to a database, the user can connect but has no other privileges.
  • PG makes a distinction between GRANT SELECT and GRANT USAGE for security reasons (USAGE alone already allows poking at metadata about tables).
  • PG has a schema named public, and also a 'default role' (the grantee of 'DEFAULT PRIVILEGES') that is likewise called PUBLIC.
  • PG evaluates (and 'inherits') security 'flags' at the time a command (e.g. GRANT) is issued. This means: 1. when a 'parent flag' changes, that change is not propagated ('inherited'), and 2. a "GRANT SELECT ON ALL TABLES IN SCHEMA public TO ouruser;" only affects tables that have already been created.
  • Since PostgreSQL 9.0 you still need to grant USAGE permissions on schemas, but you can grant permissions on all tables, views etc. in the schema using a single command rather than having to generate them:

Read-Only user (with variants):

 -- Prevent default users from creating tables
 REVOKE CREATE ON SCHEMA public FROM public;
 REVOKE USAGE  ON SCHEMA public FROM public;
 
 -- Create read-only user
 CREATE ROLE readonlyuser WITH LOGIN ENCRYPTED PASSWORD 'strongpassword';
 GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonlyuser;
 GRANT USAGE ON SCHEMA public TO readonlyuser; -- omit USAGE if needed
 
 -- Restore USAGE/CREATE to default user
 GRANT USAGE  ON SCHEMA public TO public;
 GRANT CREATE ON SCHEMA public TO public;

(Source: note that not all answers are up to date or correct: https://stackoverflow.com/questions/760210/how-do-you-create-a-read-only-user-in-postgresql )
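
The GRANTs above only cover tables that already exist. For tables created later, default privileges can be set as well; a sketch, reusing the readonlyuser role from above:

 ALTER DEFAULT PRIVILEGES IN SCHEMA public
     GRANT SELECT ON TABLES TO readonlyuser;

Note that this only applies to objects later created by the role that issued the command (or the role given with FOR ROLE).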

Loading/importing data

See http://wiki.postgresql.org/wiki/COPY

COPY Syntax:

  • Tab-Delimited

 COPY geometry_columns (f_table_catalog, f_table_schema, f_table_name, f_geometry_column, coord_dimension, srid, type) FROM stdin;
 catalog	public	planet_osm_point	way	2	900913	POINT
 catalog	public	planet_osm_line	way	2	900913	LINESTRING
 \.
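
A CSV variant of COPY (table name and file path are made up). COPY FROM a file reads on the server and needs elevated rights; psql's \copy reads the client-side file instead:

 COPY sometable (id, name)
     FROM '/tmp/data.csv'
     WITH (FORMAT csv, HEADER true);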

Foreign Data Wrappers

As of version 9.1, Foreign Data Wrappers (FDW) and a part of the SQL/MED standard are supported. file_fdw is an official extension (contrib module): http://www.postgresql.org/docs/9.1/static/file-fdw.html

See also the PostgreSQL wiki:

  • Foreign_data_wrappers: http://wiki.postgresql.org/wiki/Foreign_data_wrappers#file_fdw
  • SQL/MED: http://wiki.postgresql.org/wiki/SQL/MED#Foreign_data_wrappers

Linking open and remote (PostgreSQL) read-only databases easily! Uses postgres_fdw, the SQL/MED implementation (Foreign Data Wrapper).

Caveats:

  • Must be superuser to create extension postgres_fdw.
  • Joins are slow or even never returning.

Future Work:

  • Create a stored procedure link_pg_server(...).
  • Promote sharing of endpoints like SPARQL.

Tested with PostgreSQL 11. Credits: https://robots.thoughtbot.com/postgres-foreign-data-wrapper

 -- Step 1: Preparation
 CREATE EXTENSION IF NOT EXISTS postgres_fdw;
 
 -- Connect to remote DB server:
 DROP SERVER IF EXISTS gis_db_server CASCADE;
 CREATE SERVER gis_db_server
   FOREIGN DATA WRAPPER postgres_fdw
   OPTIONS (host 'xxx', port 'xxx', dbname 'xxx');
 CREATE USER MAPPING FOR CURRENT_USER
   SERVER gis_db_server
   OPTIONS (user 'xxx', password 'xxx');
 
 -- Step 2: Import Schema
 DROP SCHEMA IF EXISTS gis_db_public CASCADE;
 CREATE SCHEMA gis_db_public;
 CREATE EXTENSION IF NOT EXISTS postgis;
 CREATE EXTENSION IF NOT EXISTS hstore; -- needed by gis_db schema
 IMPORT FOREIGN SCHEMA public
   FROM SERVER gis_db_server
   INTO gis_db_public;
 
 -- Step 3: Go!
 SELECT COUNT(*) FROM gis_db_public.osm_point;
 -- Slow - or even never returning! No operator pushdown?
 -- (no join op nor tmp table is sent to the linked db)
 -- See EXPLAIN: https://explain.depesz.com/s/PStr
 WITH tmp(osm_id, name) AS (
  VALUES (4066310784, 'Tierpark Chur'), (1398864548, 'Tierpark Aletsch')
 )
 SELECT point.* 
 FROM gis_db_public.osm_point AS point
 JOIN tmp ON tmp.osm_id=point.osm_id;

PostgreSQL SQL

See also the wiki of the lecture Datenbanken (Dbs1) by Prof. S. Keller.

Version:

 select version();

Finding duplicate values in one or more columns:

SELECT * 
FROM (
  SELECT 
    row_number() OVER (PARTITION BY email),
    name,
    email
  FROM dedup) t 
WHERE t.row_number > 1;
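
A simpler variant that lists each duplicated value together with its count (same hypothetical table dedup):

 SELECT email, count(*) AS cnt
 FROM dedup
 GROUP BY email
 HAVING count(*) > 1;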

Querying the geometry type ('the_geom' is the name of the geometry column):

 SELECT *
 FROM sometable
 WHERE GeometryType(the_geom) = 'GEOMETRYCOLLECTION'

Problem with date fields (for < 8.3):

   EXTRACT(DAY FROM (end_date - NOW())) >= 10 AS GREEN
   AND EXTRACT(DAY FROM (end_date - NOW())) < 0 AS RED ... FROM ...


All tables and indexes (analogous to \di):

 SELECT 
   n.nspname AS "Schema", 
   c.relname AS "Name", 
   CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' END AS "Type", 
   u.usename AS "Owner", c2.relname AS "Table"
 FROM pg_catalog.pg_class c
   JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid 
   JOIN pg_catalog.pg_class c2 ON i.indrelid = c2.oid 
   LEFT JOIN pg_catalog.pg_user u ON u.usesysid = c.relowner 
   LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind IN ('i','')
 AND n.nspname NOT IN ('pg_catalog', 'pg_toast') AND pg_catalog.pg_table_is_visible(c.oid)
 ORDER BY 1,2

hstore

Keywords: key-value attributes, hashtable, NoSQL database.

See also:

  • PostgreSQL docs.
  • Seminar thesis by M. Ott, "Key/Value Pair Versus hstore - Benchmarking Entity-Attribute-Value Structures in PostgreSQL", in http://wiki.hsr.ch/Datenbanken/SeminarDatenbanksystemeFS11
  • Examples: PostGIS Terminal.
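
A minimal hstore example with made-up key/value data:

 CREATE EXTENSION IF NOT EXISTS hstore;
 SELECT 'amenity=>zoo, name=>"Tierpark Chur"'::hstore -> 'name';

The -> operator extracts the value for a key as text.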

Binary Large Objects (LOB, BLOB)

See PostgreSQL - Binary Large Objects.

Data Warehousing with PostgreSQL

Keywords: PostgreSQL module cube, PostgreSQL module tablefunc (with crosstab).

tbd.
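
A crosstab sketch (table and column names are made up), pivoting one row per (region, year) into one column per year:

 CREATE EXTENSION IF NOT EXISTS tablefunc;
 SELECT *
 FROM crosstab(
     'SELECT region, year, revenue FROM sales ORDER BY 1,2'
 ) AS ct(region text, y2019 numeric, y2020 numeric);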

Configuration

pg_hba.conf can be configured so that no password has to be supplied when running the PostgreSQL command-line tools.

Example with 127.0.0.1: instead of

 # IPv4 local connections:
 host    all         all         127.0.0.1/32          md5

use:

 # IPv4 local connections:
 host    all         all         127.0.0.1/32          trust
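
Note that 'trust' switches authentication off entirely for matching connections. A less drastic alternative is a password file, ~/.pgpass on Unix (must be chmod 600) or %APPDATA%\postgresql\pgpass.conf on Windows, with one line per connection:

 # hostname:port:database:username:password
 localhost:5432:*:postgres:meinpasswort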

Backup and Restore

Dump the whole PostgreSQL cluster ('instance') with pg_dumpall. This includes the globals: information about users and groups, tablespaces, and so on. The output is a plain text file.

 # pg_dumpall -U postgres -h localhost -p 5432 --clean --file=dump.sql

To back up a single database, use pg_dump, plus pg_dumpall for the globals:

 # pg_dump -U postgres -h localhost -p 5432 --clean --file=sandbox.sql sandbox
 # pg_dumpall -U postgres -h localhost -p 5432 --clean --globals-only --file=globals.sql

To restore, load the globals first, then the database dump:

 # psql -U postgres -h localhost -p 5433 < globals.sql
 # psql -U postgres -h localhost -p 5433 < sandbox.sql

Here's another example:

 # Backup full database
 pg_dump -h host -U user -W database > backup.sql

 # Restore full database
 psql -d database -f backup.sql

 # Backup database schema
 pg_dump -sv prueba -O > backup.schema.sql

 # Backup database data
 pg_dump -Fc -f backup.data.dump -a --disable-triggers database
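
The custom-format dump (-Fc) is restored with pg_restore rather than psql; the flags mirror the data-only dump above:

 # Restore data-only custom-format dump
 pg_restore -d database -a --disable-triggers backup.data.dump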

Tipps:

  • Don't store data in schema 'public', for several reasons. One is that up to PG 8, all additional database objects (functions, triggers) are maintained in schema public.
  • Possibly switch off privileges. See also tab „Dump Options #1“ > Don't Save Privilege in pgAdmin3.

Docs:

  • 'PostgreSQL 9.0 pg_dump, pg_dumpall, pg_restore cheatsheet' from postgresonline.com.
  • PostGIS Hard Upgrade: http://postgis.refractions.net/docs/ch02.html#hard_upgrade
  • PostgreSQL Chapter 24. Backup and Restore: http://www.postgresql.org/docs/9.1/static/backup.html

More Tips...

Data Generator

Step 1: Demo / tests:

 select 
   g                            as no,
   random()                     as rand_01_float,
   (random() * 2147483647)::int as rand_uint,
   trunc(random() * 6 + 1)      as range_1_6_int, -- >=1 and <=6
   md5(random()::text)          as char32
 from generate_series(1,20) g
 order by 4;

Generate series directly with dates. No need to use ints or timestamps:

 select 
   datetime::date -- only date part of datetime
 from generate_series('2018-03-01'::date,'2018-03-05'::date,'1 day'::interval) datetime;

Step 2: Create a table that fits the schema:

 create table foo ( ... ) 

Generate data and insert it into table:

 insert into foo (id, remarks) 
   values (generate_series(1,3), 'b '||md5(random()::text));
 select * from foo order by 1;

Convert text to number

See http://stackoverflow.com/questions/2082686/how-do-i-cast-a-string-to-integer-and-have-0-in-case-of-error-in-the-cast-with-p
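
A sketch along the lines of that answer: cast text to integer, returning 0 in case of error (the function name is made up):

 CREATE OR REPLACE FUNCTION safe_to_int(v text) RETURNS integer AS $$
 BEGIN
     RETURN v::integer;
 EXCEPTION WHEN OTHERS THEN
     RETURN 0;  -- not a valid integer
 END;
 $$ LANGUAGE plpgsql IMMUTABLE;

 SELECT safe_to_int('42'), safe_to_int('abc');  -- 42 and 0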

DOMAIN, ENUM or Lookup-Table?

DOMAIN is ANSI SQL-92; ENUM is PostgreSQL-specific. If compatibility is not an issue, I'd prefer ENUM.

See also Blog about "FK, CHECK, ENUM or DOMAIN. That is the question." by Joshua Drake, Jan 21st, 2009.
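
For illustration (type names are made up):

 -- ENUM: a fixed, ordered set of labels
 CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

 -- DOMAIN: a base type plus a reusable constraint
 CREATE DOMAIN rating AS integer CHECK (VALUE BETWEEN 1 AND 5);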

DATE or TIMESTAMP?

DATE stores year, month and day, whereas TIMESTAMP additionally contains hours, minutes and seconds, and (optionally) a timezone. DATE is easier to work with for arithmetic (e.g. something recurring at a random interval of days), takes less storage space, and doesn't drag along "00:00:00" strings you don't need when printed. However, TIMESTAMP is far better for real calendar calculations (e.g. something that happens on the 15th of each month or the 2nd Thursday of leap years). See http://wiki.postgresql.org/wiki/Working_with_Dates_and_Times_in_PostgreSQL

What is the best way to insert data?

 INSERT INTO sometable VALUES (13, 'test', 13, '2012-12-02');
 INSERT INTO sometable VALUES (14, 'test', 14, '2012-12-02'::timestamp);
 INSERT INTO sometable VALUES (15, 'test', 15, timestamp '2012-12-02');

The first line does not check the date type, so I'd prefer one of the other two.

SQL MERGE Command

The PostgreSQL documentation provides a solution for a MERGE/UPSERT function. 'UPSERT' is PostgreSQL slang for the ANSI SQL MERGE and means UPDATE-or-INSERT. But that solution has concurrency issues, which are avoided by the proposal in Depesz's blog: http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/. See also http://wiki.postgresql.org/wiki/SQL_MERGE

PL/pgSQL function for a MERGE/UPSERT with a single attribute:

 -- All function parameters and all the update and insert 
 -- statements in the body need to be adapted for specific tables
 CREATE OR REPLACE FUNCTION merge_db( key INT, data TEXT ) 
 RETURNS void AS $$
 BEGIN
     -- first try to update with primary key as id
     UPDATE db SET b=data WHERE a = key;
     --UPDATE db SET b=data,etc. WHERE a = key;
     IF FOUND THEN
         RETURN;
     END IF;
     -- not there, so try to insert the key
     -- if someone else inserts the same key
     BEGIN
         INSERT INTO db (a,b) VALUES (key, data);
         --INSERT INTO db (a,b,etc.) VALUES (a,b,etc.);
     EXCEPTION WHEN unique_violation THEN
         -- someone inserted the key concurrently; try the update again
         UPDATE db SET b=data WHERE a = key;
     END;
     RETURN;
 END;
 $$ LANGUAGE plpgsql;

Unfortunately, there's no generic function around that would accept any table structure.
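
Since PostgreSQL 9.5 there is a native upsert, INSERT ... ON CONFLICT, which handles the concurrency internally (same table db(a,b) as above; requires a unique constraint or primary key on a):

 INSERT INTO db (a, b) VALUES (1, 'data')
 ON CONFLICT (a) DO UPDATE SET b = EXCLUDED.b;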

Triggers

Tips:

  • Prefer triggers over rules (since 9.0).
  • ...

Q: Is there a restriction that an insert trigger can't modify its own row or other rows of the table that fired it?
A: There's no such restriction. The best way to change the inserted row is to do it before the insertion, though. Trying to modify the inserted row can have slightly strange effects.
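
A BEFORE INSERT sketch along those lines (table and column names are made up); changing NEW before the insertion is the clean way:

 CREATE OR REPLACE FUNCTION normalize_name() RETURNS trigger AS $$
 BEGIN
     NEW.name := trim(NEW.name);  -- modify the row before it is inserted
     RETURN NEW;
 END;
 $$ LANGUAGE plpgsql;

 CREATE TRIGGER tr_normalize_name
     BEFORE INSERT ON sometable
     FOR EACH ROW EXECUTE PROCEDURE normalize_name();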

Full Text Search

See PostgreSQL FTS - Tipps und Tricks