PostgreSQL - Tipps und Tricks: Unterschied zwischen den Versionen

Aus Geoinformation HSR
Wechseln zu: Navigation, Suche
(PostgreSQL-Optimierung)
Zeile 113: Zeile 113:
 
   # IPv4 local connections:
 
   # IPv4 local connections:
 
   host    all        all        127.0.0.1/32          trust
 
   host    all        all        127.0.0.1/32          trust
 
== PostgreSQL-Optimierung==
 
 
PostgreSQL Performance Optimierung und Tuning.
 
 
Siehe auch:
 
* [http://wiki.postgresql.org/wiki/Performance_Optimization PostgreSQL Performance Optimization] on Wiki
 
* [http://www.revsys.com/writings/postgresql-performance.html RevSys (Hardware, Tuning, Explain, etc.)], [http://groups.google.ch/group/EtoE/browse_thread/thread/eaabeda775e65260/1d3c809e723c6b07?hl=de&lnk=st&q=Best+CPU+for+PostgreSQL#1d3c809e723c6b07 PostgreSQL 8 Performance Checklist (genaue Erklärung der Variablen in postgresql.conf], [http://www.pcguide.com/ref/hdd/perf/raid/ RAID Hinweise]. Für kleinere bis mittlere PostgreSQL Installationen mit primär Leseoperationen empfiehlt sich RAID 0  oder RAID 1. RAID 0 hat jedoch keine Redundanz. RAID 5 hat schlechte Performanz, ausser wenn 6 oder mehr Disks. SCSI Ultra 320 Disks sind zwar teurer aber immer noch wesentlich schneller als S-ATA/IDE Disks.
 
* Das Tool pgtune macht Verbesserungsvorschläge der Konfiguration.
 
* CLUSTER und FILLINDEX: http://www.postgresonline.com/journal/index.php?/archives/10-How-does-CLUSTER-ON-improve-index-performance.html
 
* [[http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm 5-Minute Introduction to PostgreSQL Performance]] by Greg Smith
 
 
=== Tipps ===
 
 
Bevor man beginnt und postgresql.conf etc. verändert...
 
* ANALYSE und CLUSTER-Befehl ausführen: Siehe [http://wiki.postgresql.org/wiki/VACUUM_FULL#What_to_use_instead].
 
* Den aktuellen Zustand testen mit dem Tool [http://wiki.postgresql.org/wiki/Regression_Testing_with_pgbench pgbench].
 
 
Tipps zur Optimierung (Optimization), Performance, Tuning von PostgreSQL...:
 
# Make sure you have (spatial) indexes on your tables.
 
# Cluster on (spatial) indexes tends to improve performance but mostly good for static spatial data.
 
# PostGIS: A spatial index is more effective on small geometries than larger geometries even for MySQL.
 
# EXPLAIN ANALYZE is your friend -- especially PgAdmin III Graphical Explain - will pinpoint when indexes are not being used and what navigation strategies are being used.
 
# posgresql.conf - fiddling with that a lot has been written -- particularly work_mem is important.
 
# Then there is a whole science on how you write sql queries. In general for PostgreSQL - you can force PostgreSQL to materialize subselects by putting in an ORDER BY and in general a subselect tends to force the planner to evaluate those as a single unit of work, which tends to improve speed in many cases.
 
 
SQL:
 
* Postgres kennt "Partial index": Beispiel: "create index i on table2(field2) where field2 <> 0;" Optimiert z.B. folgende Query: select * from table2 where field2<>0; (vgl. [http://www.postgresql.org/docs/8.4/static/indexes-partial.html Doc.]).
 
 
=== Tuning durch Konfiguration von postgresql.conf ===
 
 
Konfiguration von postgresql.conf (siehe z.B. [http://www.varlena.com/GeneralBits/Tidbits/perf.html]) durch Parameter heraufsetzen:
 
* Speziell bei genügend Memory und wenigen Applikationen und Benutzern:
 
** effective_cache_size (default 128MB )
 
** work_mem – default 1MB is wayyy small
 
** maintenance_work_mem – default 16MB small
 
** shared_buffers (default 24MB): This is a real killer... bump it to 1-2GB, at least, on a server w/ >4G RAM, up to 8GB (don't go above that w/o good testing..).
 
* sort_memory - speziell bei order-by oder group-by Klauseln
 
* random_page_cost – speziell bei schnellen Disks wie SCSI und RAID
 
 
Auto Vacuum einschalten.
 
 
=== Read-Only Databases ===
 
 
Steps in order to secure and speedup such PostgreSQL/PostGIS instance:
 
 
1. Re-configure PostgreSQL server parameters (postgresql.conf) as following:
 
 
fsync=off
 
synchronous_commit=off
 
full_page_writes=off
 
default_transaction_read_only
 
 
Greg Smith wrote (19. April 2011 16:30 on pgsql-performance): "The only other thing you probably want to do is set checkpoint_segments to a big number.  Shouldn't matter normally, but when doing this (vacuum) freeze operation it will help that execute quickly.  You want a lower maintenance_work_mem on a read-only system than the master too, possibly a higher shared_buffers as well.  It's all pretty subtle beyond the big parameters you already identified."
 
 
  checkpoint_segments=?
 
  maintenance_work_mem=?
 
  shared_buffers=?
 
 
2. Restart server, login as dbadmin, create database, create an app.-user.
 
 
3. Load dataset...:
 
 
3.1. with owner 'app.-user' (in schema PUBLIC or other?).
 
 
3.2. create indexes and cluster:
 
 
CREATE INDEX mytable_idx
 
ON mytable
 
-- USING btree (upper(last_name), upper(first_name)) -- tbd.
 
WITH (FILLFACTOR=100);
 
 
ALTER TABLE member CLUSTER ON member_name_idx;
 
 
3.3. Config database:
 
 
ALTER DATABASE x SET default_transaction_read_only = on;
 
 
3.4. Issue a VACUUM command:
 
 
VACUUM FREEZE ANALYZE;
 
 
4. Create a 'read-only' user (login role) with only read access to user defined tables:
 
GRANT SELECT ... TO read_only_user
 
...
 
 
5. Optimize and secure session by following parameters:
 
 
SET STATEMENT_TIMEOUT TO 60000;
 
SET transaction_read_only TO TRUE; --?
 
SET TRANSACTION READ ONLY;
 
 
6. Go to step 3 in case of new data or a harddisk crash.
 
  
 
== Weitere Tipps... ==
 
== Weitere Tipps... ==

Version vom 16. Oktober 2011, 20:39 Uhr

Siehe auch:


Meine unsortierten Notizen...

Tools und Konfiguration

  • psql und wohl alle weiteren PostgreSQL-Tools (dropdb, createdb, pg_dump, etc.) u.a. brauchen Angaben zu Benutzer und Passwort. Der Benutzer wird mit Parameter -U mitgegeben, das Passwort hingegen kann (mit Absicht) nicht mitgegeben werden. Es gibt aber die Möglichkeit, Umgebungsvariablen zu setzen:
 SET PGUSER=postgres
 SET PGPASSWORD=<meinpasswort>
  • Start pgsl-Client:
 > psql -h localhost -p 5432 template1 "postgres"
 oder
 > cmd /c chcp 1252
 > psql -d gisdb -U postgres
 postgres=# SET CLIENT_ENCODING TO 'LATIN1';
 postgres=# <do whatever you must do in psql...>
 > cmd /c chcp 850

Create "Read-only User"

From: [1]

If you only grant CONNECT to a database, the user can connect but has no other privileges. In PostgreSQL 8.x you have to grant USAGE on namespaces (schemas) and SELECT on tables and views individually. So something like:

 GRANT CONNECT ON DATABASE mydb TO xxx;
 -- This assumes you're actually connected to mydb..
 GRANT USAGE ON SCHEMA public TO xxx;
 GRANT SELECT ON mytable TO xxx;

Granting 'SELECT' to each table/view individually can be a bore. Something like this can help:

 SELECT 'GRANT SELECT ON ' || relname || ' TO xxx;'
 FROM pg_class JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
 WHERE nspname = 'public' AND relkind IN ('r', 'v')

This should output the relevant GRANT commands to GRANT SELECT on all tables and views in public, for copy-n-paste.

PostgreSQL 9:

In PostgreSQL 9.0 its easier. You still need to grant USAGE permissions on schemas, but you can grant permissions on all tables, views etc. in the schema using a single command rather than having to generate them:

 GRANT SELECT ON ALL TABLES IN SCHEMA public TO xxx;

This only affects tables that have already been created, so it's the equivalent of the generated commands above. More powerfully, you can automatically have default roles assigned to new objects in future:

 ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO osmgis;

Note that by default this will only affect objects (tables) created by the user that issued this command: although it can also be set on any role that the issuing user is a member of. However, you don't pick up default privileges for all roles you're a member of when creating new objects... so there's still some faffing around. If you adopt the approach that a database has an owning role, and schema changes are performed as that owning role, then you should assign default privileges to that owning role. IMHO this is all a bit confusing and you may need to experiment to come up with a functional workflow.


Daten laden/importieren

Siehe http://wiki.postgresql.org/wiki/COPY

COPY Syntax:

  • Tab-Delimited

 COPY geometry_columns (f_table_catalog, f_table_schema, f_table_name, f_geometry_column, coord_dimension, srid, type) FROM stdin;
 catalog	public	planet_osm_point	way	2	900913	POINT
 catalog	public	planet_osm_line	way	2	900913	LINESTRING
 \.

PostgreSQL SQL

Siehe auch im Wiki der Vorlesung Datenbanken (Dbs1) von Prof. S. Keller.

Version:

 select version();

Anfrage des Geometrie-Typs ('the_geom' ist der Name des Geometrie-Felds):

 SELECT *
   FROM sometable
   WHERE GeometryType(the_geom) = 'GEOMETRYCOLLECTION'

Problem mit Datenfelder (for < 8.3):

 SELECT the_geom ... (end_date - NOW() >= 10) AS GREEN 
   AND (end_date - NOW() < 0) AS RED ... FROM ...

Casting changes in >= 8.3:

 SELECT the_geom ... (end_date - NOW() >= '10 DAYS'::INTERVAL) AS GREEN 
   AND (end_date - NOW() < '0'::INTERVAL) AS RED ... FROM ...
 SELECT the_geom ...
   EXTRACT(DAY FROM (end_date - NOW()) >= 10) AS GREEN
   AND EXTRACT(DAY FROM (end_date - NOW()) < 0) AS RED ... FROM ...

hstore

Stichworte: Key-Value-Attributes, Hashtable, nosql-Database.

Siehe auch PostGIS Terminal.

Data Warehousing mit PostgreSQL

Stichworte: PostgreSQL-Module cube, PostgreSQL-Module tablefunc (mit crosstab).

tbd.

Konfiguration

Die pg_hba.conf kann so eingestellt werden, dass bei der Ausführung von PostgreSQL-Kommandozeilen-Tools kein Passwort übertragen werden muss.

Bsp mit 127.0.0.1: Anstelle

 # IPv4 local connections:
 host    all         all         127.0.0.1/32          md5

neu:

 # IPv4 local connections:
 host    all         all         127.0.0.1/32          trust

Weitere Tipps...

  • Code snippets auf Postgres Wiki