GeoCSV: Difference between revisions

From Geoinformation HSR
m (CSVT file format specification)
m (CSV file format specification)
 
(40 intermediate revisions by the same user are not shown)
Line 1: Line 1:
 
Specification of the tabular file format [[CSV]] (Comma Separated Values) with an optional geometry extension!  
 
Specification of the tabular file format [[CSV]] (Comma Separated Values) with an optional geometry extension!  
  
   >> DRAFT Version 0.1 <<
+
   >> DRAFT Version 0.3 - Date of last modification: ''see bottom'' <<
  
Date of last modification: ''see bottom'', Author: [[Stefan]]. ''(For notes and discussion see [[Diskussion:GeoCSV]])''.
+
Author: [[Stefan]] and contributors. ''(For notes and discussion see [[Diskussion:GeoCSV]])''.
  
 
=== Introduction ===
 
=== Introduction ===
Line 14: Line 14:
  
 
This format has following drawbacks:
 
This format has following drawbacks:
 +
* not suited for massive datasets (except when compressing/zipping)
 +
* only one layer per file
 
* no layer name - except for the file name (which can be changed easily by others...).
 
* no layer name - except for the file name (which can be changed easily by others...).
* auxiliary cluttered accompanying files, like .csvt and .prj
+
* many other drawbacks like auxiliary cluttered accompanying files, like .csvt and .prj which it shares with [[Shapefile]]s.
  
 
=== GeoCSV file format specification ===
 
=== GeoCSV file format specification ===
  
GeoCSV is based on the CSV specification (see following section) and comes with two variants: Option Easting/Northing and Option WKT.
+
GeoCSV is based on the CSV specification (see following section) and comes with two variants: Option Point(X/Y) and Option WKT.
  
Option "Easting/Northing" (longitude/latitude, lon/lat, long/lat, x/y like in mathematics):
+
Option "WKT" (preferred):
* Geometry Point type as two neighboring columns (in either order) of type Integer or Float: one containing the easting coordinate, and another containing northing coordinate separated by a comma.
+
* It's one single column of type String containing a constructor, like for example: "POINT (8.8249 47.2274)", meaning 8.8249 east and 47.2274 north (lon/lat).
* Example for the two easting/northing columns "8.8249;47.2274".
+
* Note that WKT uses lon,lat notation.
 +
* This option supports Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon and even GeometryCollection and ARCs!
 +
* [[WKT]] ("Well Known Text") is defined by the Open Geospatial Consortium (OGC) and described in their "Simple Feature Access Specification" (also ISO SQL/MM). See e.g. [http://en.wikipedia.org/wiki/Well-known_text].
 +
 
 +
Option "Point(X/Y)":
 +
* Geometry Point type as two columns, meaning an easting/northing coordinate pair (longitude/latitude, lon/lat, long/lat, sometimes also called x/y).
 +
* Example for two lon and lat (Point) coordinate columns is "8.8249;47.2274".
 
* This option supports only Points.
 
* This option supports only Points.
 
+
* See "CSVT file format specification" below.
Option WKT:
 
* It's one single column of type String containing a constructor, like for example: "POINT (8.8249 47.2274)".
 
* This option supports Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon and even GeometryCollection and ARCs!
 
* [[WKT]] ("Well Known Text") is originally defined by the Open Geospatial Consortium (OGC) and described in their Simple Feature Access specification (also ISO SQL/MM). See e.g. http://en.wikipedia.org/wiki/Well-known_text
 
  
 
Common restrictions:
 
Common restrictions:
* There are more than one geometry columns allowed per sheet but only one column(-pair) can have either type easting/northing or WKT.
+
* There are more than one geometry columns allowed per sheet but only one column(-pair) can have either type Point(X/Y) or WKT.
 
* Coordinate system is WGS84 (EPSG:4326) by default. See section PRJ.
 
* Coordinate system is WGS84 (EPSG:4326) by default. See section PRJ.
 
* All geometry values within one table are in the same coordinate reference system ([[CRS]]).
 
* All geometry values within one table are in the same coordinate reference system ([[CRS]]).
Line 44: Line 48:
 
** Contains Coordinate Reference System ([[CRS]]) information.  
 
** Contains Coordinate Reference System ([[CRS]]) information.  
 
** File extension is '''.PRJ (or .prj)'''.  
 
** File extension is '''.PRJ (or .prj)'''.  
** Default (and strongly recommended) is EPSG:4326 (WGS84, lon/lat).
+
** Default is EPSG:4326 (WGS84, lon/lat).
 
* CSVZ:
 
* CSVZ:
 
** File extension is '''.CSVZ (or .csvz)'''
 
** File extension is '''.CSVZ (or .csvz)'''
Line 56: Line 60:
 
* File extension is '''.CSV (or .csv)'''.
 
* File extension is '''.CSV (or .csv)'''.
 
* Character Encoding and character set is UTF-8 (default) ''or ANSI/Windows-1252(?)''.
 
* Character Encoding and character set is UTF-8 (default) ''or ANSI/Windows-1252(?)''.
* End-of-lines are: CR, LF or CR/LF.
+
* End-of-lines are: CR, LF or CR/LF (unless embedded in parantheses).
* Line End-of-lines (in String) fields) are disallowed (use e.g. HTML is needed).
+
* At the end of the file there may be an empty line.
  
 
Rows:
 
Rows:
Line 65: Line 69:
  
 
Fields/columns:
 
Fields/columns:
* Field delimiter is semicolon (;) by default.  
+
* Field delimiter default (and preferred) is semicolon (;) unless defined otherwise (Note: CSVT delimiter uses comma).  
 
* Strings are enclosed by parantheses, to allow delimiters inside (e.g. "string").
 
* Strings are enclosed by parantheses, to allow delimiters inside (e.g. "string").
 
* Data types (if supported from source or target system): See CSVT file format specification.
 
* Data types (if supported from source or target system): See CSVT file format specification.
* Calculations are possible in fields of type String (like "=A1+B1").
+
* Line End-of-lines (in String) fields are not recommended use e.g. HTML is needed); they are only allowed in strings within parantheses (see rfc4180).
 +
* Calculations are not part of this spec.
  
 
See also [[CSV]].
 
See also [[CSV]].
Line 74: Line 79:
 
=== CSVT file format specification ===
 
=== CSVT file format specification ===
  
CSVT means "CSV Types" and it describes the field types and eventually their properties.
+
CSVT means "CSV Types" and it describes the field types and eventually subtypes or properties separated by comma.
  
Field/column types, case insensitive, eventually in quotes ('"') - if supported from source or target system:  
+
Field/column types, case insensitive, eventually in quotes ('"'):  
 
* Integer or "Integer".
 
* Integer or "Integer".
 
* Real or "Real".
 
* Real or "Real".
 
* String or "String".
 
* String or "String".
* Date ("YYYY-MM-DD"), Time ("HH:MM:SS+nn") and DateTime ("YYYY-MM-DD HH:MM:SS+nn"), whereas nn is the timezone.
+
* Date (format "YYYY-MM-DD"), Time (format "HH:MM:SS+nn") and DateTime (format "YYYY-MM-DD HH:MM:SS+nn"), whereas nn is the timezone.
* Easting and Northing - as two separate colums.
+
* "WKT" (preferred over Point(X/Y)). All WKT geometry types are allowed: Point, LineString, Polygon, Multipoint, MultiLinestring, MultiPolygon, GeometryCollection, Arcs, ... (see OGC WKT).
* WKT or subtypes WKT(Point), WKT(LineString), WKT(Polygon).
+
* "CoordX","CoordY" (preferred) or "Point(X)","Point(Y)". Two separate colums in either order and not necessary neighboring of type Integer or Float: one containing the easting coordinate, and another containing northing coordinate separated by a comma.
* If WKT name is indicated without subtypes, all WKT geometry types are allowed: Multipoint, FeatureCollection, Arcs, ... (see OGC WKT).
+
* All values of that WKT column MAY contain the same geometry (sub)type.
* All values of that WKT column are expected to contain the same geometry (sub)type.
 
  
 
Notes:
 
Notes:
 +
* Note that CSVT fields are separated by commas while GeoCSV fields are separated by semicolons.
 
* Types can be in quotes ('"') or not, e.g. <<"Integer";"Real">>.
 
* Types can be in quotes ('"') or not, e.g. <<"Integer";"Real">>.
 
* Types can have precision in parantheses, e.g. ('Real(20.2)')).  
 
* Types can have precision in parantheses, e.g. ('Real(20.2)')).  
* There's only one geometry column per .csvt, namely either "Easting","Northing" or "WKT" (but not both).  
+
* There's only one geometry column per .csvt, namely either "Point(X)","Point(Y)" (or "CoordX","CoordY") or "WKT".  
* Geometry types are a kind of subtype: Easting and Northing values are stored as integer or float, option WKT is stored in one column of type String.
+
* Geometry types are a kind of subtypes: CoordX,CoordY values are stored as integer or float, a WKT field is stored in one single String column.
 
* See also http://www.gdal.org/drv_csv.html section with .csvt extension.
 
* See also http://www.gdal.org/drv_csv.html section with .csvt extension.
* ''(Enhancement issue: There could be more properties like "mantatory/optional" or, for strings field length, and for numbers precision etc.)''
+
* ''(Enhancement issue: There could be more properties like "mandatory/optional" or, for strings field length, and for numbers precision etc.)''
  
 
=== PRJ file format specification ===
 
=== PRJ file format specification ===
  
* Default is EPSG:4326 (WGS84, lon/lat).
+
* Default is EPSG:4326 (WGS84, geographic, geo-centered longitude/latitude).
* Contains a named CRS, i.e. the EPSG number "EPSG:nnnn" in OGR WKT (as needed e.g. for [[OGR]], see [http://epsg.io EPGS.io]).
+
* Contains a named CRS, i.e. the EPSG number "EPSG:nnnn" in OGR WKT format (the one natively spoken by [[OGR]]/GDAL, based on [http://www.opengeospatial.org/standards/ct OGC 01-009]).
* ''(Same spec. like http://tools.ietf.org/html/draft-butler-geojson-04 )''
 
  
 
=== Software ===  
 
=== Software ===  
* Desktop
+
* Desktop GIS:
 +
** [[QGIS]] with [[Editable GeoCSV QGIS Plugin]] '''*** featured ***'''
 +
** [[QGIS]] through usual "Add Vector Layer..." dialog
 +
** [[OGR]] Version 2.x
 +
* Desktop generic:
 
** LibreOffice / OpenOffice  
 
** LibreOffice / OpenOffice  
 
** Excel
 
** Excel
** [[OGR]]
+
** [[Kettle]]
** [[QGIS]]
+
** [http://csvkit.readthedocs.org/csvkit (Python)]
 
* Online:
 
* Online:
 
** [[GeoConverter]]
 
** [[GeoConverter]]
 +
** [http://geojson.io GeoJSON.io]
 
** CSV-to-[[GeoJSON]]: [http://www.convertcsv.com/csv-to-geojson.htm convertcsv.com], [http://mapbox.github.io/csv2geojson/ csv2geojson]
 
** CSV-to-[[GeoJSON]]: [http://www.convertcsv.com/csv-to-geojson.htm convertcsv.com], [http://mapbox.github.io/csv2geojson/ csv2geojson]
 
** GeoJSON-to-CSV: [http://www.convertcsv.com/geojson-to-csv.htm convertcsv.com]
 
** GeoJSON-to-CSV: [http://www.convertcsv.com/geojson-to-csv.htm convertcsv.com]
Line 113: Line 122:
 
=== Examples ===
 
=== Examples ===
  
CSV type file 'example1.csvt':
+
CSV type file 'example1.csvt' - Option Point(X/Y):
<pre>
+
<pre>Integer,String,Real,String,CoordX,CoordY</pre>
Integer,String,Real,String,Easting,Northing
 
</pre>
 
  
CSV file 'example1.csv - Option easting/northing:
+
CSV file 'example1.csv - Option Point(X/Y):
 
<pre>
 
<pre>
 
id;name;amount;city;lon;lat
 
id;name;amount;city;lon;lat
Line 126: Line 133:
 
</pre>
 
</pre>
  
CSV file 'example1.csv - Option WKT:
+
CSV type file 'example2.csvt' - Option WKT:
 +
<pre>Integer,String,Real,String,WKT</pre>
 +
 
 +
CSV file 'example2.csv - Option WKT:
 
<pre>
 
<pre>
id;name;amount;city;WKT
+
id;name;amount;city;geom
 
1;Kevin;2.1;Rapperswil;POINT(8.8249 47.2274)
 
1;Kevin;2.1;Rapperswil;POINT(8.8249 47.2274)
 
2;Eva;2.2;Zürich;POINT(8.5435 47.3768)
 
2;Eva;2.2;Zürich;POINT(8.5435 47.3768)
 
3;"Jimmy;Muff";2.3;;POINT(7.4397 46.9487)
 
3;"Jimmy;Muff";2.3;;POINT(7.4397 46.9487)
 
</pre>
 
</pre>
 
  
 
...can be shown as following table:
 
...can be shown as following table:
Line 165: Line 174:
  
 
Note the remarks string in row 2 and the empty string in row 3.
 
Note the remarks string in row 2 and the empty string in row 3.
 +
 +
== Resources ==
 +
 +
Media:
 +
* '''[http://geometalab.tumblr.com/post/119849935292/geocsv-leveraging-a-common-it-data-exchange Post about GeoCSV at Geometa Lab HSR Blog]'''
 +
 +
CSV/TSV-Specs:
 +
* GeoJSON Spec. at IETF: [http://tools.ietf.org/html/draft-butler-geojson-04
 +
* rfc4180: http://tools.ietf.org/html/rfc4180
 +
* TSV: http://www.cs.tut.fi/~jkorpela/TSV.html
 +
* Super-CSV: http://super-csv.github.io/super-csv/csv_specification.html
 +
* Tabular Data Package: http://dataprotocols.org/tabular-data-package/
 +
* OGR CSV driver: http://www.gdal.org/drv_csv.html
 +
* Google Earth: https://support.google.com/earth/answer/148104?hl=en

Current revision as of 22 February 2016, 13:31

Specification of the tabular file format CSV (Comma Separated Values) with an optional geometry extension!

 >> DRAFT Version 0.3 - Date of last modification: see bottom <<

Author: Stefan and contributors. (For notes and discussion see Diskussion:GeoCSV).

Introduction

GeoCSV is an extension of the "human readable", tabular file format Comma-Separated Values (CSV) and/or Tab-Separated Values (TSV). CSV and TSV are well-known but spartan formats that can lose information.

For exchanging geospatial data, consider using more capable and elegant exchange formats such as GeoPackage. On the other hand, GeoCSV has some potential since it is considerably more capable than e.g. a Shapefile. See also TheShapefileChallenge.

This format has the following drawbacks:

  • not suited for massive datasets (except when compressing/zipping)
  • only one layer per file
  • no layer name - except for the file name (which can be changed easily by others...).
  • many other drawbacks that it shares with Shapefiles, such as cluttered auxiliary accompanying files like .csvt and .prj.

GeoCSV file format specification

GeoCSV is based on the CSV specification (see the section "CSV file format specification" below) and comes in two variants: Option Point(X/Y) and Option WKT.

Option "WKT" (preferred):

  • It's one single column of type String containing a constructor, like for example: "POINT (8.8249 47.2274)", meaning 8.8249 east and 47.2274 north (lon/lat).
  • Note that WKT uses lon,lat notation.
  • This option supports Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon and even GeometryCollection and ARCs!
  • WKT ("Well Known Text") is defined by the Open Geospatial Consortium (OGC) and described in their "Simple Feature Access Specification" (also ISO SQL/MM). See e.g. http://en.wikipedia.org/wiki/Well-known_text.
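
To illustrate the lon/lat order of a WKT value, here is a minimal Python sketch. The helper name parse_wkt_point is hypothetical and not part of this spec; it handles only plain 2D POINT values in decimal notation, and a real reader would need a full WKT parser for the other geometry types.

```python
import re

# Hypothetical helper: matches only 2D POINT values such as "POINT (8.8249 47.2274)".
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*([-+]?\d+(?:\.\d+)?)\s+([-+]?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)

def parse_wkt_point(text):
    """Return (lon, lat) from a WKT POINT string, or raise ValueError."""
    match = _POINT_RE.match(text)
    if match is None:
        raise ValueError(f"not a 2D WKT POINT: {text!r}")
    # WKT lists x (easting/longitude) first, then y (northing/latitude).
    return float(match.group(1)), float(match.group(2))

lon, lat = parse_wkt_point("POINT (8.8249 47.2274)")
print(lon, lat)
```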

Option "Point(X/Y)":

  • Geometry Point type as two columns, meaning an easting/northing coordinate pair (longitude/latitude, lon/lat, long/lat, sometimes also called x/y).
  • An example of the two lon and lat (Point) coordinate columns is "8.8249;47.2274".
  • This option supports only Points.
  • See "CSVT file format specification" below.
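
Since both options describe the same point, a coordinate pair can be turned into the WKT form with a few lines of Python. This is only a sketch; the function name point_columns_to_wkt is an assumption made for illustration.

```python
def point_columns_to_wkt(lon, lat):
    """Build a WKT POINT string from two coordinate values (x first, then y)."""
    return f"POINT ({float(lon)} {float(lat)})"

# The two neighbouring columns from the example above, split at the semicolon.
lon_text, lat_text = "8.8249;47.2274".split(";")
print(point_columns_to_wkt(lon_text, lat_text))
```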

Common restrictions:

  • More than one geometry column may be present per sheet, but only one column (or column pair) can be typed as Point(X/Y) or WKT.
  • Coordinate system is WGS84 (EPSG:4326) by default. See section PRJ.
  • All geometry values within one table are in the same coordinate reference system (CRS).

Optional auxiliary files (with the same base filename but different file extensions) are:

  • CSVT:
    • Contains field type information (schema).
    • File extension is .CSVT (or .csvt).
    • See section below.
  • PRJ (to be clarified!):
    • Contains Coordinate Reference System (CRS) information.
    • File extension is .PRJ (or .prj).
    • Default is EPSG:4326 (WGS84, lon/lat).
  • CSVZ:
    • File extension is .CSVZ (or .csvz)
    • The CSV file can be accompanied by the following files, which have the same base file name: .csvt and .prj.
    • Archiving and compressing in format .ZIP (or .zip) is also possible and encouraged.
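
The "same base filename" rule can be sketched in Python as follows. The function name companion_files is hypothetical; the sketch only looks for .csvt and .prj next to the CSV file and does not cover the CSVZ/ZIP case.

```python
from pathlib import Path

def companion_files(csv_path):
    """Return the optional .csvt and .prj paths that share the CSV's base name.

    A value is None when the corresponding file does not exist.
    """
    csv_path = Path(csv_path)
    found = {}
    for ext in (".csvt", ".prj"):
        # Accept lower- and upper-case extensions, since both spellings are listed above.
        candidates = [csv_path.with_suffix(ext), csv_path.with_suffix(ext.upper())]
        found[ext] = next((p for p in candidates if p.exists()), None)
    return found

print(companion_files("example1.csv"))
```

A reader that finds no .prj file would fall back to the default, EPSG:4326.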

CSV file format specification

File:

  • Contains the actual (geo-)data.
  • File extension is .CSV (or .csv).
  • Character Encoding and character set is UTF-8 (default) or ANSI/Windows-1252(?).
  • End-of-lines are: CR, LF or CR/LF (unless embedded in a quoted string).
  • At the end of the file there may be an empty line.

Rows:

  • The first row contains the attribute names, separated by the field delimiter (see "Fields/columns" below).
  • The following rows contain the values, separated by the same delimiter.
  • All rows have the same number of attributes.

Fields/columns:

  • The default (and preferred) field delimiter is the semicolon (;) unless defined otherwise (note: the CSVT file uses the comma as delimiter).
  • Strings are enclosed in double quotes (e.g. "string") to allow delimiters inside.
  • Data types (if supported from source or target system): See CSVT file format specification.
  • End-of-lines inside String fields are not recommended (use e.g. HTML if needed); they are only allowed in strings enclosed in double quotes (see RFC 4180).
  • Calculations are not part of this spec.
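
The rules above map closely onto Python's standard csv module. The following is a minimal sketch, not a reference implementation: read_geocsv is a hypothetical name, and io.StringIO stands in for an opened file.

```python
import csv
import io

# Sample rows taken from example1.csv; io.StringIO stands in for an open file.
data = io.StringIO(
    'id;name;amount;city;lon;lat\n'
    '1;Kevin;2.1;Rapperswil;8.8249;47.2274\n'
    '3;"Jimmy;Muff";2.3;;7.4397;46.9487\n'
)

def read_geocsv(handle, delimiter=";"):
    """Return (header, rows) and check that all rows have the same field count."""
    reader = csv.reader(handle, delimiter=delimiter, quotechar='"')
    header = next(reader)
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # tolerate empty lines, e.g. at the end of the file
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        rows.append(row)
    return header, rows

header, rows = read_geocsv(data)
print(rows[1][1])  # Jimmy;Muff
```

When reading a real file, open it with open(path, newline="", encoding="utf-8") so that the csv module sees CR, LF and CR/LF line ends unchanged.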

See also CSV.

CSVT file format specification

CSVT means "CSV Types"; the file describes the field types and, optionally, subtypes or properties, separated by commas.

Field/column types, case insensitive, optionally in quotes ('"'):

  • Integer or "Integer".
  • Real or "Real".
  • String or "String".
  • Date (format "YYYY-MM-DD"), Time (format "HH:MM:SS+nn") and DateTime (format "YYYY-MM-DD HH:MM:SS+nn"), where nn is the time zone.
  • "WKT" (preferred over Point(X/Y)). All WKT geometry types are allowed: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection, Arcs, ... (see OGC WKT).
  • "CoordX","CoordY" (preferred) or "Point(X)","Point(Y)". Two separate columns of type Integer or Float, in either order and not necessarily adjacent: one containing the easting coordinate and the other the northing coordinate. Like all CSVT entries, the two type names are separated by a comma.
  • All values of that WKT column MAY contain the same geometry (sub)type.

Notes:

  • Note that CSVT fields are separated by commas while GeoCSV fields are separated by semicolons.
  • Types can be in quotes ('"') or not, e.g. <<"Integer","Real">>.
  • Types can have precision in parentheses, e.g. 'Real(20.2)'.
  • There's only one geometry column per .csvt, namely either "Point(X)","Point(Y)" (or "CoordX","CoordY") or "WKT".
  • Geometry types are a kind of subtype: CoordX,CoordY values are stored as integer or float, a WKT field is stored in one single String column.
  • See also http://www.gdal.org/drv_csv.html section with .csvt extension.
  • (Enhancement issue: There could be more properties, such as "mandatory/optional", field length for strings, precision for numbers, etc.)
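
A .csvt line can be split into type names and optional arguments with a short Python sketch. The name parse_csvt and the returned (type, argument) pairs are assumptions made for illustration; the spec does not define a parsed representation.

```python
import csv
import io
import re

# A type name, optionally followed by an argument in parentheses, e.g. Real(20.2).
_TYPE_RE = re.compile(r"^\s*([A-Za-z]+)\s*(?:\(([^)]*)\))?\s*$")

def parse_csvt(line):
    """Split a .csvt line into (type_name, argument) pairs.

    Type names are lower-cased because they are case insensitive; the argument
    is the raw text in parentheses (e.g. "20.2" or "X"), or None if absent.
    """
    # CSVT fields are comma-separated and may be quoted.
    fields = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))
    parsed = []
    for field in fields:
        match = _TYPE_RE.match(field)
        if match is None:
            raise ValueError(f"unrecognised type: {field!r}")
        parsed.append((match.group(1).lower(), match.group(2)))
    return parsed

print(parse_csvt('Integer,"String",Real(20.2),CoordX,CoordY'))
```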

PRJ file format specification

  • Default is EPSG:4326 (WGS84, geographic, geo-centered longitude/latitude).
  • Contains a named CRS, i.e. the EPSG number "EPSG:nnnn" in OGR WKT format (the one natively spoken by OGR/GDAL, based on OGC 01-009).

Software

Examples

CSV type file 'example1.csvt' - Option Point(X/Y):

Integer,String,Real,String,CoordX,CoordY

CSV file 'example1.csv' - Option Point(X/Y):

id;name;amount;city;lon;lat
1;Kevin;2.1;Rapperswil;8.8249;47.2274
2;Eva;2.2;Zürich;8.5435;47.3768
3;"Jimmy;Muff";2.3;;7.4397;46.9487

CSV type file 'example2.csvt' - Option WKT:

Integer,String,Real,String,WKT

CSV file 'example2.csv' - Option WKT:

id;name;amount;city;geom
1;Kevin;2.1;Rapperswil;POINT(8.8249 47.2274)
2;Eva;2.2;Zürich;POINT(8.5435 47.3768)
3;"Jimmy;Muff";2.3;;POINT(7.4397 46.9487)

...can be shown as following table:

id  name        amount  city        geom
1   Kevin       2.1     Rapperswil  POINT(8.8249 47.2274)
2   Eva         2.2     Zürich      POINT(8.5435 47.3768)
3   Jimmy;Muff  2.3                 POINT(7.4397 46.9487)

Note the name string containing the delimiter ("Jimmy;Muff") and the empty city string in row 3.
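
As a sketch of how the two example2 files work together, the following Python snippet applies the types from 'example2.csvt' to the rows of 'example2.csv'. The CASTS mapping and the function name typed_records are hypothetical; WKT values are kept as plain strings, and empty numeric fields are not handled.

```python
import csv
import io

CSVT = "Integer,String,Real,String,WKT"
CSV_TEXT = (
    "id;name;amount;city;geom\n"
    "1;Kevin;2.1;Rapperswil;POINT(8.8249 47.2274)\n"
    "2;Eva;2.2;Zürich;POINT(8.5435 47.3768)\n"
    '3;"Jimmy;Muff";2.3;;POINT(7.4397 46.9487)\n'
)

# Hypothetical mapping from CSVT type names to Python converters.
CASTS = {"integer": int, "real": float, "string": str, "wkt": str}

def typed_records(csv_text, csvt_line):
    """Return one dict per data row, with values converted per the CSVT types."""
    types = [t.strip().strip('"').lower() for t in csvt_line.split(",")]
    reader = csv.reader(io.StringIO(csv_text), delimiter=";", quotechar='"')
    header = next(reader)
    records = []
    for row in reader:
        records.append(
            {name: CASTS[t](value) for name, t, value in zip(header, types, row)}
        )
    return records

records = typed_records(CSV_TEXT, CSVT)
print(records[2])
```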

Resources

Media:

CSV/TSV-Specs: