GeoCSV
Specification of the tabular file format CSV (Comma Separated Values) with an optional geometry extension!
>> DRAFT Version 0.1 <<
Date of last modification: see bottom, Author: Stefan. (For notes and discussion see Diskussion:GeoCSV).
Introduction
GeoCSV is an extension of the "human readable", tabular file format Comma-Separated Values (CSV) and/or Tab-Separated Values (TSV). CSV and TSV are well-known but spartan formats that can lose information.
For exchanging geospatial data, consider using a more capable and elegant exchange format such as GeoPackage. On the other hand, GeoCSV has some potential, since it is considerably more capable than e.g. a Shapefile. See also TheShapefileChallenge.
This format has the following drawbacks:
- no layer name - except for the file name (which can easily be changed by others...).
- clutter from auxiliary accompanying files, like .csvt and .prj
GeoCSV file format specification
GeoCSV is based on the CSV specification (see the CSV file format specification section below) and comes in two variants: option easting/northing and option WKT.
Option "easting/northing" (longitude/latitude, lon/lat, long/lat, x/y like in mathematics):
- Geometry type Point as two neighboring columns of type Float: one containing the easting coordinate and one containing the northing coordinate, separated by the common delimiter.
- Example for the two easting/northing columns: "8.8249;47.2274".
- This option supports only Points.
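As an illustration of the easting/northing option, here is a minimal Python sketch that reads such a file with the standard `csv` module. The function name `read_points` and the default column names `lon` and `lat` are assumptions made for this example (the sample rows are taken from the Examples section); the specification only requires two neighboring Float columns.

```python
import csv
import io

# Sample rows from the Examples section of this page.
SAMPLE = """id;name;amount;city;lon;lat
1;Kevin;2.1;Rapperswil;8.8249;47.2274
2;Eva;2.2;Zürich;8.5435;47.3768
3;"Jimmy;Muff";2.3;;7.4397;46.9487
"""


def read_points(text, x_col="lon", y_col="lat"):
    """Yield (attributes, (easting, northing)) for each data row.

    x_col and y_col are assumed column names, not part of the spec.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    for row in reader:
        # Remove the two geometry columns and convert them to floats.
        easting = float(row.pop(x_col))
        northing = float(row.pop(y_col))
        yield row, (easting, northing)


points = list(read_points(SAMPLE))
```

Each element of `points` pairs the remaining attributes (as strings) with a coordinate tuple in easting/northing order.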
Option WKT:
- It is one single column of type String containing a constructor, for example: "POINT (8.8249 47.2274)".
- This option supports Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon and even GeometryCollection and ARCs!
- WKT ("Well Known Text") is originally defined by the Open Geospatial Consortium (OGC) and described in their Simple Feature Access specification (also ISO SQL/MM). See e.g. http://en.wikipedia.org/wiki/Well-known_text
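To show what reading the WKT column involves, the following Python sketch parses the simplest case only, a two-dimensional POINT with plain decimal coordinates. The helper name `parse_wkt_point` is hypothetical, and the regular expression is a simplification rather than a WKT grammar: it does not handle exponents, Z/M values or any other geometry type. For those, a full WKT parser (for example the one in Shapely) would be the more realistic choice.

```python
import re

# Matches only 2D points such as "POINT (8.8249 47.2274)" or "POINT(8.8249 47.2274)".
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt):
    """Return (x, y) for a 2D WKT point, or raise ValueError."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


x, y = parse_wkt_point("POINT (8.8249 47.2274)")
```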
Common restrictions:
- Coordinate system is WGS84 (EPSG:4326) by default. See section PRJ.
- More than one geometry column (or column pair) is allowed per table, but a table uses only one of the two options: either easting/northing or WKT.
- All geometry values within one table are in the same coordinate reference system (CRS).
Optional auxiliary files (with same base filename but different file extensions) are:
- CSVT:
- Contains field type information (schema).
- File extension is .CSVT (or .csvt).
- See section below.
- PRJ (to be clarified!):
- Contains Coordinate Reference System (CRS) information.
- File extension is .PRJ (or .prj).
- Default (and strongly recommended) is EPSG:4326 (WGS84, lon/lat).
- CSVZ:
- File extension is .CSVZ (or .csvz)
- The CSV file can be accompanied by the following files, which have the same base file name: .csvt and .prj.
- Archiving and compressing in format .ZIP (or .zip) is also possible and encouraged.
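The draft does not spell out the internal layout of a .csvz file. The sketch below assumes, as one possible reading, that it is a ZIP archive holding the .csv file together with its optional .csvt and .prj companions under the same base name. The function name `build_csvz` and its parameters are hypothetical.

```python
import io
import zipfile


def build_csvz(base_name, csv_text, csvt_text=None, prj_text=None):
    """Return the bytes of a ZIP archive with the CSV and its companion files.

    Assumption: a .csvz file is a plain ZIP archive; the draft does not confirm this.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{base_name}.csv", csv_text.encode("utf-8"))
        if csvt_text is not None:
            archive.writestr(f"{base_name}.csvt", csvt_text.encode("utf-8"))
        if prj_text is not None:
            archive.writestr(f"{base_name}.prj", prj_text.encode("utf-8"))
    return buffer.getvalue()


data = build_csvz(
    "example1",
    "id;lon;lat\n1;8.8249;47.2274\n",
    csvt_text="Integer;Easting;Northing\n",
)
names = zipfile.ZipFile(io.BytesIO(data)).namelist()
```

Writing `data` to a file named `example1.csvz` would then give a single file to pass around instead of up to three.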
CSV file format specification
File:
- Contains the actual (geo-)data.
- File extension is .CSV (or .csv).
- Character Encoding and character set is UTF-8 (default) or ANSI/Windows-1252(?).
- End-of-lines are: CR, LF or CR/LF.
- Line breaks (end-of-lines) inside String fields are disallowed (use e.g. HTML if needed).
Rows:
- First row contains the attribute names, separated by the field delimiter (see Fields/columns).
- Following rows contain the values, separated by the field delimiter.
- All rows have the same number of attributes.
Fields/columns:
- Field delimiter is semicolon (;) by default.
- Strings are enclosed in double quotes to allow delimiters inside (e.g. "string").
- Data types (if supported by the source or target system): See CSVT file format specification.
- Calculations are possible in fields of type String (like "=A1+B1").
See also CSV.
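The row rules above can be checked mechanically. The following Python sketch is one way to do so and is not part of the specification: the function name `check_rows` is hypothetical, and it tests only two of the rules (equal number of fields per row, no line breaks inside a field), using the semicolon default.

```python
import csv
import io


def check_rows(text, delimiter=";"):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    # newline="" leaves line endings untouched so the csv module can handle them.
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    width = len(rows[0])  # number of attribute names in the header record
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"record {number}: expected {width} fields, got {len(row)}")
        if any("\n" in value or "\r" in value for value in row):
            problems.append(f"record {number}: line break inside a field")
    return problems


ok = check_rows('id;name\n1;"Jimmy;Muff"\n')
bad = check_rows("id;name\n1;Kevin;extra\n")
```

Records are counted from 1 with the header as record 1, so a message about record 2 refers to the first data row.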
CSVT file format specification
CSVT means "CSV Types"; it describes the field types and optionally their properties.
Field/column types, case insensitive, optionally in quotes ('"') - if supported by the source or target system:
- Integer or "Integer"
- Real or "Real"
- String or "String"
- Date ("YYYY-MM-DD"), Time ("HH:MM:SS+nn") and DateTime ("YYYY-MM-DD HH:MM:SS+nn"), where nn is the time zone
- Easting and Northing - as two separate, neighboring columns
- WKT
Notes:
- Types can be in quotes ('"') or not, e.g. <<"Integer";"Real">>.
- Types can have a precision in parentheses, e.g. 'Real(20.2)'.
- Geometry types are either Easting and Northing or WKT, but not both. Each can occur several times in a table.
- Geometry types are a kind of subtype: Easting and Northing values are stored as Float; with option WKT, the geometry is stored in one column of type String.
- See also http://www.gdal.org/drv_csv.html section with .csvt extension.
- (There could be more properties like "mandatory/optional" or, for strings, field length, and for numbers, precision etc.)
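A .csvt line can be used to convert the string values of a CSV row into typed values. The Python sketch below illustrates this under several assumptions: the names `parse_csvt` and `cast_row` are hypothetical, the semicolon is assumed as delimiter for the .csvt line as well, and only Integer, Real, Easting and Northing are converted. All other types, including Date, Time, DateTime and WKT, are left as strings.

```python
import re

# An optionally quoted type name with an optional precision, e.g. "Real(20.2)".
_TYPE_RE = re.compile(r'^"?([A-Za-z]+)(?:\(([\d.]+)\))?"?$')

# Conversions applied in this sketch; every other type stays a string.
_CASTS = {"integer": int, "real": float, "easting": float, "northing": float}


def parse_csvt(line, delimiter=";"):
    """Split a .csvt line into (type_name, precision) pairs; names are lower-cased."""
    fields = []
    for token in line.strip().split(delimiter):
        match = _TYPE_RE.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognised type: {token!r}")
        fields.append((match.group(1).lower(), match.group(2)))
    return fields


def cast_row(values, fields):
    """Convert string values according to the parsed types; empty values are kept as ''."""
    return [
        _CASTS.get(name, str)(value) if value != "" else value
        for value, (name, _precision) in zip(values, fields)
    ]


fields = parse_csvt("Integer;String;Real;String;Easting;Northing")
row = cast_row(["3", "Jimmy;Muff", "2.3", "", "7.4397", "46.9487"], fields)
```

The precision (such as `20.2`) is returned as a string and not interpreted, since the draft does not define its exact meaning.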
PRJ file format specification
- Default is EPSG:4326 (WGS84, lon/lat).
- Contains a named CRS, i.e. the EPSG number "EPSG:nnnn" in OGR WKT (as needed e.g. for OGR, see EPSG.io).
- (Same specification as http://tools.ietf.org/html/draft-butler-geojson-04 )
Software
- Desktop
- Online:
- GeoConverter
- CSV-to-GeoJSON: convertcsv.com, csv2geojson
- GeoJSON-to-CSV: convertcsv.com
Examples
CSV type file 'example1.csvt':
Integer;String;Real;String;Easting;Northing
CSV file 'example1.csv' - Option easting/northing:
id;name;amount;city;lon;lat
1;Kevin;2.1;Rapperswil;8.8249;47.2274
2;Eva;2.2;Zürich;8.5435;47.3768
3;"Jimmy;Muff";2.3;;7.4397;46.9487
CSV file 'example1.csv' - Option WKT:
id;name;amount;city;WKT
1;Kevin;2.1;Rapperswil;POINT(8.8249 47.2274)
2;Eva;2.2;Zürich;POINT(8.5435 47.3768)
3;"Jimmy;Muff";2.3;;POINT(7.4397 46.9487)
...can be shown as the following table:
id | name | amount | remarks | geom |
---|---|---|---|---|
1 | Kevin | 2.1 | Rapperswil | POINT(8.8249 47.2274) |
2 | Eva | 2.2 | Zürich | POINT(8.5435 47.3768) |
3 | Jimmy;Muff | 2.3 | | POINT(7.4397 46.9487) |
Note the remarks string in row 2 and the empty string in row 3.
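Since the Software section lists CSV-to-GeoJSON converters, here is a rough Python sketch of such a conversion for the WKT example above. The function name `to_feature_collection` and the default column name `WKT` are assumptions for illustration. Only POINT values are handled, all properties stay strings, and the coordinates are assumed to be in the default EPSG:4326.

```python
import csv
import io
import json

# The WKT example from this section.
SAMPLE = """id;name;amount;city;WKT
1;Kevin;2.1;Rapperswil;POINT(8.8249 47.2274)
2;Eva;2.2;Zürich;POINT(8.5435 47.3768)
3;"Jimmy;Muff";2.3;;POINT(7.4397 46.9487)
"""


def to_feature_collection(text, wkt_col="WKT"):
    """Build a GeoJSON-style FeatureCollection dict from a GeoCSV text with WKT points."""
    features = []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        wkt = row.pop(wkt_col).strip()
        if not wkt.upper().startswith("POINT"):
            raise ValueError(f"only POINT is handled in this sketch: {wkt!r}")
        # Take the text between the parentheses and split it into x and y.
        x, y = wkt[wkt.index("(") + 1 : wkt.rindex(")")].split()
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [float(x), float(y)]},
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_feature_collection(SAMPLE)
print(json.dumps(collection, ensure_ascii=False, indent=2))
```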