Learning database tools means working with data, so we need to cover that aspect first. There are many ways of importing data to PgSQL/PostGIS: some are database-specific, some are PostGIS-specific, and some use external tools. To complicate things a bit more, real-world data import processes are quite often wrapped into programs that perform different tasks and operations in order to maintain data quality and integrity during the import. The key, though, is that even very complex import tools usually rely on simpler procedures or commands to achieve their goals.
Such simple import procedures are described in this chapter. We specifically focus on:
- Importing flat data through both psql and pgAdmin and extracting spatial information from flat data
- Importing shape files using shp2pgsql
- Importing vector data using ogr2ogr
- Importing vector data using GIS clients
- Importing OpenStreetMap data
- Connecting to external data sources with data wrappers
- Loading rasters using raster2pgsql
- Importing data with pg_restore
Before we start importing, let's get some data examples in different formats, specifically these:
- AddressBase in CSV and GML
- Code-Point Polygons in SHP, TAB and MIF
- Points of Interest in TXT format
- 110M coastlines
- 110M land
- 50M Gray Earth
- Earthquake data in CSV and KML format (https://earthquake.usgs.gov/earthquakes/map/)
- UK Ordnance Survey sample data (https://www.ordnancesurvey.co.uk/business-and-government/licensing/sample-data/discover-data.html)
- NaturalEarth (http://www.naturalearthdata.com/downloads/110m-physical-vectors/)
You may either download the data using the links provided or find it in this chapter's resources. The location you extract the data to is not really important, as you can later reference it using either relative or absolute file paths.
All the examples in this chapter use a database named mastering_postgis. The PgSQL instance on my dev machine listens on port 5434, which is not the default port for the Postgres database (the default is 5432), so when using a default DB setup you may have to adjust some of the examples a bit.
If you need to change the port your db listens on, locate the db data directory, where you will find a postgresql.conf file. This is a plain text file, so you can edit it with an ordinary text editor. In order to adjust the port, find the port setting in the Connections and Authentication section.
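For example, the relevant fragment of postgresql.conf might look like this (the value shown matches the non-default port used throughout this chapter; a server restart is required for the change to take effect):

```
# Connections and Authentication section of postgresql.conf
port = 5434    # default is 5432
```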
Schemas are a great way of managing data and splitting it into meaningful collections. In most scenarios, one will have some production data, archive data, incoming data, and so on, sensibly kept in separate schemas. How many additional schemas you use will depend on your requirements, but we do encourage you to make schemas part of your daily practice if you have not done so yet. The following examples import the data into tables defined in the data_import schema.
Loading flat data may seem a bit dull initially, but it is important to understand that many popular and interesting datasets often contain spatial information in very different formats, such as:
- Coordinates expressed in Lon/Lat or projected coordinates
- Encoded geometry, for example WKT, TopoJSON, GeoJSON
- Location in the form of an address
- Location in non-Cartesian coordinates, for example a start point, angle, and direction
While the earlier examples indicate that such data requires further processing in order to extract the spatial content into a usable form, the ability to import flat datasets should clearly not be underestimated.
Flat data, in our scenario, is data with no explicitly expressed geometry: non-spatial, text-based files.
psql is PostgreSQL's command-line tool. While one can achieve quite a lot with GUI-based database management utilities, psql is very useful when one needs to handle database backups, management, and the like via scripting. When there is no GUI installed on the server, psql becomes pretty much the only option, so it is worth being familiar with it even if you're not a fan.
In order to import the data in psql, we will use the \COPY command. This requires us to define the data model for the incoming data first.
Defining the table data model from a text file may be prone to errors that will prevent data from being imported. If for some reason you are not sure what data types are stored in particular columns of your source file, you can import all the data as text and then re-cast it as required at a later time.
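A minimal sketch of that text-first approach (the few column names shown follow the earthquakes CSV used below; treat it as an illustration, not a ready-made script):

```sql
-- Import everything as text first...
CREATE TABLE data_import.earthquakes_raw (
    "time" text,
    latitude text,
    longitude text
    -- ...remaining columns, all as text
);

-- ...then, after \copy, cast columns in place once the types are known:
ALTER TABLE data_import.earthquakes_raw
    ALTER COLUMN latitude TYPE numeric USING latitude::numeric,
    ALTER COLUMN longitude TYPE numeric USING longitude::numeric;
```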
In this example we will import the earthquakes data from USGS. So let's fire up psql and connect to the database server:
F:\mastering_postgis\chapter02>psql -h localhost -p 5434 -U postgres
You should see a similar output:
psql (9.5.0)
Type "help" for help.

postgres=#
Then we need to connect to the mastering_postgis database:
postgres=# \c mastering_postgis
The following output should be displayed:
You are now connected to database "mastering_postgis" as user "postgres".
mastering_postgis=#
In the psql examples, I am using the postgres user. As you may guess, it is a superuser account. This is not something you would normally do, but it will keep the examples simple. In a production environment, using a db user with credentials allowing access only to specific resources is a sensible approach.
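As an illustrative sketch of such a restricted account (the role name and password here are made up, not from the book):

```sql
-- A minimal, non-superuser account dedicated to data imports.
CREATE ROLE data_loader LOGIN PASSWORD 'change_me';
GRANT CONNECT ON DATABASE mastering_postgis TO data_loader;
-- assuming the data_import schema already exists:
GRANT USAGE, CREATE ON SCHEMA data_import TO data_loader;
```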
If you have not had a chance to create our data_import schema yet, let's take care of it now by typing the following command:
mastering_postgis=# create schema if not exists data_import;
You should see a similar output:
NOTICE:  schema "data_import" already exists, skipping
CREATE SCHEMA
Once the schema is there, we create the table that will store the data. In order to do so just type or paste the following into psql:
create table data_import.earthquakes_csv (
    "time" timestamp with time zone,
    latitude numeric,
    longitude numeric,
    depth numeric,
    mag numeric,
    magType varchar,
    nst numeric,
    gap numeric,
    dmin numeric,
    rms numeric,
    net varchar,
    id varchar,
    updated timestamp with time zone,
    place varchar,
    type varchar,
    horizontalError numeric,
    depthError numeric,
    magError numeric,
    magNst numeric,
    status varchar,
    locationSource varchar,
    magSource varchar
);
You should see the following output:
mastering_postgis=# create table data_import.earthquakes_csv (
mastering_postgis(# "time" timestamp with time zone,
mastering_postgis(# latitude numeric,
mastering_postgis(# longitude numeric,
mastering_postgis(# depth numeric,
mastering_postgis(# mag numeric,
mastering_postgis(# magType varchar,
mastering_postgis(# nst numeric,
mastering_postgis(# gap numeric,
mastering_postgis(# dmin numeric,
mastering_postgis(# rms numeric,
mastering_postgis(# net varchar,
mastering_postgis(# id varchar,
mastering_postgis(# updated timestamp with time zone,
mastering_postgis(# place varchar,
mastering_postgis(# type varchar,
mastering_postgis(# horizontalError numeric,
mastering_postgis(# depthError numeric,
mastering_postgis(# magError numeric,
mastering_postgis(# magNst numeric,
mastering_postgis(# status varchar,
mastering_postgis(# locationSource varchar,
mastering_postgis(# magSource varchar
mastering_postgis(# );
CREATE TABLE
Now, as we have our data table ready, we can finally get to the import part. The following command should handle importing the data into our newly created table:
\copy data_import.earthquakes_csv from data\2.5_day.csv with DELIMITER ',' CSV HEADER
You should see a similar output:
mastering_postgis=# \copy data_import.earthquakes_csv from data\2.5_day.csv with DELIMITER ',' CSV HEADER
COPY 25
While you can customize your data after importing, you may wish to import only a subset of columns in the first place. Unfortunately, the \COPY command imports all the columns (although you may specify where to put them), so the solution would be to:
- Create a table that models the input CSV
- Import all the data
- Create a table with a subset of columns
- Copy data over
- Delete the input table
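Steps 3-5 can be sketched in SQL like this (using the earthquakes table from the previous example; the column choice is illustrative):

```sql
-- 3. and 4.: create a table with a subset of columns and copy the data over
CREATE TABLE data_import.earthquakes_subset AS
SELECT id, "time", depth, mag, place
FROM data_import.earthquakes_csv;

-- 5.: delete the input table
DROP TABLE data_import.earthquakes_csv;
```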
Even though everything said earlier is possible in psql, it requires quite a lot of typing. Because of that, we will take care of this scenario in the next stage, which demonstrates the non-interactive psql mode.
For the non-interactive psql data import example we'll do a bit more than in the interactive mode. We'll:
- Import the full earthquakes dataset
- Select a subset of earthquakes data mentioned in the previous example and insert it into its own table
- Import another dataset - in this case the Ordnance Survey's POIs
Basically, the non-interactive usage of psql means we simply provide it with SQL to execute. This way, we can put together many statements and run them without having to execute them one by one.
Once again we will need the data model prior to loading the data, and then a
\COPY command will be used.
If you're still in psql, you can execute a script by simply typing the \i command followed by the script path:
\i F:/mastering_postgis/chapter02/code/data_import_earthquakes.sql
You should see a similar output:
mastering_postgis-# \i F:/mastering_postgis/chapter02/code/data_import_earthquakes.sql
CREATE SCHEMA
psql:F:/mastering_postgis/chapter02/code/data_import_earthquakes.sql:5: NOTICE: table "earthquakes_csv" does not exist, skipping
DROP TABLE
CREATE TABLE
COPY 25
psql:F:/mastering_postgis/chapter02/code/data_import_earthquakes.sql:58: NOTICE: table "earthquakes_csv_subset" does not exist, skipping
DROP TABLE
SELECT 25
mastering_postgis-#
If you quit psql already, type the following command into cmd:
psql -h host -p port -U user -d database -f path\to\the\script.sql
For example:
psql -h localhost -p 5434 -U postgres -d mastering_postgis -f F:\mastering_postgis\chapter02\code\data_import_earthquakes.sql
You should see a similar output:
F:\mastering_postgis\chapter02>psql -h localhost -p 5434 -U postgres -d mastering_postgis -f F:\mastering_postgis\chapter02\code\data_import_earthquakes.sql
psql:F:/mastering_postgis/chapter02/code/data_import_earthquakes.sql:2: NOTICE: schema "data_import" already exists, skipping
CREATE SCHEMA
DROP TABLE
CREATE TABLE
COPY 25
DROP TABLE
SELECT 25
The script executed earlier is in the book's code repository under
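In condensed form, the script performs the following steps (a sketch reconstructed from the messages in the output above; the full earthquakes_csv column list is the one shown earlier):

```sql
CREATE SCHEMA IF NOT EXISTS data_import;

DROP TABLE IF EXISTS data_import.earthquakes_csv;
CREATE TABLE data_import.earthquakes_csv (/* full model as shown earlier */);

\copy data_import.earthquakes_csv from data\2.5_day.csv with DELIMITER ',' CSV HEADER

DROP TABLE IF EXISTS data_import.earthquakes_csv_subset;
SELECT id, "time", depth, mag, magtype, place
INTO data_import.earthquakes_csv_subset
FROM data_import.earthquakes_csv;
```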
Loading OS POI data is now a piece of cake. This dataset is in a bit of a different format though, so it requires slight adjustments. You can review the code in
In this section we'll import some new data we have not interacted with before - this time we'll have a look at the Ordnance Survey's address data we obtained in the CSV format.
Depending on the pgAdmin version, the UI may differ a bit. The described functionality should always be present though. For the examples involving pgAdmin, screenshots were taken using pgAdmin III (1.22.2).
pgAdmin's import functionality is basically a wrapper around \COPY, so it does require a data model in order to work. Because of that, let's quickly create a table that will be populated with the imported data. You can do it with the GUI by simply right-clicking the schema node you want to create the table in, choosing New Object | New Table, and then providing all the necessary model definitions in the displayed window:
You can also type some SQL which in many cases is a bit quicker:
drop table if exists data_import.osgb_addresses;
create table data_import.osgb_addresses(
    uprn bigint,
    os_address_toid varchar,
    udprn integer,
    organisation_name varchar,
    department_name varchar,
    po_box varchar,
    sub_building_name varchar,
    building_name varchar,
    building_number varchar,
    dependent_thoroughfare varchar,
    thoroughfare varchar,
    post_town varchar,
    dbl_dependent_locality varchar,
    dependent_locality varchar,
    postcode varchar,
    postcode_type varchar,
    x numeric,
    y numeric,
    lat numeric,
    lon numeric,
    rpc numeric,
    country varchar,
    change_type varchar,
    la_start_date date,
    rm_start_date date,
    last_update_date date,
    class varchar
);
Once our table is ready, importing data is just a matter of right-clicking the table node in pgAdmin and choosing Import. An import wizard that assists with the import process will be displayed:
As we already have some flat data in our database, it's time to extract the spatial information. So far, all the datasets used coordinates expressed as plain numeric columns, so our job is really straightforward:
drop table if exists data_import.earthquakes_subset_with_geom;
select
    id, "time", depth, mag, magtype, place,
    ST_SetSRID(ST_Point(longitude, latitude), 4326) as geom
into data_import.earthquakes_subset_with_geom
from data_import.earthquakes_csv;
This example extracts a subset of data and puts it into a new table, with the coordinates expressed as a geometry type rather than two numeric columns for Lon and Lat.
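A quick sanity check of the new geometry column can be done directly in psql (an illustrative query):

```sql
-- preview a few rows, rendering the geometry as WKT
SELECT id, mag, ST_AsText(geom)
FROM data_import.earthquakes_subset_with_geom
LIMIT 3;
```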
In order to quickly preview the data, we dump the table's content to KML using ogr2ogr (this is a little spoiler of the next chapter, on exporting data from PostGIS):
ogr2ogr -f "KML" earthquakes_from_postgis.kml PG:"host=localhost port=5434 user=postgres dbname=mastering_postgis" data_import.earthquakes_subset_with_geom -t_srs EPSG:4326
Such KML can be viewed for example in Google Earth (you can use the original KML downloaded from USGS just as a cross check for the output data):
The ESRI shapefile (SHP) is still the most common exchange format for sharing GIS data. The format itself is made of a few files, such as SHP, SHX, DBF, and PRJ, where the first three are required and the file with projection information is optional.
The standard PostGIS tool for loading shapefiles is
shp2pgsql - you will find it in the bin folder of your postgres installation.
shp2pgsql is a command-line utility that can either extract the shapefile data into SQL or pipe the output directly into psql (we'll see both approaches).
shp2pgsql also has a GUI version that can be accessed directly in PgAdmin.
In this example, we'll use some
NaturalEarth shapefiles we downloaded earlier. We will import the coastlines shapefile using the CMD version of
shp2pgsql and then we'll add land masses using the GUI version.
shp2pgsql has an extensive list of parameters that can be accessed by simply typing
shp2pgsql in the cmd. We will not use all the options but rather explain the most common ones.
The basic usage of the utility is as follows:
shp2pgsql [<options>] <shapefile> [[<schema>.]<table>]
shp2pgsql -s 4326 ne_110m_coastline data_import.ne_coastline
Basically, you specify what shapefile you want to import and where. If you executed the earlier command, you would just see the shp2pgsql plain SQL output logged to the console, similar to this:
...
INSERT INTO "data_import"."ne_coastline" ("scalerank","featurecla",geom) VALUES ('3','Country','0105000020E6100000010000000102000000060000006666666666A65AC06766666666665240713D0AD7A3505AC0295C8FC2F56852400000000000205AC07B14AE47E15A5240B91E85EB51585AC0713D0AD7A33052405C8FC2F528BC5AC03E0AD7A3705D52406666666666A65AC06766666666665240');
COMMIT;
ANALYZE "data_import"."ne_coastline";
So basically, we need to do something with the utility's output in order to make use of it. Let's save the output to an SQL file and have psql read it first:
shp2pgsql -s 4326 ne_110m_coastline data_import.ne_coastline > ne_110m_coastline.sql
psql -h localhost -p 5434 -U postgres -d mastering_postgis -f ne_110m_coastline.sql
You should see a similar output:
SET
SET
BEGIN
CREATE TABLE
ALTER TABLE
                          addgeometrycolumn
---------------------------------------------------------------------
 data_import.ne_coastline.geom SRID:4326 TYPE:MULTILINESTRING DIMS:2
(1 row)

INSERT 0 1
...
INSERT 0 1
COMMIT
ANALYZE
I suggest you have a look at the generated SQL so you get an idea of what is actually happening behind the scenes.
Now let's pipe the
shp2pgsql output directly to psql:
shp2pgsql -s 4326 ne_110m_coastline data_import.ne_coastline | psql -h localhost -p 5434 -U postgres -d mastering_postgis
The cmd output will be exactly the same as the one we have already seen when reading data from the SQL file.
You will have to drop the
data_import.ne_coastline table before importing the data again; otherwise the command in its current shape will fail.
There are some
shp2pgsql options that are worth remembering:
-s SRID: Specifies the shp data projection identifier. When used in the form -s SRID1:SRID2, it makes shp2pgsql apply a coordinate system transformation, reprojecting the data as required.
-p: Turns on the 'prepare' mode - only a table definition is output.
-d: Drops and recreates a table.
-a: Appends data to the existing table, provided its schema is exactly the same as the schema of the incoming data.
-g: Allows specifying the geometry column name; the default is geom (or geog if you decide to use geography with the -G param).
-m <filename>: Specifies a file name that contains column mapping for the DBF file. This way, you can remap dbf column names to your preference.
-n: Imports the DBF only; no spatial data.
Importing data with an SRID transformation (from WGS84 to Spherical Mercator, for example) could look like this:
shp2pgsql -s 4326:3857 ne_110m_coastline data_import.ne_coastline_3857 | psql -h localhost -p 5434 -U postgres -d mastering_postgis
shp2pgsql also has a GUI version. In order to use it from within pgAdmin, simply choose PostGIS Shapefile and DBF loader 2.2 (the version may vary); the following import wizard will be displayed:
In pgAdmin 4, accessing the shapefile loader GUI may not be so obvious. To trigger the tool, try typing
shp2pgsql-gui in the shell/command line.
Similar to the cmd version of the utility, you can specify the schema, and you should specify the SRID.
The nice thing about the GUI version of
shp2pgsql is that it lets one import multiple files at once.
In the options dialogue, you can specify data encoding, decide whether or not you would like to create a spatial index after importing, choose geography over geometry, and so on:
ogr2ogr is the GDAL's vector transform utility. It is - not without reason - considered a Swiss Army knife for vector transformations. Despite its size,
ogr2ogr can handle a wide range of formats and this makes it a really worthy tool.
We will use ogr2ogr to import a few data formats other than SHP, although ogr2ogr can obviously import SHP too. For this scenario, we'll use some data downloaded earlier:
- OS GB address base in GML format
- OS GB code point polygons in MapInfo MIF and TAB formats
- USGS earthquakes in KML format
Some of the most common ogr2ogr params are:
-f: The format of the output (when importing to PostGIS it will be PostgreSQL).
-nln: Assigns a name to the layer. In the case of importing the data to PostGIS this will be the table name.
-select: Lets you specify a comma separated list of columns to pick.
-where: Lets you specify an SQL-like query to filter the data.
-append: Appends data to the output dataset.
-overwrite: Overwrites the output datasource; in the case of PostgreSQL, it will drop and re-create a table.
-s_srs: Specifies the input SRID.
-t_srs: Transforms coordinates to the specified SRID.
-a_srs: Specifies the output SRID.
-lco NAME=VALUE: Layer creation options; these are driver-specific (for the pgsql options, see http://www.gdal.org/drv_pg.html). The most commonly used layer creation options are:
LAUNDER: This defaults to YES. It is responsible for converting column names into pgsql-compatible ones (lower case, underscores).
PRECISION: This defaults to YES. It is responsible for using numeric and char types over float and varchar.
GEOMETRY_NAME: Specifies the name of the geometry column; it defaults to wkb_geometry.
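To illustrate how these parameters combine, here is a hypothetical import of the NaturalEarth coastlines shapefile that picks two columns, filters rows, and names the geometry column (the table name and the filter value are assumptions for the sake of the example; scalerank and featurecla are the attributes seen in the shp2pgsql output earlier):

```shell
ogr2ogr -f "PostgreSQL" \
  PG:"host=localhost port=5434 user=postgres dbname=mastering_postgis" \
  ne_110m_coastline.shp \
  -nln data_import.ne_coastline_filtered \
  -select "scalerank,featurecla" \
  -where "scalerank = 0" \
  -lco GEOMETRY_NAME=geom
```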
For a full list of ogr2ogr params, just type ogr2ogr with no arguments in the command line.
ogr2ogr has an accompanying utility called ogrinfo. This tool lets one inspect a dataset's metadata. Verifying the metadata of any dataset prior to working with it is considered good practice, and one should get into the habit of always using it before importing or exporting data.
Let's start with importing the GML of the OS GB address base. First, we'll see exactly what data we're dealing with:
ogrinfo sx9090.gml
The following should be the output:
Had to open data source read-only.
INFO: Open of `sx9090.gml' using driver `GML' successful.
1: Address (Point)
We can then review the layer information:
ogrinfo sx9090.gml Address -so
You should see a similar output:
Had to open data source read-only.
INFO: Open of `sx9090.gml' using driver `GML' successful.

Layer name: Address
Geometry: Point
Feature Count: 42861
Extent: (-3.560100, 50.699470) - (-3.488340, 50.744770)
Layer SRS WKT:
GEOGCS["ETRS89",
    DATUM["European_Terrestrial_Reference_System_1989",
        SPHEROID["GRS 1980",6378137,298.257222101,
            AUTHORITY["EPSG","7019"]],
        TOWGS84[0,0,0,0,0,0,0],
        AUTHORITY["EPSG","6258"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4258"]]
gml_id: String (0.0)
uprn: Real (0.0)
osAddressTOID: String (20.0)
udprn: Integer (0.0)
subBuildingName: String (25.0)
buildingName: String (36.0)
thoroughfare: String (27.0)
postTown: String (6.0)
postcode: String (7.0)
postcodeType: String (1.0)
rpc: Integer (0.0)
country: String (1.0)
changeType: String (1.0)
laStartDate: String (10.0)
rmStartDate: String (10.0)
lastUpdateDate: String (10.0)
class: String (1.0)
buildingNumber: Integer (0.0)
dependentLocality: String (27.0)
organisationName: String (55.0)
dependentThoroughfare: String (27.0)
poBoxNumber: Integer (0.0)
doubleDependentLocality: String (21.0)
departmentName: String (37.0)
The -so param makes ogrinfo display the summary only; otherwise, info on the full dataset would be displayed.
Once we're ready to import the data, let's execute the following command:
ogr2ogr -f "PostgreSQL" PG:"host=localhost port=5434 user=postgres dbname=mastering_postgis" sx9090.gml -nln data_import.osgb_address_base_gml -geomfield geom
At this stage, the address GML should be available in our database.
Both MIF and TAB are MapInfo formats. TAB is the default format that contains formatting, while MIF is the interchange format.
We'll start with reviewing metadata:
ogrinfo EX_sample.mif EX_sample -so
Had to open data source read-only.
INFO: Open of `EX_sample.mif' using driver `MapInfo File' successful.

Layer name: EX_sample
Geometry: Unknown (any)
Feature Count: 4142
Extent: (281282.800000, 85614.570000) - (300012.000000, 100272.000000)
Layer SRS WKT:
PROJCS["unnamed",
    GEOGCS["unnamed",
        DATUM["OSGB_1936",
            SPHEROID["Airy 1930",6377563.396,299.3249646],
            TOWGS84[375,-111,431,-0,-0,-0,0]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",49],
    PARAMETER["central_meridian",-2],
    PARAMETER["scale_factor",0.9996012717],
    PARAMETER["false_easting",400000],
    PARAMETER["false_northing",-100000],
    UNIT["Meter",1]]
POSTCODE: String (8.0)
UPP: String (20.0)
PC_AREA: String (2.0)
Please note that the ogrinfo projection metadata for our MIF file does not specify an EPSG code. This is fine, as the projection definition itself is present, but it will result in ogr2ogr creating a new entry in the spatial_ref_sys table, which is not too good, as we'll end up with the wrong coordsys identifier: the coordinate reference id will simply be the next available one. This is because ogr2ogr expands the coordinate reference into a WKT string and then does a string comparison against the coordsys definitions in the spatial_ref_sys table; minor differences in formatting or precision will result in ogr2ogr failing to match the coordsys. In such a scenario, a new entry will be created; for example, if you happen to use the EPSG:3857 coordinate system and the system's definition is slightly different and cannot be matched, the assigned SRID will not be 3857 - the next available ID will be chosen instead. A solution to this is to specify the exact coordinate system ogr2ogr should output the data with, via the -a_srs parameter.
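If you suspect such an unwanted entry has been created, the spatial_ref_sys table is easy to inspect; this illustrative query lists the highest SRIDs, which is where custom entries typically end up:

```sql
-- custom SRIDs created by ogr2ogr usually get IDs above the standard
-- EPSG range, so listing the highest ones makes them easy to spot
SELECT srid, auth_name, auth_srid, substring(srtext, 1, 40) AS srtext_start
FROM spatial_ref_sys
ORDER BY srid DESC
LIMIT 5;
```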
Once ready, we can import the data:
ogr2ogr -f "PostgreSQL" PG:"host=localhost port=5434 user=postgres dbname=mastering_postgis" EX_sample.mif -nln data_import.osgb_code_point_polygons_mif -lco GEOMETRY_NAME=geom -a_srs EPSG:27700
If you follow the very same procedure for the TAB file and load the data, both datasets will end up in their own tables in the data_import schema:
ogr2ogr -f "PostgreSQL" PG:"host=localhost port=5434 user=postgres dbname=mastering_postgis" EX_sample.tab -nln data_import.osgb_code_point_polygons_tab -lco GEOMETRY_NAME=geom -a_srs EPSG:27700
As usual, we'll start with the dataset's metadata checkup:
ogrinfo 2.5_day_age.kml
The output shows that there is more than one layer:
INFO: Open of `2.5_day_age.kml' using driver `LIBKML' successful.
1: Magnitude 5
2: Magnitude 4
3: Magnitude 3
4: Magnitude 2
Therefore, in order to review the metadata for all layers at once, the following command should be used:
ogrinfo 2.5_day_age.kml -al -so
The output of the previous command is rather long, so we'll truncate it a bit and show only the info for the first layer:
INFO: Open of `2.5_day_age.kml' using driver `LIBKML' successful.

Layer name: Magnitude 5
Geometry: Unknown (any)
Feature Count: 2
Extent: (-101.000100, -36.056300) - (120.706400, 13.588200)
Layer SRS WKT:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        TOWGS84[0,0,0,0,0,0,0],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9108"]],
    AUTHORITY["EPSG","4326"]]
Name: String (0.0)
description: String (0.0)
timestamp: DateTime (0.0)
begin: DateTime (0.0)
end: DateTime (0.0)
altitudeMode: String (0.0)
tessellate: Integer (0.0)
extrude: Integer (0.0)
visibility: Integer (0.0)
drawOrder: Integer (0.0)
icon: String (0.0)
snippet: String (0.0)
This time, the EPSG information is available, so we do not have to worry: ogr2ogr will not need to create an extra SRID definition in the database.
Once we've confirmed that this is the exact dataset we'd like to import, we can continue with the following command:
ogr2ogr -f "PostgreSQL" PG:"host=localhost port=5434 user=postgres dbname=mastering_postgis" 2.5_day_age.kml -nln data_import.usgs_earthquakes_kml -lco GEOMETRY_NAME=geom -append
Note the -append param in the earlier command. This is required because our KML has more than one layer and ogr2ogr imports them one by one. Without -append, only the first layer would be imported, and then ogr2ogr would fail with a similar output:
FAILED: Layer data_import.usgs_earthquakes_kml already exists, and -append not specified.
Consider using -append, or -overwrite.
ERROR 1: Terminating translation prematurely after failed
translation of layer Magnitude 4 (use -skipfailures to skip errors)
The cmd output should be similar to:
WARNING: Layer creation options ignored since an existing layer is being appended to. WARNING: Layer creation options ignored since an existing layer is being appended to. WARNING: Layer creation options ignored since an existing layer is being appended to.
At this stage, the KML dataset should have made it to our PostGIS database.
For those preferring a GUI over the CMD, there is an alternative to plain old ogr2ogr: ogr2gui, available from http://www.ogr2gui.ca/.
Simply download the required archive, extract it, and launch the appropriate .exe. Having already played with ogr2ogr a bit, you should find the GUI rather self-explanatory:
Many GIS software packages can directly connect to databases for reading and writing the data; from our perspective, they are just more database clients. In this section, we'll have a quick look at the very well-known QGIS and the certainly less popular but very powerful Manifold GIS.
Both can export data to databases and read it back. QGIS has a specialized PostGIS export module called SPIT; Manifold's export facility is built into the GUI and follows the same export routines as other formats handled by the software.
QGIS offers a PostGIS export module called SPIT. You can access it by choosing
Database\Spit\Import Shapefiles to PostGIS:
SPIT's GUI is very clear and easy to understand. You can import many files at once, and you can specify the destination schema and table names. If required, you can change SPIT's default geometry column name (the_geom) to your liking. The SRID can also be changed, but it will be applied to all the imported files. Once you provide all the required information and click OK, a progress window is displayed and the data is exported to PostGIS.
Before using DB Manager, you should first load into QGIS the shapefile you want to import into the database.
When ready, launch DbManager by going to
Database\DB Manager\DB Manager. When the DbManager UI displays, locate your database, expand its node and select the schema you want to import the data to and then click the
Import button (arrow down). You should be prompted with an import dialog; the
Input dropdown lets you choose the layer to import.
Once you are happy with the import options (you may want to click the Update options button to populate the dialog), click on OK. When the import finishes, you should see a confirmation dialog box:
In order to export the data, Manifold needs to import it internally first. This means that we do not export a particular format but rather we simply export spatial data. Once you bring a shapefile into Manifold, MapInfo TAB or MIF, SQL Server spatial table or any other supported vector format, exporting it to PostGIS is exactly the same for all of them.
In order to export a vector component, right-click on its node in the project tree and choose the export option. You will see an export dialog, where you should pick the appropriate export format; in this scenario, you need to choose
Data Source. You will then be presented with a data source setup window, where you can either pick an existing connection or configure a new one:
Once you choose the appropriate connection, you can then set up the actual export parameters:
You can choose which columns you want to export, and the name of the identity, geometry, and version columns.
There is a minor inconvenience with the exporter: it does not allow adjusting the destination schema and always exports to the public schema.
Manifold tries to find the PostGIS projection that best matches the Manifold projection. Unfortunately, this is not always possible, as Manifold does not rely on EPSG coordinate system definitions but rather uses its internal mechanisms for handling projections. If Manifold does not match the PostGIS-side projection, you can select it manually.
The export dialogue also offers options to transform coordinates upon export and to create indexes and CREATE/UPDATE triggers when the data gets to the database.
In order to enable PostgreSQL connections in Manifold, you may have to copy some PgSQL DLLs over to the Manifold installation directory. The exact information on how to do this can easily be found on georeference.org, the Manifold user community forum.
For importing OSM data into PostGIS, we'll use a command-line utility called osm2pgsql. Making a Linux build of osm2pgsql is apparently straightforward; getting one that runs on Windows may require some more effort, as described here: https://github.com/openstreetmap/osm2pgsql/issues/17, https://github.com/openstreetmap/osm2pgsql/issues/472.
I have used a Cygwin build as mentioned here:
Once we have the
osm2pgsql ready, we'll need some data. For the sake of simplicity, I have downloaded the Greenwich Park area from https://www.openstreetmap.org/export#map=16/51.4766/0.0003 and saved the file as
greenwich_observatory.osm (you will find it in the data accompanying this chapter).
The downloaded file is actually an XML file. Do have a look at what's inside to get an idea of the data osm2pgsql is dealing with.
In order to take advantage of the OSM tags used to describe the data, we will need the PostgreSQL hstore extension. Basically, it allows storing key-value pairs in a column, so data with a flexible schema can easily be stored. In order to install it, you need to execute the following query in either pgAdmin or psql:
CREATE EXTENSION hstore;
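As a quick illustration of what hstore offers (a standalone query, not tied to any table; the key names are made up):

```sql
-- hstore stores key=>value pairs; the -> operator extracts a value by key
SELECT 'highway=>primary, name=>"The Mall"'::hstore -> 'highway';
-- returns 'primary'
```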
In order to import OSM data, issue the following command, making sure you adjust the paths and db connection details to your environment:
osm2pgsql.exe -H localhost -P 5434 -U postgres -W -d mastering_postgis -S default.style ../data/greenwich_observatory.osm -hstore
If you happen to receive a Default style not found message, make sure you provide a valid path to the default.style definition file.
You should see a similar output:
osm2pgsql SVN version 0.85.0 (64bit id space) Password: Using projection SRS 900913 (Spherical Mercator) Setting up table: planet_osm_point NOTICE: table "planet_osm_point" does not exist, skipping NOTICE: table "planet_osm_point_tmp" does not exist, skipping Setting up table: planet_osm_line NOTICE: table "planet_osm_line" does not exist, skipping NOTICE: table "planet_osm_line_tmp" does not exist, skipping Setting up table: planet_osm_polygon NOTICE: table "planet_osm_polygon" does not exist, skipping NOTICE: table "planet_osm_polygon_tmp" does not exist, skipping Setting up table: planet_osm_roads NOTICE: table "planet_osm_roads" does not exist, skipping NOTICE: table "planet_osm_roads_tmp" does not exist, skipping Using built-in tag processing pipeline Allocating memory for sparse node cache Node-cache: cache=800MB, maxblocks=0*102400, allocation method=8192 Mid: Ram, scale=100 !! You are running this on 32bit system, so at most !! 3GB of RAM can be used. If you encounter unexpected !! exceptions during import, you should try running in slim !! mode using parameter -s. 
Reading in file: ../data/greenwich_observatory.osm
Processing: Node(4k 4.7k/s) Way(0k 0.55k/s) Relation(41 41.00/s) parse time: 0s
Node stats: total(4654), max(4268388189) in 0s
Way stats: total(546), max(420504897) in 0s
Relation stats: total(41), max(6096780) in 0s
Committing transaction for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_polygon
Committing transaction for planet_osm_roads
Writing relation (41)
Sorting data and creating indexes for planet_osm_point
Analyzing planet_osm_point finished
Sorting data and creating indexes for planet_osm_line
Sorting data and creating indexes for planet_osm_polygon
Analyzing planet_osm_line finished
node cache: stored: 4654(100.00%), storage efficiency: 50.00% (dense blocks: 0, sparse nodes: 4654), hit rate: 2.00%
Sorting data and creating indexes for planet_osm_roads
Analyzing planet_osm_polygon finished
Analyzing planet_osm_roads finished
Copying planet_osm_point to cluster by geometry finished
Creating geometry index on planet_osm_point
Creating indexes on planet_osm_point finished
All indexes on planet_osm_point created in 0s
Completed planet_osm_point
Copying planet_osm_line to cluster by geometry finished
Creating geometry index on planet_osm_line
Creating indexes on planet_osm_line finished
Copying planet_osm_polygon to cluster by geometry finished
Creating geometry index on planet_osm_polygon
All indexes on planet_osm_line created in 0s
Completed planet_osm_line
Creating indexes on planet_osm_polygon finished
Copying planet_osm_roads to cluster by geometry finished
Creating geometry index on planet_osm_roads
All indexes on planet_osm_polygon created in 0s
Completed planet_osm_polygon
Creating indexes on planet_osm_roads finished
All indexes on planet_osm_roads created in 0s
Completed planet_osm_roads
Osm2pgsql took 1s overall
At this stage, you should have the OSM data imported into the public schema. Thanks to using the
hstore datatype for the tags column, we can now issue queries such as the following:
SELECT name FROM planet_osm_point WHERE tags -> 'memorial' = 'stone';
When executed in psql with the dataset used in this example, you should see the following output:
            name
-----------------------------
 Prime Meridian of the World
(1 row)
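The hstore lookup semantics used in the query above map naturally onto a dictionary. The following Python sketch (an illustration of the semantics, not the actual hstore implementation; the sample rows are made up to mirror the dataset) shows what the tags column stores and how the -> operator behaves:

```python
# Rough model of an hstore "tags" column: each row carries an arbitrary
# set of key-value pairs, and tags -> 'key' yields the value, or NULL
# (None here) when the key is absent.
rows = [
    {"name": "Prime Meridian of the World", "memorial": "stone"},
    {"name": "Some other point", "tourism": "attraction"},
]

def hstore_get(tags, key):
    """Equivalent of SQL: tags -> 'key' (NULL when the key is missing)."""
    return tags.get(key)  # None stands in for SQL NULL

# Equivalent of:
#   SELECT name FROM planet_osm_point WHERE tags -> 'memorial' = 'stone';
matches = [r["name"] for r in rows if hstore_get(r, "memorial") == "stone"]
print(matches)  # ['Prime Meridian of the World']
```

Rows lacking the key simply drop out of the filter, which is exactly why a flexible-schema tag soup fits hstore so well.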
Since PostgreSQL 9.1, one can use Foreign Data Wrappers (FDW) in order to connect to external data sources that are then treated as if they were local tables. More information can be found on the PostgreSQL wiki: https://wiki.postgresql.org/wiki/Foreign_data_wrappers.
Querying external files or databases seems to be standard these days. But how about querying WFS services or OSM directly? Now, this sounds cool, doesn't it? You should certainly have a look at some of the clever geospatial data wrappers available.
In this example, we'll use
ogr_fdw to connect to some external data sources. Starting with PostGIS 2.2, it is part of the bundle, so there should be no need to install it separately.
First we need to create a server:
CREATE SERVER fdw_sqlserver_test FOREIGN DATA WRAPPER ogr_fdw OPTIONS ( datasource 'MSSQL:server=CM_DOM\MSSQLSERVER12;database=hgis;UID=postgres_fdw;PWD=postgres_fdw', format 'MSSQLSpatial');
If you're using PostgreSQL 9.5+, you can use the
IMPORT FOREIGN SCHEMA command:
IMPORT FOREIGN SCHEMA "dbo.Wig100_skorowidz" FROM SERVER fdw_sqlserver_test INTO data_linked;
Otherwise you will have to specify the table schema explicitly:
CREATE FOREIGN TABLE data_linked.dbo_wig100_skorowidz (fid integer , geom public.geometry , oid integer , gid integer , version integer , godlo character varying , nazwa character varying , nazwa2 character varying , kalibracja character varying , pas real , slup real ) SERVER fdw_sqlserver_test OPTIONS (layer 'dbo.Wig100_skorowidz');
By default, pgAdmin does not display foreign tables, so you may have to go to
Options and tick the
Foreign Tables checkbox in the Browser node. In pgAdmin 4, foreign tables seem to be visible by default.
At this stage, you should be able to query the foreign table as if it were local.
This example is based on the
ogr_fdw documentation, so it only shows the required steps. A full example can be reviewed here:
First let's create a foreign server:
CREATE SERVER fdw_wfs_test_opengeo FOREIGN DATA WRAPPER ogr_fdw OPTIONS ( datasource 'WFS:http://demo.opengeo.org/geoserver/wfs', format 'WFS');
Automagically bring in the schema:
IMPORT FOREIGN SCHEMA "topp:tasmania_cities" FROM SERVER fdw_wfs_test_opengeo INTO data_linked;
And issue a query against the foreign WFS table:
select city_name from data_linked.topp_tasmania_cities;
Since this dataset contains only one record, our result should be Hobart.
raster2pgsql is the default tool for importing rasters to PostGIS. Even though GDAL itself does not provide a means to load rasters into the database,
raster2pgsql is compiled as part of PostGIS and therefore supports the very same formats as the GDAL version that the given PostGIS build uses.
raster2pgsql is a command-line tool. In order to review its parameters, simply run it without any arguments in the console.
While taking the time to get familiar with the
raster2pgsql help output is advisable, here are some parameters worth highlighting:
-G: Prints a list of GDAL formats supported by the given version of the utility; the list is likely to be extensive.
-s: Sets the SRID of the imported raster.
-t: Tile size - expressed as width x height. If not provided, a default is worked out automatically in the range of 32-100 so it best matches the raster dimensions. It is worth remembering that when importing multiple files, tiles will be computed for the first raster and then applied to others.
-P: Pads tiles right / bottom, so all the tiles have the same dimensions.
-d|a|c|p: These options are mutually exclusive:
d: Drops and creates a table.
a: Appends data to an existing table.
c: Creates a new table.
p: Turns on prepare mode. So no importing is done; only a table is created.
-F: Adds a column with the raster file name.
-l: Comma-separated list of overview factors; creates overview tables named
-I: Creates GIST spatial index on the raster column.
-C: Sets the standard constraints on the raster column after the raster is imported.
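To get a feel for what the -t parameter implies, here is a small Python sketch (not part of raster2pgsql; the function name is ours) that works out how many tiles a raster of given dimensions is split into for a given tile size:

```python
import math

def tile_grid(raster_w, raster_h, tile_w, tile_h):
    """Tile counts for a raster2pgsql-style -t tile_w x tile_h tiling.
    Edge tiles on the right/bottom are smaller than the rest unless
    the -P option pads them to full size."""
    cols = math.ceil(raster_w / tile_w)
    rows = math.ceil(raster_h / tile_h)
    return cols, rows, cols * rows

# The 50M Gray Earth raster used below is 10800 x 5400 px;
# with -t 2700x2700 it yields a 4 x 2 grid of 8 tiles:
print(tile_grid(10800, 5400, 2700, 2700))  # (4, 2, 8)
```

When a tile size does not divide the raster evenly, the partial edge tiles are exactly what -P exists to normalize.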
For the examples used in this section, we'll use Natural Earth's 50M Gray Earth raster.
As you may remember, GDAL/OGR ships with the ogrinfo tool that can be used to obtain information on a vector dataset. GDAL's equivalent for raster files is called
gdalinfo and is as worthy as its vector brother:
gdalinfo GRAY_50M_SR_OB.tif
You should get a similar output:
Driver: GTiff/GeoTIFF
Files: GRAY_50M_SR_OB.tif
       GRAY_50M_SR_OB.tfw
Size is 10800, 5400
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-179.999999999999970,90.000000000000000)
Pixel Size = (0.033333333333330,-0.033333333333330)
Metadata:
  AREA_OR_POINT=Area
  TIFFTAG_DATETIME=2014:10:18 09:28:20
  TIFFTAG_RESOLUTIONUNIT=2 (pixels/inch)
  TIFFTAG_SOFTWARE=Adobe Photoshop CC 2014 (Macintosh)
  TIFFTAG_XRESOLUTION=342.85699
  TIFFTAG_YRESOLUTION=342.85699
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
Lower Left  (-180.0000000, -90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"S)
Upper Right ( 180.0000000,  90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"N)
Lower Right ( 180.0000000, -90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"S)
Center      (  -0.0000000,   0.0000000) ( 0d 0' 0.00"W,  0d 0' 0.00"N)
Band 1 Block=10800x1 Type=Byte, ColorInterp=Gray
Before we get down to importing the raster, let's split it into four parts using the
gdalwarp utility. This way, we'll be able to show how to import both a single raster and a set of rasters:
gdalwarp -s_srs EPSG:4326 -t_srs EPSG:4326 -te -180 -90 0 0 GRAY_50M_SR_OB.tif gray_50m_partial_bl.tif
gdalwarp -s_srs EPSG:4326 -t_srs EPSG:4326 -te -180 0 0 90 GRAY_50M_SR_OB.tif gray_50m_partial_tl.tif
gdalwarp -s_srs EPSG:4326 -t_srs EPSG:4326 -te 0 -90 180 0 GRAY_50M_SR_OB.tif gray_50m_partial_br.tif
gdalwarp -s_srs EPSG:4326 -t_srs EPSG:4326 -te 0 0 180 90 GRAY_50M_SR_OB.tif gray_50m_partial_tr.tif
For each command, you should see a similar output:
Creating output file that is 5400P x 2700L.
Processing input file GRAY_50M_SR_OB.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.
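The four -te extents passed to gdalwarp simply quarter the raster's global bounding box at its midpoint. The arithmetic can be sketched in Python (an illustration only; the bl/tl/br/tr labels are ours, chosen to match the output file suffixes):

```python
def quarter_extent(xmin, ymin, xmax, ymax):
    """Split a bounding box into four quadrants, each returned in
    gdalwarp -te order: xmin ymin xmax ymax."""
    xmid = (xmin + xmax) / 2
    ymid = (ymin + ymax) / 2
    return {
        "bl": (xmin, ymin, xmid, ymid),  # bottom-left
        "tl": (xmin, ymid, xmid, ymax),  # top-left
        "br": (xmid, ymin, xmax, ymid),  # bottom-right
        "tr": (xmid, ymid, xmax, ymax),  # top-right
    }

# Global extent of the EPSG:4326 Gray Earth raster:
quads = quarter_extent(-180, -90, 180, 90)
print(quads["bl"])  # (-180, -90, 0.0, 0.0)
```

The bottom-left quadrant matches the first gdalwarp call's -te -180 -90 0 0, and the remaining quadrants match the other three calls.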
Having prepared the data, we can now move onto importing it.
In order to import a single raster file, let's issue the following command:
raster2pgsql -s 4326 -C -l 2,4 -I -F -t 2700x2700 gray_50m_sr_ob.tif data_import.gray_50m_sr_ob | psql -h localhost -p 5434 -U postgres -d mastering_postgis
You should see a similar output:
Processing 1/1: gray_50m_sr_ob.tif
BEGIN
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 1
(...)
INSERT 0 1
CREATE INDEX
ANALYZE
CREATE INDEX
ANALYZE
CREATE INDEX
ANALYZE
NOTICE: Adding SRID constraint
CONTEXT: PL/pgSQL function addrasterconstraints(name,name,name,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean) line 53 at RETURN
NOTICE: Adding scale-X constraint
(...)
----------------------
 t
(1 row)

 addoverviewconstraints
------------------------
 t
(1 row)

 addoverviewconstraints
------------------------
 t
(1 row)

COMMIT
The executed command created three tables: the main raster table called
data_import.gray_50m_sr_ob and two overview tables called
data_import.o_2_gray_50m_sr_ob and data_import.o_4_gray_50m_sr_ob. The command also created a GIST index and brought in the file name. The raster has been split into tiles of 2700 x 2700 pixels.
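An overview factor of f means each overview pixel aggregates an f x f block of the original, so the overview raster's dimensions shrink by that factor. A small Python sketch of the resulting table names and approximate dimensions (the o_&lt;factor&gt;_&lt;table&gt; naming follows the convention seen above; the dimension arithmetic is illustrative):

```python
def overviews(table, width, height, factors):
    """For each overview factor, return the raster2pgsql-style overview
    table name and the (approximate) downsampled raster dimensions."""
    return [(f"o_{f}_{table}", width // f, height // f) for f in factors]

# -l 2,4 on the 10800 x 5400 px Gray Earth raster:
print(overviews("gray_50m_sr_ob", 10800, 5400, [2, 4]))
# [('o_2_gray_50m_sr_ob', 5400, 2700), ('o_4_gray_50m_sr_ob', 2700, 1350)]
```

Overview tables let clients such as QGIS pull coarser data when zoomed out instead of scanning full-resolution tiles.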
Let's import a directory of rasters now. We have four files with the file name mask
gray_50m_partial*.tif. In order to import all the files at once, we'll issue the following command:
raster2pgsql -s 4326 -C -l 2,4 -I -F -t 2700x2700 gray_50m_partial*.tif data_import.gray_50m_partial | psql -h localhost -p 5434 -U postgres -d mastering_postgis
You should see a similar output:
Processing 1/4: gray_50m_partial_bl.tif
BEGIN
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
Processing 2/4: gray_50m_partial_br.tif
(...)
Processing 3/4: gray_50m_partial_tl.tif
(...)
Processing 4/4: gray_50m_partial_tr.tif
(...)
CONTEXT: PL/pgSQL function addrasterconstraints(name,name,name,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean) line 53 at RETURN
NOTICE: Adding maximum extent constraint
CONTEXT: PL/pgSQL function addrasterconstraints(name,name,name,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean,boolean) line 53 at RETURN
 addrasterconstraints
----------------------
 t
(1 row)

 addoverviewconstraints
------------------------
 t
(1 row)

 addoverviewconstraints
------------------------
 t
(1 row)

COMMIT
The command used to import multiple rasters was very similar to the one we used to import a single file. The difference was a file name mask used in place of a file name:
gray_50m_partial*.tif. If we had used a greedier pattern such as
*.tif, all the TIF files present in the directory would have been imported.
When processing multiple files, one can pipe the output to psql without the connection info specified as psql params, but in such a case, the equivalent environment variables will have to be set (on Windows, use the
set command; on Linux, use export):
set PGPORT=5434
set PGHOST=localhost
set PGUSER=postgres
set PGPASSWORD=somepass
set PGDATABASE=mastering_postgis

raster2pgsql -s 4326 -C -l 2,4 -I -F -t 2700x2700 gray_50m_partial*.tif data_import.gray_50m_partial | psql
Just to make the data import picture complete, it is worth mentioning the PostgreSQL backup and restore commands. After all, it is not an uncommon scenario to receive data in the form of a database, schema, or even a single-table backup.
For this scenario, let's create a backup of one of the tables imported before:
pg_dump -h localhost -p 5434 -U postgres -t data_import.earthquakes_subset_with_geom -c -F c -v -b -f earthquakes_subset_with_geom.backup mastering_postgis
Since the
-v option was specified, you should get a similarly verbose output:
pg_dump: reading schemas
pg_dump: reading user-defined tables
pg_dump: reading extensions
pg_dump: reading user-defined functions
pg_dump: reading user-defined types
pg_dump: reading procedural languages
pg_dump: reading user-defined aggregate functions
pg_dump: reading user-defined operators
pg_dump: reading user-defined operator classes
pg_dump: reading user-defined operator families
pg_dump: reading user-defined text search parsers
pg_dump: reading user-defined text search templates
pg_dump: reading user-defined text search dictionaries
pg_dump: reading user-defined text search configurations
pg_dump: reading user-defined foreign-data wrappers
pg_dump: reading user-defined foreign servers
pg_dump: reading default privileges
pg_dump: reading user-defined collations
pg_dump: reading user-defined conversions
pg_dump: reading type casts
pg_dump: reading transforms
pg_dump: reading table inheritance information
pg_dump: reading event triggers
pg_dump: finding extension members
pg_dump: finding inheritance relationships
pg_dump: reading column info for interesting tables
pg_dump: finding the columns and types of table "data_import.earthquakes_subset_with_geom"
pg_dump: flagging inherited columns in subtables
pg_dump: reading indexes
pg_dump: reading constraints
pg_dump: reading triggers
pg_dump: reading rewrite rules
pg_dump: reading policies
pg_dump: reading row security enabled for table "data_import.earthquakes_subset_with_geom"
pg_dump: reading policies for table "data_import.earthquakes_subset_with_geom"
pg_dump: reading large objects
pg_dump: reading dependency data
pg_dump: saving encoding = UTF8
pg_dump: saving standard_conforming_strings = on
pg_dump: dumping contents of table "data_import.earthquakes_subset_with_geom"
Having backed up our table, let's drop the original one:
DROP TABLE data_import.earthquakes_subset_with_geom;
And see if we can restore it:
pg_restore -h localhost -p 5434 -U postgres -v -d mastering_postgis earthquakes_subset_with_geom.backup
You should see a similar output:
pg_restore: connecting to database for restore
pg_restore: creating TABLE "data_import.earthquakes_subset_with_geom"
pg_restore: processing data for table "data_import.earthquakes_subset_with_geom"
pg_restore: setting owner and privileges for TABLE "data_import.earthquakes_subset_with_geom"
pg_restore: setting owner and privileges for TABLE DATA "data_import.earthquakes_subset_with_geom"
At this stage, we have successfully round-tripped data through the PostgreSQL backup and restore facilities.
There are many different ways of importing data into a PostGIS database. It is more than likely that you will not use all of them all the time, but will rather settle on your preferred ways of loading data in your standard routines.
It is still worth knowing the different tools, though, as different scenarios may have to be addressed with special care. When you add scripting to the stack (see the chapter on ETL), you are equipped with simple yet very powerful tools that may constitute a decent ETL toolbox.
Having fed our database with data, we'll next have a look at spatial data analysis and find our way through the rich function set of PostGIS.