Moving Spatial Data From One Format to Another

Michael Diener

November 2015

In this article by Michael Diener, author of the Python Geospatial Analysis Cookbook, we will cover the following topics:

  • Converting a Shapefile to a PostGIS table using ogr2ogr
  • Batch importing a folder of Shapefiles into PostGIS using ogr2ogr
  • Batch exporting a list of tables from PostGIS to Shapefiles
  • Converting an OpenStreetMap (OSM) XML to a Shapefile
  • Converting a Shapefile (vector) to a GeoTiff (raster)
  • Converting a GeoTiff (raster) to a Shapefile (vector) using GDAL

(For more resources related to this topic, see here.)

Introduction

Geospatial data comes in hundreds of formats and massaging this data from one format to another is a simple task. The ability to convert data types, such as rasters or vectors, belongs to data wrangling tasks that are involved in geospatial analysis. Here is an example of a raster and vector dataset so you can see what I am talking about:

Source: Michael Diener drawing

The best practice is to run analysis functions or models on data stored in a common format, such as a Postgresql PostGIS database or a set of Shapefiles, in a common coordinate system. For example, running analysis on input data stored in multiple formats is also possible, but you can expect to find the devil in the details of your results if something goes wrong or your results are not what you expect.

This article looks at some common data formats and demonstrates how to move these formats around from one to another with the most common tools.

Converting a Shapefile to a PostGIS table using ogr2ogr

The simplest way to transform data from one format to another is to directly use the ogr2ogr tool that comes with the installation of GDAL. This powerful tool can convert over 200 geospatial formats. In this solution, we will execute the ogr2ogr utility from within a Python script to execute generic vector data conversions. The python code is, therefore, used to execute this command-line tool and pass around variables that are needed to create your own scripts for data imports or exports.

The use of this tool is also recommended if you are not really interested in coding too much and simply want to get the job done to move your data. A pure python solution is, of course, possible but it is definitely targeted more at developers (or python purists).

Getting ready

To run this script, you will need the GDAL utility application installed on your system. Windows users can visit OSGeo4W (http://trac.osgeo.org/osgeo4w) and download the 32-bit or 64-bit Windows installer as follows:

  1. Simply double-click on the installer to start it.
  2. Navigate to the bottommost option, Advanced Installation | Next.
  3. Click on Next to download from the Internet (this is the first default option).
  4. Click on Next to accept default location of path or change to your liking.
  5. Click on Next to accept the location of local saved downloads (default).
  6. Click on Next to accept the direct connection (default).
  7. Click on Next to select a default download site.
  8. Now, you should finally see the menu. Click on + to open the command-line utilities and you should see the following:

  9. Now, select gdal. The GDAL/OGR library and command line tools to install it.
  10. Click on Next to start downloading it, and then install it.

For Ubuntu Linux users, use the following steps for installation:

  1. Execute this simple one-line command:
    $ sudo apt-get install gdal-bin
  2. This will get you up and running so that you can execute ogr2ogr directly from your terminal. Next, set up your Postgresql database using the PostGIS extension.
  3. First, we will create a new user to manage our new database and tables:
    Sudo su createuser  –U postgres –P pluto
  4. Enter a password for the new role.
  5. Enter the password again for the new role.
  6. Enter a password for postgres users since you're going to create a user with the help of the postgres user.
    The –P option prompts you to give the new user, called pluto, a password. For the following examples, our password is stars; I would recommend a much more secure password for your production database.

Setting up your Postgresql database with the PostGIS extension in Windows is the same as setting it up in Ubuntu Linux. Perform the following steps to do this:

  1. Navigate to the c:\Program Files\PostgreSQL\9.3\bin\ folder. Then, execute this command and follow the on-screen instructions as mentioned previously:
    Createuser.exe –U postgres –P pluto
  2. To create the database, we will use the command-line createdb command similar to the postgres user to create a database named py_geoan_cb. We will then assign the pluto user to be the database owner; here is the command to do this:
    $ sudo su createdb –O pluto –U postgres py_geoan_cb
  3. Windows users can visit the c:\Program Files\PostgreSQL\9.3\bin\ and execute the createdb.exe command:
    createdb.exe –O pluto –U postgres py_geoan_cb
  4. Next, create the PostGIS extension for our newly created database:
    psql –U postgres -d py_geoan_cb -c "CREATE EXTENSION postgis;"
  5. Windows users can also execute psql from within the c:\Program Files\PostgreSQL\9.3\bin\ folder:
    psql.exe –U postgres –d py_geoan_cb –c "CREATE EXTENSION postgis;"

Lastly, create a schema called geodata to store the new spatial table. It is common to store spatial data in another schema outside the Postgresql default schema, public. Create the schema as follows:

  • For Ubuntu Linux users:
    sudo -u postgres psql -d py_geoan_cb -c "CREATE SCHEMA geodata 
    AUTHORIZATION pluto;"
  • For Windows users:
    psql.exe –U postgres –d py_geoan_cb –c "CREATE SCHEMA geodata 
    AUTHORIZATION pluto;"

How to do it...

  1. Now let's get into the actual importing of our Shapefile into a PostGIS database that will automatically create a new table from our Shapefile:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    import subprocess
    
    # database options
    db_schema = "SCHEMA=geodata"
    overwrite_option = "OVERWRITE=YES"
    geom_type = "MULTILINESTRING"
    output_format = "PostgreSQL"
    
    # database connection string
    db_connection = """PG:host=localhost port=5432
      user=pluto dbname=py_test password=stars"""
    
    # input shapefile
    input_shp = "../geodata/bikeways.shp"
    
    # call ogr2ogr from python
    subprocess.call(["ogr2ogr","-lco", db_schema, "-lco", overwrite_option, "-nlt", geom_type, "-f", output_format, db_connection,  input_shp])
  2. Now we can call our script from the command line:
    $ python ch03-01_shp2pg.py

How it works...

We begin with importing the standard python module subprocess that will call the ogr2ogr command-line tool. Next, we'll set a range of variables that are used as input arguments and various options for ogr2ogr to execute.

Starting with the SCHEMA=geodata Postgresql database, we'll set a nondefault database schema for the destination of our new table. It is best practice to store your spatial data tables in a separate schema outside the "public" schema, which is set as the default. This practice will make backups and restores much easier and keep your database better organized.

Next, we'll create a overwrite_option variable that's set to "yes" so that we can overwrite any table with the same name when its created. This is helpful when you want to completely replace the table with new data, otherwise, it is recommended to use the -append option. We'll also specify the geometry type because, sometimes, ogr2ogr does not always guess the correct geometry type of our Shapefile so setting this value saves you any worry.

Now, setting our output_format variable with the PostgreSQL keyword tells ogr2ogr that we want to output data into a Postgresql database. This is then followed by the db_connection variable, which specifies our database connection information. Do not forget that the database must already exist along with the "geodata" schema, otherwise, we will get an error.

The last input_shp variable gives the full path to our Shapefile, including the .shp file ending. Now, we will call the subprocess module ,which will then call the ogr2ogr command-line tool and pass along the variable options required to run the tool. We'll pass this function an array of arguments, the first object in the array being the ogr2ogr command-line tool name. After this, we'll pass each option after another in the array to complete the call.

Subprocess can be used to call any command-line tool directly. It takes a list of parameters that are separated by spaces. This passing of parameters is quite fussy, so make sure you follow along closely and don't add any extra spaces or commas.

Last but not least, we need to execute our script from the command line to actually import our Shapefile by calling the python interpreter and passing the script. Then, head over to the PgAdmin Postgresql database viewer and see if it worked, or even better, open up Quantum GIS (www.qgis.org) and take a look at the newly created tables.

See also

If you would like to see the full list of options available with the ogr2ogr command, simple enter the following in the command line:

$ ogr2ogr –help

You will see the full list of options that are available. Also, visit http://gdal.org/ogr2ogr.html to read the required documentation.

Batch importing a folder of Shapefiles into PostGIS using ogr2ogr

We would like to extend our last script to loop over a folder full of Shapefiles and import them into PostGIS. Most importing tasks involve more than one file to import so this makes it a very practical task.

How to do it...

The following steps will batch import a folder of Shapefiles into PostGIS using ogr2ogr:

  1. Our script will reuse the previous code in the form of a function so that we can batch process a list of Shapefiles to import into the Postgresql PostGIS database. We will create our list of Shapefiles from a single folder for the sake of simplicity:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    import subprocess
    import os
    import ogr
    
    
    def discover_geom_name(ogr_type):
        """
    
        :param ogr_type: ogr GetGeomType()
        :return: string geometry type name
        """
        return {ogr.wkbUnknown            : "UNKNOWN",
                ogr.wkbPoint              : "POINT",
                ogr.wkbLineString         : "LINESTRING",
                ogr.wkbPolygon            : "POLYGON",
                ogr.wkbMultiPoint         : "MULTIPOINT",
                ogr.wkbMultiLineString    : "MULTILINESTRING",
                ogr.wkbMultiPolygon       : "MULTIPOLYGON",
                ogr.wkbGeometryCollection : "GEOMETRYCOLLECTION",
                ogr.wkbNone               : "NONE",
                ogr.wkbLinearRing         : "LINEARRING"}.get(ogr_type)
    
    def run_shp2pg(input_shp):
        """
        input_shp is full path to shapefile including file ending
        usage:  run_shp2pg('/home/geodata/myshape.shp')
        """
    
        db_schema = "SCHEMA=geodata"
        db_connection = """PG:host=localhost port=5432
                        user=pluto dbname=py_geoan_cb password=stars"""
        output_format = "PostgreSQL"
        overwrite_option = "OVERWRITE=YES"
        shp_dataset = shp_driver.Open(input_shp)
        layer = shp_dataset.GetLayer(0)
        geometry_type = layer.GetLayerDefn().GetGeomType()
        geometry_name = discover_geom_name(geometry_type)
        print (geometry_name)
    
        subprocess.call(["ogr2ogr", "-lco", db_schema, "-lco", overwrite_option,
                         "-nlt", geometry_name, "-skipfailures",
                         "-f", output_format, db_connection, input_shp])
    
    # directory full of shapefiles
    shapefile_dir = os.path.realpath('../geodata')
    
    # define the ogr spatial driver type
    shp_driver = ogr.GetDriverByName('ESRI Shapefile')
    
    # empty list to hold names of all shapefils in directory
    shapefile_list = []
    
    for shp_file in os.listdir(shapefile_dir):
        if shp_file.endswith(".shp"):
            # apped join path to file name to outpout "../geodata/myshape.shp"
            full_shapefile_path = os.path.join(shapefile_dir, shp_file)
            shapefile_list.append(full_shapefile_path)
    
    # loop over list of Shapefiles running our import function
    for each_shapefile in shapefile_list:
        run_shp2pg(each_shapefile) 
     print ("importing Shapefile: " + each_shapefile)
  2. Now, we can simply run our new script from the command line once again:
    $ python ch03-02_batch_shp2pg.py

How it works...

Here, we will reuse our code from the previous script but have converted it into a python function called run_shp2pg (input_shp), which takes exactly one argument to complete the path to the Shapefile we want to import. The input argument must include a Shapefile ending with .shp.

We have a helper function that will get the geometry type as a string by reading in the Shapefile feature layer and outputting the geometry type so that the ogr commands know what to expect. This does not always work and some errors can occur. The skipfailures option will plow over any errors that are thrown during insert and will still populate our tables.

To begin with, we need to define the folder that contains all our Shapefiles to be imported. Next up, we'll create an empty list object called shapefile_list that will hold a list of all the Shapefiles we want to import.

The first for loop is used to get the list of all the Shapefiles in the directory specified using the standard python os.listdir() function. We do not want all the files in this folder; we only want files with ending with .shp, hence, the if statement will evaluate to True if the file ends with .shp. Once the .shp file is found, we need to append the file path to the file name to create a single string that holds the path plus the Shapefile name, and this is our variable called full_shapefile_path. The final part of this is to add each new file with its attached path to our shapefile_list list object. So, we now have our final list to loop through.

It is time to loop through each Shapefile in our new list and run our run_shp2pg(input_shp) function for each Shapefile in the list by importing it into our Postgresql PostGIS database.

See also

If you have a lot of Shapefiles, and by this I mean mean hundred or more Shapefiles, performance will be one consideration and will, therefore, indicate that there are a lot of machines with free resources.

Batch exporting a list of tables from PostGIS to Shapefiles

We will now change directions and take a look at how we can batch export a list of tables from our PostGIS database into a folder of Shapefiles. We'll again use the ogr2ogr command-line tool from within a python script so that you can include it in your application programming workflow. Near the end, you can also see how this all of works in one single command line.

How to do it...

  1. The script will fire the ogr2ogr command and loop over a list of tables to export the Shapefile format into an existing folder. So, let's take a look at how to do this using the following code:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    #
    import subprocess
    import os
    
    # folder to hold output Shapefiles
    destination_dir = os.path.realpath('../geodata/temp')
    
    # list of postGIS tables
    postgis_tables_list = ["bikeways", "highest_mountains"]
    
    # database connection parameters
    db_connection = """PG:host=localhost port=5432 user=pluto
            dbname=py_geoan_cb password=stars active_schema=geodata"""
    
    output_format = "ESRI Shapefile"
    
    # check if destination directory exists
    if not os.path.isdir(destination_dir):
        os.mkdir(destination_dir)
        for table in postgis_tables_list:
            subprocess.call(["ogr2ogr", "-f", output_format, destination_dir,
                             db_connection, table])
            print("running ogr2ogr on table: " + table)
    else:
        print("oh no your destination directory " + destination_dir +
              " already exist please remove it then run again")
    
    # commandline call without using python will look like this
    # ogr2ogr -f "ESRI Shapefile" mydatadump \
    # PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" 
      neighborhood parcels
  2. Now, we'll call our script from the command line as follows:
    $ python ch03-03_batch_postgis2shp.py

How it works...

Beginning with a simple import of our subprocess and os modules, we'll immediately define our destination directory where we want to store the exported Shapefiles. This variable is followed by the list of table names that we want to export. This list can only include files located in the same Postgresql schema. The schema is defined as the active_schema so that ogr2ogr knows where to find the tables to be exported.

Once again, we'll define the output format as ESRI Shapefile. Now, we'll check whether the destination folder exists. If it does, continue and call our loop. Then, loop through the list of tables stored in our postgis_tables_list variable. If the destination folder does not exist, you will see an error printed on the screen.

There's more...

If you are programming an application, then executing the ogr2ogr command from inside your script is definitely quick and easy. On the other hand, for a one-off job, simply executing the command-line tool is what you want when you export your list of Shapefiles. To do this in a one-liner, please take a look at the following information box.

A one line example of calling the ogr2ogr batch of the PostGIS table to Shapefiles is shown here if you simply want to execute this once and not in a scripting environment:

ogr2ogr -f "ESRI Shapefile" /home/ch03/geodata/temp
PG:"host=localhost user=pluto dbname=py_geoan_cb password=stars"
bikeways highest_mountains

The list of tables you want to export is located as a list separated by spaces. The destination location of the exported Shapefiles is ../geodata/temp. Note this this /temp directory must exist.

Converting an OpenStreetMap (OSM) XML to a Shapefile

OpenStreetMap (OSM) has a wealth of free data, but to use it with most other applications, we need to convert it to another format, such as a Shapefile or Postgresql PostGIS database. This recipe will use the ogr2ogr tool to do the conversion for us within a python script. The benefit derived here is simplicity.

Getting ready

To get started, you will need to download the OSM data at http://www.openstreetmap.org/export#map=17/37.80721/-122.47305 and saving the file (.osm) to your /ch03/geodata directory. The download button is located on the bar on the left-hand side, and when pressed, it should immediately start the download (refer to the following image). The area we are testing is from San Francisco, just before the golden gate bridge.

If you choose to download another area from OSM feel free to, but make sure that you take a small area like preceding example link. If you select a larger area, the OSM web tool will give you a warning and disable the download button. The reason is simple: if the dataset is very large, it will most likely be better suited for another tool, such as osm2pgsql (http://wiki.openstreetmap.org/wiki/Osm2pgsql), for you conversion. If you need to get OSM data for a large area and want to export to Shapefile, it would be advisable to use another tool, such as osm2pgsql, which will first import your data to a Postgresql database. Then, export from the PostGIS database to Shapefile using the pgsql2shp tool.

 A python tool to import OSM data into a PostGIS database is also available and is called imposm (located here at http://imposm.org/). Version 2 of it is written in python and version 3 is written in the "go" programming language if you want to give it a try.

How to do it...

  1. Using the subprocess module, we will execute ogr2ogr to convert the OSM data that we downloaded into a new Shapefile:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    # convert / import osm xml .osm file into a Shapefile
    import subprocess
    import os
    import shutil
    
    # specify output format
    output_format = "ESRI Shapefile"
    
    # complete path to input OSM xml file .osm
    input_osm = '../geodata/OSM_san_francisco_westbluff.osm'
    
    # Windows users can uncomment these two lines if needed
    # ogr2ogr = r"c:/OSGeo4W/bin/ogr2ogr.exe"
    # ogr_info = r"c:/OSGeo4W/bin/ogrinfo.exe"
    
    # view what geometry types are available in our OSM file
    subprocess.call([ogr_info, input_osm])
    
    destination_dir = os.path.realpath('../geodata/temp')
    
    if os.path.isdir(destination_dir):
        # remove output folder if it exists
        shutil.rmtree(destination_dir)
        print("removing existing directory : " + destination_dir)
        # create new output folder
        os.mkdir(destination_dir)
        print("creating new directory : " + destination_dir)
    
        # list of geometry types to convert to Shapefile
        geom_types = ["lines", "points", "multilinestrings", "multipolygons"]
    
        # create a new Shapefile for each geometry type
        for g_type in geom_types:
    
            subprocess.call([ogr2ogr,
                   "-skipfailures", "-f", output_format,
                     destination_dir, input_osm,
                     "layer", g_type,
                     "--config","OSM_USE_CUSTOM_INDEXING", "NO"])
            print("done creating " + g_type)
    
    # if you like to export to SPATIALITE from .osm
    # subprocess.call([ogr2ogr, "-skipfailures", "-f",
    #         "SQLITE", "-dsco", "SPATIALITE=YES",
    #         "my2.sqlite", input_osm])
  2. Now we can call our script from the command line:
    $ python ch03-04_osm2shp.py
    Go and have a look at your ../geodata folder to see the newly created Shapefiles, and try to open them up in Quantum GIS, which is a free GIS software (www.qgis.org)

How it works...

This script should be clear as we are using the subprocess module call to fire our ogr2ogr command-line tool. Specify our OSM dataset as an input file, including the full path to the file. The Shapefile name is not supplied as ogr2ogr and will output a set of Shapefiles, one for each geometry shape according to the geometry type it finds inside the OSM file. We only need to specify the name of the folder where we want ogr2ogr to export the Shapefiles to, automatically creating the folder if it does not exist.

Windows users: if you do not have your ogr2ogr tool mapped to your environment variables, you can simply uncomment at lines 16 and 17 in the preceding code and replace the path shown with the path on your machine to the Windows executables.

The first subprocess call prints out the screen that the geometry types have found inside the OSM file. This is helpful in most cases to help you identify what is available. Shapefiles can only support one geometry type per file, and this is why ogr2ogr outputs a folder full of Shapefiles, each one representing a separate geometry type.

Lastly, we'll call subprocess to execute ogr2ogr, passing in the output "ESRI Shapefile" file type, output folder, and the name of the OSM dataset.

Converting a Shapefile (vector) to a GeoTiff (raster)

Moving data from format to format also includes moving from vector to raster or the other way around. In this recipe, we move from a vector (Shapefile) to a raster (GeoTiff) with the python gdal and ogr modules.

Getting ready

We need to be inside our virtual environment again, so fire it up so that we can access our gdal and ogr python modules.

As usual, enter your python virtual environment with the workon pygeoan_cb command:

$ source venvs/pygeoan_cb/bin/activate

How to do it...

Let's dive in and convert our golf course polygon Shapefile into a GeoTif; here is the code to do this:

  1. Import the ogr and gdal libraries, and then define the output pixel size along with the value that will be assigned to null:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    from osgeo import ogr
    from osgeo import gdal
    
    # set pixel size
    pixel_size = 1
    no_data_value = -9999
  2. Set up the input Shapefile that we want to convert alongside the new GeoTiff raster that will be created when the script is executed:
    # Shapefile input name
    # input projection must be in cartesian system in meters
    # input wgs 84 or EPSG: 4326 will NOT work!!!
    input_shp = r'../geodata/ply_golfcourse-strasslach3857.shp'
    
    # TIF Raster file to be created
    output_raster = r'../geodata/ply_golfcourse-strasslach.tif'
  3. Now we need to create the input Shapefile object, so get the layer information and finally set the extent values:
    # Open the data source get the layer object
    # assign extent coordinates
    open_shp = ogr.Open(input_shp)
    shp_layer = open_shp.GetLayer()
    x_min, x_max, y_min, y_max = shp_layer.GetExtent()
  4. Here, we need to calculate the resolution distance to pixel value:
    # calculate raster resolution
    x_res = int((x_max - x_min) / pixel_size)
    y_res = int((y_max - y_min) / pixel_size)
  5. Our new raster type is a GeoTiff so we must explicitly tell this gdal to get the driver. The driver is then able to create a new GeoTiff by passing in the filename or the new raster we want to create. The x direction resolution is followed by the y direction resolution, and then our number of bands, which is, in this case, 1. Lastly, we'll set the new type of GDT_Byte raster:
    # set the image type for export
    
    image_type = 'GTiff'
    
    driver = gdal.GetDriverByName(image_type)
    
     
    
    new_raster = driver.Create(output_raster, x_res, y_res, 1, gdal.GDT_Byte)
    
    new_raster.SetGeoTransform((x_min, pixel_size, 0, y_max, 0, -
      pixel_size))
  6. Now we can access the new raster band and assign the no data values and the inner data values for the new raster. All the inner values will receive a value of 255 similar to what we set for the burn_values variable:
    # get the raster band we want to export too
    raster_band = new_raster.GetRasterBand(1)
    
    # assign the no data value to empty cells
    raster_band.SetNoDataValue(no_data_value)
    
    # run vector to raster on new raster with input Shapefile
    gdal.RasterizeLayer(new_raster, [1], shp_layer, burn_values=[255])
  7. Here we go! Lets run the script to see what our new raster looks like:
    $ python ch03-05_shp2raster.py
    The resulting raster should look like this if you open it  using QGIS (http://www.qgis.org):

How it works...

There are several steps involved in this code so please follow along as some points could lead to trouble if you are not sure what values to input. We start with the import of the gdal and ogr modules, respectively, since they will do the work for us by inputting a Shapefile (vector) and outputting a GeoTiff (raster).

The pixel_size variable is very important since it will determine the size of the new raster we will create. In this example, we only have two polygons, so we'll set pixel_size = 1 to keep a fine border. If you have many polygons stretching across the globe in one Shapefile, it is wiser to set this value to 25 or more. Otherwise, you could end up with a 10 GB raster and your machine will run all night long! The no_data_value parameter is needed to tell GDAL what values to set in the empty space around our input polygons, and we set these to -9999 in order to be easily identified.

Next, we'll simply set the input Shapefile stored in the EPSG:3857 web mercator and output GeoTiff. Check to make sure that you change the file names accordingly if you want to use some other dataset. We start by working with the OGR module to open the Shapefile and retrieve its layer information and the extent information. The extent is important because it is used to calculate the size of the output raster width and height values, which must be integers that are represented by the x_res and y_res variables.

Note that the projection of your Shapefile must be in meters not degrees. This is very important since this will NOT work in EPSG:4326 or WGS 84, for example. The reason is that the coordinate units are LAT/LON. This means that WGS84 is not a flat plane projection and cannot be drawn as is. Our x_res and y_res values would evaluate to 0 since we cannot get a real ratio using degrees. This is a result of use not being able to simply subtract coordinate x from coordinate y because the units are in degrees and not in a flat plane meter projection.

Now moving on to the raster setup, we'll define the type of raster we want to export as a Gtiff. Then, we can get the correct GDAL driver by the raster type. Once the raster type is set, we can create a new empty raster dataset, passing in the raster file name, width, and height of the raster in pixels, number of raster bands, and finally, the type of raster in GDAL terms, that is, the gdal.GDT_Byte. These five parameters are mandatory to create a new raster.

Next, we'll call SetGeoTransform that handles transforming between pixel/line raster space and projection coordinates space. We want to activate the band 1 as it is the only band we have in our raster. Then, we'll assign the no data value for all our empty space around the polygon.

The final step is to call the gdal.RasterizeLayer() function and pass in our new raster band Shapefile and the value to assign to the inside of our raster. The value of all the pixels inside our polygon will be 255.

See also

If you are interested, you can visit the command-line tool gdal_rasterize at http://www.gdal.org/gdal_rasterize.html. You can run this straight from the command line.

Converting a raster (GeoTiff) to a vector (Shapefile) using GDAL

We have now looked at how we can go from vector to raster, so it is time to go from raster to vector. This method is much more common because most of our vector data is derived from remotely sensed data such as satellite images, orthophotos, or some other remote sensing dataset such as lidar.

Getting ready

As usual, please enter your python virtual environment with the help of the workon pygeoan_cb command:

$ source venvs/pygeoan_cb/bin/activate

How to do it...

Now let's begin:

  1. Import the ogr and gdal modules. Go straight ahead and open the raster that we want to convert by passing it the file name on disk. Then, get the raster band:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    from osgeo import ogr
    from osgeo import gdal
    
    #  get raster datasource
    open_image = gdal.Open( "../geodata/cadaster_borders-2tone-black-
      white.png")
    input_band = open_image.GetRasterBand(3)
  2. Setup the output vector file as a Shapefile with output_shp, and then get the Shapefile driver. Now, we can create the output from our driver and create the layer:
    #  create output datasource
    output_shp = "../geodata/cadaster_raster"
    shp_driver = ogr.GetDriverByName("ESRI Shapefile")
    
    # create output file name
    output_shapefile = shp_driver.CreateDataSource( output_shp + ".shp" )
    new_shapefile = output_shapefile.CreateLayer(output_shp, srs = None )
  3. Our final step is to run the gdal.Polygonize function that does the heavy lifting by converting our raster to vector.
    gdal.Polygonize(input_band, None, new_shapefile, -1, [], callback=None)
    new_shapefile.SyncToDisk()
  4. Execute the new script.
    $ python ch03-06_raster2shp.py

How it works...

Working with ogr and gdal is similar in all our recipes; we must define the inputs and get the appropriate file driver to open the files. The GDAL library is very powerful and in only one line of code, we can convert a raster to a vector with the gdal. Polygonize function. All the preceding code is simply setup code to define which format we want to work with. We can then set up the appropriate driver to input and output our new file.

Summary

In this article we covered converting a Shapefile to a PostGIS table using ogr2ogr, batch importing a folder of Shapefiles into PostGIS using ogr2ogr, batch exporting a list of tables from PostGIS to Shapefiles, converting an OpenStreetMap (OSM) XML to a Shapefile, converting a Shapefile (vector) to a GeoTiff (raster), and converting a GeoTiff (raster) to a Shapefile (vector) using GDAL

Resources for Article:


Further resources on this subject:


You've been reading an excerpt of:

Python Geospatial Analysis Cookbook

Explore Title