# Working with Geo-Spatial Data in Python

December 2010

## Python Geospatial Development

 Build a complete and sophisticated mapping application from scratch using Python tools for GIS development Build applications for GIS development using Python Analyze and visualize Geo-Spatial data Comprehensive coverage of key GIS concepts Recommended best practices for storing spatial data in a database Draw maps, place data points onto a map, and interact with maps A practical tutorial with plenty of step-by-step instructions to help you develop a mapping application from scratch

(For more resources on Python, see here.)

# Prerequisites

If you want to follow through the examples in this article, make sure you have the following Python libraries installed on your computer:

# Reading and writing geo-spatial data

In this section, we will look at some examples of tasks you might want to perform that involve reading and writing geo-spatial data in both vector and raster format.

## Task: Calculate the bounding box for each country in the world

In this slightly contrived example, we will make use of a Shapefile to calculate the minimum and maximum latitude/longitude values for each country in the world. This "bounding box" can be used, among other things, to generate a map of a particular country. For example, the bounding box for Turkey would look like this:

Decompress the .zip archive and place the various files that make up the Shapefile (the .dbf, .prj, .shp, and .shx files) together in a suitable directory.

We next need to create a Python program that can read the borders of each country. Fortunately, using OGR to read through the contents of a Shapefile is trivial:

`import osgeo.ogrshapefile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp")layer = shapefile.GetLayer(0)for i in range(layer.GetFeatureCount()):   feature = layer.GetFeature(i)`

The feature consists of a geometry and a set of fields. For this data, the geometry is a polygon that defines the outline of the country, while the fields contain various pieces of information about the country. According to the Readme.txt file, the fields in this Shapefile include the ISO-3166 three-letter code for the country (in a field named ISO3) as well as the name for the country (in a field named NAME). This allows us to obtain the country code and name like this:

`countryCode = feature.GetField("ISO3")countryName = feature.GetField("NAME")`

We can also obtain the country's border polygon using:

`geometry = feature.GetGeometryRef()`

There are all sorts of things we can do with this geometry, but in this case we want to obtain the bounding box or envelope for the polygon:

`minLong,maxLong,minLat,maxLat = geometry.GetEnvelope()`

Let's put all this together into a complete working program:

`# calcBoundingBoxes.pyimport osgeo.ogrshapefile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp")layer = shapefile.GetLayer(0)countries = [] # List of (code,name,minLat,maxLat,          # minLong,maxLong) tuples.for i in range(layer.GetFeatureCount()):  feature = layer.GetFeature(i)  countryCode = feature.GetField("ISO3")   countryName = feature.GetField("NAME")   geometry = feature.GetGeometryRef()   minLong,maxLong,minLat,maxLat = geometry.GetEnvelope()  countries.append((countryName, countryCode,      minLat, maxLat, minLong, maxLong))countries.sort()for name,code,minLat,maxLat,minLong,maxLong in countries:   print "%s (%s) lat=%0.4f..%0.4f, long=%0.4f..%0.4f" \% (name, code,minLat, maxLat,minLong, maxLong)`

Running this program produces the following output:

`% python calcBoundingBoxes.pyAfghanistan (AFG) lat=29.4061..38.4721, long=60.5042..74.9157Albania (ALB) lat=39.6447..42.6619, long=19.2825..21.0542Algeria (DZA) lat=18.9764..37.0914, long=-8.6672..11.9865...`

## Task: Save the country bounding boxes into a Shapefile

While the previous example simply printed out the latitude and longitude values, it might be more useful to draw the bounding boxes onto a map. To do this, we have to convert the bounding boxes into polygons, and save these polygons into a Shapefile.

Creating a Shapefile involves the following steps:

1. Define the spatial reference used by the Shapefile's data. In this case, we'll use the WGS84 datum and unprojected geographic coordinates (that is, latitude and longitude values). This is how you would define this spatial reference using OGR:
`import osgeo.osrspatialReference = osgeo.osr.SpatialReference()spatialReference.SetWellKnownGeogCS('WGS84')`

We can now create the Shapefile itself using this spatial reference:

`import osgeo.ogrdriver = osgeo.ogr.GetDriverByName("ESRI Shapefile")dstFile = driver.CreateDataSource("boundingBoxes.shp"))dstLayer = dstFile.CreateLayer("layer", spatialReference)`
2. After creating the Shapefile, you next define the various fields that will hold the metadata for each feature. In this case, let's add two fields to store the country name and its ISO-3166 code:
`fieldDef = osgeo.ogr.FieldDefn("COUNTRY", osgeo.ogr.OFTString)fieldDef.SetWidth(50)dstLayer.CreateField(fieldDef)fieldDef = osgeo.ogr.FieldDefn("CODE", osgeo.ogr.OFTString)fieldDef.SetWidth(3)dstLayer.CreateField(fieldDef)`
3. We now need to create the geometry for each feature—in this case, a polygon defining the country's bounding box. A polygon consists of one or more linear rings; the first linear ring defines the exterior of the polygon, while additional rings define "holes" inside the polygon. In this case, we want a simple polygon with a square exterior and no holes:
`linearRing = osgeo.ogr.Geometry(osgeo.ogr.wkbLinearRing)linearRing.AddPoint(minLong, minLat)linearRing.AddPoint(maxLong, minLat)linearRing.AddPoint(maxLong, maxLat)linearRing.AddPoint(minLong, maxLat)linearRing.AddPoint(minLong, minLat)polygon = osgeo.ogr.Geometry(osgeo.ogr.wkbPolygon)polygon.AddGeometry(linearRing)`

You may have noticed that the coordinate (minLong, minLat)was added to the linear ring twice. This is because we are defining line segments rather than just points—the first call to AddPoint()defines the starting point, and each subsequent call to AddPoint()adds a new line segment to the linear ring. In this case, we start in the lower-left corner and move counter-clockwise around the bounding box until we reach the lower-left corner again:

Once we have the polygon, we can use it to create a feature:

`feature = osgeo.ogr.Feature(dstLayer.GetLayerDefn())feature.SetGeometry(polygon)feature.SetField("COUNTRY", countryName)feature.SetField("CODE", countryCode)dstLayer.CreateFeature(feature)feature.Destroy()`

Notice how we use the setField() method to store the feature's metadata. We also have to call the Destroy() method to close the feature once we have finished with it; this ensures that the feature is saved into the Shapefile.

4. Finally, we call the Destroy() method to close the output Shapefile:
`dstFile.Destroy()`
5. Putting all this together, and combining it with the code from the previous recipe to calculate the bounding boxes for each country in the World Borders Dataset Shapefile, we end up with the following complete program:
`# boundingBoxesToShapefile.pyimport os, os.path, shutilimport osgeo.ogrimport osgeo.osr# Open the source shapefile.srcFile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp")srcLayer = srcFile.GetLayer(0)# Open the output shapefile.if os.path.exists("bounding-boxes"):  shutil.rmtree("bounding-boxes")os.mkdir("bounding-boxes")spatialReference = osgeo.osr.SpatialReference()spatialReference.SetWellKnownGeogCS('WGS84')driver = osgeo.ogr.GetDriverByName("ESRI Shapefile")dstPath = os.path.join("bounding-boxes", "boundingBoxes.shp")dstFile = driver.CreateDataSource(dstPath)dstLayer = dstFile.CreateLayer("layer", spatialReference)fieldDef = osgeo.ogr.FieldDefn("COUNTRY", osgeo.ogr.OFTString)fieldDef.SetWidth(50)dstLayer.CreateField(fieldDef)fieldDef = osgeo.ogr.FieldDefn("CODE", osgeo.ogr.OFTString)fieldDef.SetWidth(3)dstLayer.CreateField(fieldDef)# Read the country features from the source shapefile.for i in range(srcLayer.GetFeatureCount()):  feature = srcLayer.GetFeature(i)  countryCode = feature.GetField("ISO3")  countryName = feature.GetField("NAME")  geometry = feature.GetGeometryRef()  minLong,maxLong,minLat,maxLat = geometry.GetEnvelope()  # Save the bounding box as a feature in the output  # shapefile.  linearRing = osgeo.ogr.Geometry(osgeo.ogr.wkbLinearRing)  linearRing.AddPoint(minLong, minLat)  linearRing.AddPoint(maxLong, minLat)  linearRing.AddPoint(maxLong, maxLat)  linearRing.AddPoint(minLong, maxLat)  linearRing.AddPoint(minLong, minLat)  polygon = osgeo.ogr.Geometry(osgeo.ogr.wkbPolygon)  polygon.AddGeometry(linearRing)  feature = osgeo.ogr.Feature(dstLayer.GetLayerDefn())  feature.SetGeometry(polygon)  feature.SetField("COUNTRY", countryName)  feature.SetField("CODE", countryCode)  dstLayer.CreateFeature(feature)  feature.Destroy()# All done.srcFile.Destroy()dstFile.Destroy()`

The only unexpected twist in this program is the use of a sub-directory called bounding-boxes to store the output Shapefile. Because a Shapefile is actually made up of multiple files on disk (a .dbf file, a .prj file, a .shp file, and a .shx file), it is easier to place these together in a sub-directory. We use the Python Standard Library module shutil to delete the previous contents of this directory, and then os.mkdir() to create it again.

If you aren't storing the TM_WORLD_BORDERS-0.3.shp Shapefile in the same directory as the script itself, you will need to add the directory where the Shapefile is stored to your osgeo.ogr.Open() call. You can also store the boundingBoxes.shp Shapefile in a different directory if you prefer, by changing the path where this Shapefile is created.

Running this program creates the bounding box Shapefile, which we can then draw onto a map. For example, here is the outline of Thailand along with a bounding box taken from the boundingBoxes.shp Shapefile:

(For more resources on Python, see here.)

## Task: Analyze height data using a digital elevation map

A DEM (Digital Elevation Map) is a type of raster format geo-spatial data where each pixel value represents the height of a point on the Earth's surface. We encountered DEM files in the previous chapter, where we saw two examples of datasources which supply this type of information: the National Elevation Dataset covering the United States, and GLOBE which provides DEM files covering the entire Earth.

Because a DEM file contains height data, it can be interesting to analyze the height values for a given area. For example, we could draw a histogram showing how much of a country's area is at a certain elevation. Let's take some DEM data from the GLOBE dataset, and calculate a height histogram using that data.

To keep things simple, we will choose a small country surrounded by the ocean: New Zealand.

We're using a small country so that we don't have too much data to work with, and we're using a country surrounded by ocean so that we can check all the points within a bounding box rather than having to use a polygon to exclude points outside of the country's boundaries.

To download the DEM data, go to the GLOBE website (http://www.ngdc.noaa.gov/mgg/topo/globe.html) and click on the Get Data Online hyperlink. We're going to use the data already calculated for this area of the world, so click on the Any or all 16 "tiles" hyperlink. New Zealand is in tile L, so click on this tile to download it.

The file you download will be called l10g.gz. If you decompress it, you will end up with a file l10g containing the raw elevation data.

By itself, this file isn't very useful—it needs to be georeferenced onto the Earth's surface so that you can match up a height value with its position on the Earth. To do this, you need to download the associated header file. Unfortunately, the GLOBE website makes this rather difficult; the header files for the premade tiles can be found at:

http://www.ngdc.noaa.gov/mgg/topo/elev/esri/hdr

Download the file named l10g.hdr and place it into the same directory as the l10g file you downloaded earlier. You can then read the DEM file using GDAL:

`import osgeo.gdaldataset = osgeo.gdal.Open("l10g")`

As you no doubt noticed when you downloaded the l10g tile, this covers much more than just New Zealand—all of Australia is included, as well as Malaysia, Papua New Guinea, and several other east-Asian countries. To work with the height data for just New Zealand, we have to be able to identify the relevant portion of the raster DEM— that is, the range of x,y coordinates which cover New Zealand. We start by looking at a map and identifying the minimum and maximum latitude/longitude values which enclose all of New Zealand, but no other country:

Rounded to the nearest whole degree, we get a long/lat bounding box of (165, -48)… (179, -33). This is the area we want to scan to cover all of New Zealand.

There is, however, a problem—the raster data consists of pixels or "cells" identified by (x,y) coordinates, not longitude and latitude values. We have to convert from longitudes and latitudes into x and y coordinates. To do this, we need to make use of the raster DEM's affine transformation.

An affine transformation is a set of six numbers that define how geographic coordinates (latitude and longitude values) are translated into raster (x,y) coordinates. This is done using two formulas:

`longitude = t[0] + x*t[1] + y*t[2]latitude = t[3] + x*t[4] + y*t[5]`

Fortunately, we don't have to deal with these formulas directly as GDAL will do it for us. We start by obtaining our dataset's affine transformation:

`t = dataset.GetGeoTransform()`

Using this transformation, we could convert an x,y coordinate into its associated latitude and longitude value. In this case, however, we want to do the opposite—we want to take a latitude and longitude, and calculate the associated x,y coordinate.

To do this, we have to invert the affine transformation. Once again, GDAL will do this for us:

`success,tInverse = gdal.InvGeoTransform(t)if not success:   print "Failed!"   sys.exit(1)`

There are some cases where an affine transformation can't be inverted. This is why gdal.InvGeoTransform() returns a success flag as well as the inverted transformation. With this DEM data, however, the affine transformation should always be invertible.

Now that we have the inverse affine transformation, it is possible to convert from a latitude and longitude into an x,y coordinate by using:

`x,y = gdal.ApplyGeoTransform(tInverse, longitude, latitude)`

Using this, it's easy to identify the minimum and maximum x,y coordinates that cover the area we are interested in:

`x1,y1 = gdal.ApplyGeoTransform(tInverse, minLong, minLat)x2,y2 = gdal.ApplyGeoTransform(tInverse, maxLong, maxLat)minX = int(min(x1, x2))maxX = int(max(x1, x2))minY = int(min(y1, y2))maxY = int(max(y1, y2))`

Now that we know the x,y coordinates for the portion of the DEM that we're interested in, we can use GDAL to read in the individual height values. We start by obtaining the raster band that contains the DEM data:

`band = dataset.GetRasterBand(1)`

GDAL band numbers start at one. There is only one raster band in the DEM data we're using.

Now that we have the raster band, we can use the band.ReadRaster() method to read the raw DEM data. This is what the ReadRaster() method looks like:

`ReadRaster(x, y, width, height, dWidth, dHeight, pixelType)`

Where:

• x is the number of pixels from the left side of the raster band to the left side of the portion of the band to read from
• y is the number of pixels from the top of the raster band to the top of the portion of the band to read from
• width is the number of pixels across to read
• height is the number of pixels down to read
• dWidth is the width of the resulting data
• dHeight is the height of the resulting data
• pixelType is a constant defining how many bytes of data there are for each pixel value, and how that data is to be interpreted

Normally, you would set dWidth and dHeight to the same value as width and height; if you don't do this, the raster data will be scaled up or down when it is read.

The ReadRaster() method returns a string containing the raster data as a raw sequence of bytes. You can then read the individual values from this string using the struct standard library module:

`values = struct.unpack("<" + ("h" * width), data)`

Putting all this together, we can use GDAL to open the raster datafile and read all the pixel values within the bounding box surrounding New Zealand:

`import sys, structfrom osgeo import gdalfrom osgeo import gdalconstminLat = -48maxLat = -33minLong = 165maxLong = 179dataset = gdal.Open("l10g")band = dataset.GetRasterBand(1)t = dataset.GetGeoTransform()success,tInverse = gdal.InvGeoTransform(t)if not success:  print "Failed!"  sys.exit(1)x1,y1 = gdal.ApplyGeoTransform(tInverse, minLong, minLat)x2,y2 = gdal.ApplyGeoTransform(tInverse, maxLong, maxLat)minX = int(min(x1, x2))maxX = int(max(x1, x2))minY = int(min(y1, y2))maxY = int(max(y1, y2))width = (maxX - minX) + 1fmt = "<" + ("h" * width)for y in range(minY, maxY+1):  scanline = band.ReadRaster(minX, y,width, 1,       width, 1,       gdalconst.GDT_Int16)  values = struct.unpack(fmt, scanline)  for value in values:  ...`

Don't forget to add a directory path to the gdal.Open()statement if you placed the l10g file in a different directory.

Let's replace the ... with some code that does something useful with the pixel values. We will calculate a histogram:

`histogram = {} # Maps height to # pixels with that height....for value in values:  try:    histogram[value] += 1  except KeyError:    histogram[value] = 1for height in sorted(histogram.keys()):   print height,histogram[height]`

If you run this, you will see a list of heights (in meters) and how many pixels there are at that height:

`-500 26075811 66412 9093 1628...3097 13119 23173 1`

This reveals one final problem—there are a large number of pixels with a value of -500. What is going on here? Clearly -500 is not a valid height value. The GLOBE documentation explains:

Every tile contains values of -500 for oceans, with no values between -500 and the minimum value for land noted here.

So, all those points with a value of -500 represents pixels over the ocean. Fortunately, it is easy to exclude these; every raster file includes the concept of a no data value that is used for pixels without valid data. GDAL includes the GetNoDataValue() method that allows us to exclude these pixels:

`for value in values:  if value != band.GetNoDataValue():    try:      histogram[value] += 1    except KeyError:      histogram[value] = 1`

This finally gives us a histogram of the heights across New Zealand. You could create a graph using this data if you wished. For example, the following chart shows the total number of pixels at or below a given height:

# Changing datums and projections

A datum is a mathematical model of the Earth's shape, while a projection is a way of translating points on the Earth's surface into points on a two-dimensional map. There are a large number of available datums and projections—whenever you are working with geo-spatial data, you must know which datum and which projection (if any) your data uses. If you are combining data from multiple sources, you will often have to change your geo-spatial data from one datum to another, or from one projection to another.

## Task: Change projections to combine Shapefiles using geographic and UTM coordinates

Here, we will work with two Shapefiles that have different projections. We haven't yet encountered any geo-spatial data that uses a projection—all the data we've seen so far uses geographic (unprojected) latitude and longitude values. So, let's start by downloading some geo-spatial data in UTM (Universal Transverse Mercator) projection.

The WebGIS website (http://webgis.com) provides Shapefiles describing land-use and land-cover, called LULC datafiles. For this example, we will download a Shapefile for southern Florida (Dade County, to be exact) which uses the Universal Transverse Mercator projection.

http://webgis.com/MAPS/fl/lulcutm/miami.zip

The uncompressed directory contains the Shapefile, called miami.shp, along with a datum_reference.txt file describing the Shapefile's coordinate system. This file tells us the following:

`The LULC shape file was generated from the original USGS GIRAS LULCfile by Lakes Environmental Software.Datum: NAD83Projection: UTMZone: 17Data collection date by U.S.G.S.: 1972Reference:http://edcwww.cr.usgs.gov/products/landcover/lulc.html`

So, this particular Shapefile uses UTM Zone 17 projection, and a datum of NAD83.

Let's take a second Shapefile, this time in geographic coordinates. We'll use the GSHHS shoreline database, which uses the WGS84 datum and geographic (latitude/longitude) coordinates.

You don't need to download the GSHHS database for this example; while we will display a map overlaying the LULC data over the top of the GSHHS data, you only need the LULC Shapefile to complete this recipe.

Combining these two Shapefiles as they are would be impossible—the LULC Shapefile has coordinates measured in UTM (that is, in meters from a given reference line), while the GSHHS Shapefile has coordinates in latitude and longitude values (in decimal degrees):

`LULC:   x=485719.47, y=2783420.62              x=485779.49,y=2783380.63              x=486129.65, y=2783010.66              ...GSHHS: x=180.0000,y=68.9938               x=180.0000,y=65.0338               x=179.9984, y=65.0337               ...`

Before we can combine these two Shapefiles, we first have to convert them to use the same projection. We'll do this by converting the LULC Shapefile from UTM-17 to geographic projection. Doing this requires us to define a coordinate transformation and then apply that transformation to each of the features in the Shapefile.

Here is how you can define a coordinate transformation using OGR:

`from osgeo import osrsrcProjection = osr.SpatialReference()srcProjection.SetUTM(17)dstProjection = osr.SpatialReference()dstProjection.SetWellKnownGeogCS('WGS84') # Lat/long.transform = osr.CoordinateTransformation(srcProjection,                          dstProjection)`

Using this transformation, we can transform each of the features in the Shapefile from UTM projection back into geographic coordinates:

`for i in range(layer.GetFeatureCount()):   feature = layer.GetFeature(i)   geometry = feature.GetGeometryRef()   geometry.Transform(transform)   ...`

Putting all this together with the techniques we explored earlier for copying the features from one Shapefile to another, we end up with the following complete program:

`# changeProjection.pyimport os, os.path, shutilfrom osgeo import ogrfrom osgeo import osrfrom osgeo import gdal# Define the source and destination projections, and a# transformation object to convert from one to the other.srcProjection = osr.SpatialReference()srcProjection.SetUTM(17)dstProjection = osr.SpatialReference()dstProjection.SetWellKnownGeogCS('WGS84') # Lat/long.transform = osr.CoordinateTransformation(srcProjection,dstProjection)# Open the source shapefile.srcFile = ogr.Open("miami/miami.shp")srcLayer = srcFile.GetLayer(0)# Create the dest shapefile, and give it the new projection.if os.path.exists("miami-reprojected"):  shutil.rmtree("miami-reprojected")os.mkdir("miami-reprojected")driver = ogr.GetDriverByName("ESRI Shapefile")dstPath = os.path.join("miami-reprojected", "miami.shp")dstFile = driver.CreateDataSource(dstPath)dstLayer = dstFile.CreateLayer("layer", dstProjection)# Reproject each feature in turn.for i in range(srcLayer.GetFeatureCount()):  feature = srcLayer.GetFeature(i)  geometry = feature.GetGeometryRef()  newGeometry = geometry.Clone()  newGeometry.Transform(transform)  feature = ogr.Feature(dstLayer.GetLayerDefn())  feature.SetGeometry(newGeometry)  dstLayer.CreateFeature(feature)  feature.Destroy()# All done.srcFile.Destroy()dstFile.Destroy()`

Note that this example doesn't copy field values into the new Shapefile; if your Shapefile has metadata, you will want to copy the fields across as you create each new feature. Also, the above code assumes that the miami.shp Shapefile has been placed into a miami sub-directory; you'll need to change the ogr.Open() statement to use the appropriate path name if you've stored this Shapefile in a different place.

After running this program over the miami.shp Shapefile, the coordinates for all the features in the Shapefile will have been converted from UTM-17 into geographic coordinates:

`Before reprojection: x=485719.47, y=2783420.62                               x=485779.49, y=2783380.63                               x=486129.65, y=2783010.66                               ...After reprojection:   x=-81.1417, y=25.1668                               x=-81.1411, y=25.1664                               x=-81.1376, y=25.1631                               ...`

To see that this worked, let's draw a map showing the reprojected LULC data on top of the GSHHS shoreline database:

Both Shapefiles now use geographic coordinates, and as you can see the coastlines match exactly.

If you have been watching closely, you may have noticed that the LULC data is using the NAD83 datum, while the GSHHS data and our reprojected version of the LULC data both use the WGS84 datum. We can do this without error because the two datums are identical for points within North America.

(For more resources on Python, see here.)

## Task: Change datums to allow older and newer TIGER data to be combined

For this example, we will need to obtain some geo-spatial data that uses the NAD27 datum. This datum dates back to 1927, and was commonly used for North American geo-spatial analysis up until the 1980s when it was replaced by NAD83.

ESRI makes available a set of TIGER/Line files from the 2000 US census, converted into Shapefile format. These files can be downloaded from:

This data is divided up into individual counties. Click on the checkbox beside Anchorage, then click on the Proceed to Download button to download the Shapefile containing road details in Anchorage. The resulting Shapefile will be named tgr02020lkA.shp, and will be in a directory called lkA02020.

As described on the website, this data uses the NAD27 datum. If we were to assume this Shapefile used the WSG83 datum, all the features would be in the wrong place:

The heavy lines indicate where the features would appear if they were plotted using the incorrect WGS84 datum, while the thin dashed lines show where the features should really appear.

To make the features appear in the correct place, and to be able to combine these features with other features that use the WGS84 datum, we need to convert the Shapefile to use WGS84. Changing a Shapefile from one datum to another requires the same basic process we used earlier to change a Shapefile from one projection to another: first, you choose the source and destination datums, and define a coordinate transformation to convert from one to the other:

`srcDatum = osr.SpatialReference()srcDatum.SetWellKnownGeogCS('NAD27')dstDatum = osr.SpatialReference()dstDatum.SetWellKnownGeogCS('WGS84')transform = osr.CoordinateTransformation(srcDatum, dstDatum)`

You then process each feature in the Shapefile, transforming the feature's geometry using the coordinate transformation:

`for i in range(srcLayer.GetFeatureCount()):   feature = srcLayer.GetFeature(i)   geometry = feature.GetGeometryRef()   geometry.Transform(transform)   ...`

Here is the complete Python program to convert the lkA02020 Shapefile from the NAD27 datum to WGS84:

`# changeDatum.pyimport os, os.path, shutilfrom osgeo import ogrfrom osgeo import osrfrom osgeo import gdal# Define the source and destination datums, and a# transformation object to convert from one to the other.srcDatum = osr.SpatialReference()srcDatum.SetWellKnownGeogCS('NAD27')dstDatum = osr.SpatialReference()dstDatum.SetWellKnownGeogCS('WGS84')transform = osr.CoordinateTransformation(srcDatum, dstDatum)# Open the source shapefile.srcFile = ogr.Open("lkA02020/tgr02020lkA.shp")srcLayer = srcFile.GetLayer(0)# Create the dest shapefile, and give it the new projection.if os.path.exists("lkA-reprojected"):  shutil.rmtree("lkA-reprojected")os.mkdir("lkA-reprojected")driver = ogr.GetDriverByName("ESRI Shapefile")dstPath = os.path.join("lkA-reprojected", "lkA02020.shp")dstFile = driver.CreateDataSource(dstPath)dstLayer = dstFile.CreateLayer("layer", dstDatum)# Reproject each feature in turn.for i in range(srcLayer.GetFeatureCount()):  feature = srcLayer.GetFeature(i)  geometry = feature.GetGeometryRef()  newGeometry = geometry.Clone()  newGeometry.Transform(transform)  feature = ogr.Feature(dstLayer.GetLayerDefn())  feature.SetGeometry(newGeometry)  dstLayer.CreateFeature(feature)  feature.Destroy()# All done.srcFile.Destroy()dstFile.Destroy()`

The above code assumes that the lkA02020 folder is in the same directory as the Python script itself. If you've placed this folder somewhere else, you'll need to change the ogr.Open() statement to use the appropriate directory path.

If we now plot the reprojected features using the WGS84 datum, the features will appear in the correct place:

The thin dashed lines indicate where the original projection would have placed the features, while the heavy lines show the correct positions using the reprojected data.

# Representing and storing geo-spatial data

While geo-spatial data is often supplied in the form of vector-format files such as Shapefiles, there are situations where Shapefiles are unsuitable or inefficient. One such situation is where you need to take geo-spatial data from one library and use it in a different library. For example, imagine that you have read a set of geometries out of a Shapefile and want to store them in a database, or work with them using the Shapely library. Because the different Python libraries all use their own private classes to represent geo-spatial data, you can't just take an OGR Geometry object and pass it to Shapely, or use a GDAL SpatialReference object to define the datum and projection to use for data stored in a database.

In these situations, you need to have an independent format for representing and storing geo-spatial data that isn't limited to just one particular Python library. This format, the lingua franca for vector-format geo-spatial data, is called Well-Known Text or WKT.

WKT is a compact text-based description of a geo-spatial object such as a point, a line, or a polygon. For example, here is a geometry defining the boundary of the Vatican City in the World Borders Dataset, converted into a WKT string:

`POLYGON ((12.445090330888604 41.90311752178485,12.451653339580503 41.907989033391232,12.456660170953796 41.901426024699163,12.445090330888604 41.90311752178485))`

As you can see, the WKT string contains a straightforward text description of a geometry—in this case, a polygon consisting of four x,y coordinates. Obviously, WKT text strings can be far more complex than this, containing many thousands of points and storing multipolygons and collections of different geometries. No matter how complex the geometry is, it can still be represented as a simple text string.

There is an equivalent binary format called Well-Known Binary (WKB) that stores the same information as binary data. WKB is often used to store geo-spatial data into a database.

WKT strings can also be used to represent a spatial reference encompassing a projection, a datum, and/or a coordinate system. For example, here is an osgeo.osr.SpatialReference object representing a geographic coordinate system using the WGS84 datum, converted into a WKT string:

`GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9108"]],AUTHORITY["EPSG","4326"]]`

As with geometry representations, spatial references in WKT format can be used to pass a spatial reference from one Python library to another.

## Task: Calculate the border between Thailand and Myanmar

In this recipe, we will make use of the World Borders Dataset to obtain polygons defining the borders of Thailand and Myanmar. We will then transfer these polygons into Shapely, and use Shapely's capabilities to calculate the common border between these two countries.

The World Borders Dataset conveniently includes ISO 3166 two-character country codes for each feature, so we can identify the features corresponding to Thailand and Myanmar as we read through the Shapefile:

`import osgeo.ogrshapefile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp")layer = shapefile.GetLayer(0)for i in range(layer.GetFeatureCount()):  feature = layer.GetFeature(i)  if feature.GetField("ISO2") == "TH":     ...  elif feature.GetField("ISO2") == "MM":     ...`

This code assumes that you have placed the TM_WORLD_BORDERS-0.3.shp Shapefile in the same directory as the Python script. If you've placed it into a different directory, you'll need to adjust the osgeo.ogr.Open() statement to match.

Once we have identified the features we want, it is easy to extract the features' geometries as WKT strings:

`geometry = feature.GetGeometryRef()wkt = geometry.ExportToWkt()`

We can then convert these to Shapely geometry objects using the shapely.wkt module:

`import shapely.wkt...border = shapely.wkt.loads(wkt)`

Now that we have the objects in Shapely, we can use Shapely's computational geometry capabilities to calculate the common border between these two countries:

`commonBorder = thailandBorder.intersection(myanmarBorder)`

The result will be a LineString (or a MultiLineString if the border is broken up into more than one part). If we wanted to, we could then convert this Shapely object back into an OGR geometry, and save it into a Shapefile again:

`wkt = shapely.wkt.dumps(commonBorder)feature = osgeo.ogr.Feature(dstLayer.GetLayerDefn())feature.SetGeometry(osgeo.ogr.CreateGeometryFromWkt(wkt))dstLayer.CreateFeature(feature)feature.Destroy()`

With the common border saved into a Shapefile, we can display the results as a map:

The contents of the common-border/border.shp Shapefile is represented by the heavy line along the countries' common borders.

Here is the entire program used to calculate this common border:

`# calcCommonBorders.pyimport os,os.path,shutilimport osgeo.ogrimport shapely.wkt# Load the thai and myanmar polygons from the world borders# dataset.shapefile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp")layer = shapefile.GetLayer(0)thailand = Nonemyanmar = Nonefor i in range(layer.GetFeatureCount()):  feature = layer.GetFeature(i)  if feature.GetField("ISO2") == "TH":    geometry = feature.GetGeometryRef()    thailand = shapely.wkt.loads(geometry.ExportToWkt()) elif feature.GetField("ISO2") == "MM":    geometry = feature.GetGeometryRef()    myanmar = shapely.wkt.loads(geometry.ExportToWkt())# Calculate the common border.commonBorder = thailand.intersection(myanmar)# Save the common border into a new shapefile.if os.path.exists("common-border"):  shutil.rmtree("common-border")os.mkdir("common-border")spatialReference = osgeo.osr.SpatialReference()spatialReference.SetWellKnownGeogCS('WGS84')driver = osgeo.ogr.GetDriverByName("ESRI Shapefile")dstPath = os.path.join("common-border", "border.shp")dstFile = driver.CreateDataSource(dstPath)dstLayer = dstFile.CreateLayer("layer", spatialReference)wkt = shapely.wkt.dumps(commonBorder)feature = osgeo.ogr.Feature(dstLayer.GetLayerDefn())feature.SetGeometry(osgeo.ogr.CreateGeometryFromWkt(wkt))dstLayer.CreateFeature(feature)feature.Destroy()dstFile.Destroy()`

If you've placed your TM_WORLD_BORDERS-0.3.shp Shapefile into a different directory, change the osgeo.ogr.Open() statement to include a suitable directory path.

We will use this Shapefile later in this article to calculate the length of the Thai-Myanmar border, so make sure you generate and keep a copy of the commonborders/border.shp Shapefile.

## Task: Save geometries into a text file

WKT is not only useful for transferring geometries from one Python library to another. It can also be a useful way of storing geo-spatial data without having to deal with the complexity and constraints imposed by using Shapefiles.

In this example, we will read a set of polygons from the World Borders Dataset, convert them to WKT format, and save them as text files:

`# saveAsText.pyimport os,os.path,shutilimport osgeo.ogrif os.path.exists("country-wkt-files"):  shutil.rmtree("country-wkt-files")os.mkdir("country-wkt-files")shapefile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp")layer = shapefile.GetLayer(0)for i in range(layer.GetFeatureCount()):  feature = layer.GetFeature(i)  name = feature.GetField("NAME")  geometry = feature.GetGeometryRef()  f = file(os.path.join("country-wkt-files",         name + ".txt"), "w")  f.write(geometry.ExportToWkt())  f.close()`

As usual, you'll need to change the osgeo.ogr.Open() statement to include a directory path if you've stored the Shapefile in a different directory.

You might be wondering why you want to do this, rather than creating a Shapefile to store your geo-spatial data. Well, Shapefiles are limited in that all the features in a single Shapefile must have the same geometry type. Also, the complexity of setting up metadata and saving geometries can be overkill for some applications. Sometimes, dealing with plain text is just easier.

# Summary

In this article, we have looked at various techniques for using OGR and GDAL within Python programs to solve real-world problems. We have learned:

• How to read from and write to vector-format geo-spatial data in Shapefiles.
• How to read and analyze raster-format geo-spatial data.
• How to change the datum and projection used by a Shapefile.
• That the Well-Known Text (WKT) format can be used to represent geo-spatial features and spatial references in plain text.
• That WKT can be used to transfer geo-spatial data from one Python library to another.
• That WKT can be used to store geo-spatial data in plain text format.

In the next article, Geo-Spatial Data in Python: Working with Geometry, we will work with shapely geometries in Python.

Further resources on this subject:

You've been reading an excerpt of: