# Geo-Spatial Data in Python: Working with Geometry

Your details (so we can tell your friend who this is from) *
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
Z
S
8
k
r
A
by Erik Westra | December 2010 | Open Source

In the previous article, Working with Geo-Spatial Data in Python, we took a look at the various techniques for using OGR and GDAL within Python programs to solve real-world problems.

In this article by Erik Westra, author of Python Geospatial Development, we will cover the following:

• Using Shapely to work with points, lines, and polygons
• Converting and standardizing units of geometry and distance

## Python Geospatial Development

 Build a complete and sophisticated mapping application from scratch using Python tools for GIS development Build applications for GIS development using Python Analyze and visualize Geo-Spatial data Comprehensive coverage of key GIS concepts Recommended best practices for storing spatial data in a database Draw maps, place data points onto a map, and interact with maps A practical tutorial with plenty of step-by-step instructions to help you develop a mapping application from scratch

(For more resources on Python, see here.)

# Working with Shapely geometries

Shapely is a very capable library for performing various calculations on geo-spatial data. Let's put it through its paces with a complex, real-world problem.

## Task: Identify parks in or near urban areas

The U.S. Census Bureau makes available a Shapefile containing something called Core Based Statistical Areas (CBSAs), which are polygons defining urban areas with a population of 10,000 or more. At the same time, the GNIS website provides lists of placenames and other details. Using these two datasources, we will identify any parks within or close to an urban area.

Because of the volume of data we are potentially dealing with, we will limit our search to California. Feel free to download the larger data sets if you want, though you will have to optimize the code or your program will take a very long time to check all the CBSA polygon/placename combinations.

http://census.gov/geo/www/tiger
3. Choose California from the pop-up menu on the right, and click on Submit. A list of the California Shapefiles will be displayed; the Shapefile you want is labelled Metropolitan/Micropolitan Statistical Area. Click on this link, and you will download a file named tl_2009_06_cbsa.zip. Once the file has downloaded, uncompress it and place the resulting Shapefile into a convenient location so that you can work with it.
4. You now need to download the GNIS placename data for California. Go to the GNIS website:
http://geonames.usgs.gov/domestic
5. Click on the Download Domestic Names hyperlink, and then choose California from the pop-up menu. You will be prompted to save the CA_Features_XXX.zip file. Do so, then decompress it and place the resulting CA_Features_XXX.txt file into a convenient place.

The XXX in the above file name is a date stamp, and will vary depending on when you download the data. Just remember the name of the file as you'll need to refer to it in your source code.

6. We're now ready to write the code. Let's start by reading through the CBSA urban area Shapefile and extracting the polygons that define the boundary of each urban area:
`shapefile = osgeo.ogr.Open("tl_2009_06_cbsa.shp")layer = shapefile.GetLayer(0)for i in range(layer.GetFeatureCount()):  feature = layer.GetFeature(i)  geometry = feature.GetGeometryRef()  ...`

Make sure you add directory paths to your osgeo.ogr.Open() statement (and to the file() statement below) to match where you've placed these files.

7. Using what we learned in the previous section, we can convert this geometry into a Shapely object so that we can work with it:
`wkt = geometry.ExportToWkt()shape = shapely.wkt.loads(wkt)`
8. Next, we need to scan through the CA_Features_XXX.txt file to identify the features marked as a park. For each of these features, we want to extract the name of the feature and its associated latitude and longitude. Here's how we might do this:
`f = file("CA_Features_XXX.txt", "r")for line in f.readlines():  chunks = line.rstrip().split("|")  if chunks[2] == "Park":    name = chunks[1]    latitude = float(chunks[9])    longitude = float(chunks[10])    ...`

Remember that the GNIS placename database is a pipedelimited text file. That's why we have to split the line up using line.rstrip().split("|").

9. Now comes the fun part—we need to figure out which parks are within or close to each urban area. There are two ways we could do this, either of which will work:
• We could use the shape.distance() method to calculate the distance between the shape and a Point object representing the park's location:

• We could dilate the polygon using the shape.buffer() method, and then see if the resulting polygon contained the desired point:

The second option is faster when dealing with a large number of points as we can pre-calculate the dilated polygons and then use them to compare against each point in turn. Let's take this option:

`# findNearbyParks.pyimport osgeo.ogrimport shapely.geometryimport shapely.wktMAX_DISTANCE = 0.1 # Angular distance; approx 10 km.print "Loading urban areas..."urbanAreas = {} # Maps area name to Shapely polygon.shapefile = osgeo.ogr.Open("tl_2009_06_cbsa.shp")layer = shapefile.GetLayer(0)for i in range(layer.GetFeatureCount()):  feature = layer.GetFeature(i)  name = feature.GetField("NAME")  geometry = feature.GetGeometryRef()  shape = shapely.wkt.loads(geometry.ExportToWkt())  dilatedShape = shape.buffer(MAX_DISTANCE)  urbanAreas[name] = dilatedShapeprint "Checking parks..."f = file("CA_Features_XXX.txt", "r")for line in f.readlines():  chunks = line.rstrip().split("|")  if chunks[2] == "Park":   parkName = chunks[1]   latitude = float(chunks[9])   longitude = float(chunks[10])   pt = shapely.geometry.Point(longitude, latitude)   for urbanName,urbanArea in urbanAreas.items():     if urbanArea.contains(pt):     print parkName + " is in or near " + urbanNamef.close()`

Don't forget to change the name of the CA_Features_XXX.txt file to match the actual name of the file you downloaded. You may also need to change the path names to the tl_2009_06_CBSA.shp file and the CA_Features file if you placed them in a different directory.

If you run this program, you will get a master list of all the parks that are in or close to an urban area:

`% python findNearbyParks.pyLoading urban areas...Checking parks...Imperial National Wildlife Refuge is in or near El Centro, CATwinLakesStateBeach is in or near Santa Cruz-Watsonville, CAAdmiralWilliamStandleyState Recreation Area is in or near Ukiah, CAAgate Beach County Park is in or near San Francisco-Oakland-Fremont, CA...`

Note that our program uses angular distances to decide if a park is in or near a given urban area. An angular distance is the angle (in decimal degrees) between two rays going out from the center of the Earth to the Earth's surface. Because a degree of angular measurement (at least for the latitudes we are dealing with here) roughly equals 100 km on the Earth's surface, an angular measurement of 0.1 roughly equals a real distance of 10 km.

Using angular measurements makes the distance calculation easy and quick to calculate, though it doesn't give an exact distance on the Earth's surface. If your application requires exact distances, you could start by using an angular distance to filter out the features obviously too far away, and then obtain an exact result for the remaining features by calculating the point on the polygon's boundary that is closest to the desired point, and then calculating the linear distance between the two points. You would then discard the points that exceed your desired exact linear distance.

# Converting and standardizing units of geometry and distance

Imagine that you have two points on the Earth's surface with a straight line drawn between them:

Each point can be described as a coordinate using some arbitrary coordinate system (for example, using latitude and longitude values), while the length of the straight line could be described as the distance between the two points.

Given any two coordinates, it is possible to calculate the distance between them. Conversely, you can start with one coordinate, a desired distance and a direction, and then calculate the coordinates for the other point.

Of course, because the Earth's surface is not flat, we aren't really dealing with straight lines at all. Rather, we are calculating geodetic or Great Circle distances across the surface of the Earth.

The pyproj Python library allows you to perform these types of calculations for any given datum. You can also use pyproj to convert from projected coordinates back to geographic coordinates, and vice versa, allowing you to perform these sorts of calculations for any desired datum, coordinate system, and projection.

Ultimately, a geometry such as a line or a polygon consists of nothing more than a list of connected points. This means that, using the above process, you can calculate the geodetic distance between each of the points in any polygon and total the results to get the actual length for any geometry. Let's use this knowledge to solve a realworld problem.

## Python Geospatial Development

 Build a complete and sophisticated mapping application from scratch using Python tools for GIS development
Published: December 2010
eBook Price: €23.99
Book Price: €38.99
See more

(For more resources on Python, see here.)

## Task: Calculate the length of the Thai-Myanmar border

To solve this problem, we will make use of the common-borders/border.shp Shapefile we created earlier. This Shapefile contains a single feature, which is a LineString defining the border between the two countries. Let's start by taking a look at the individual line segments that make up this feature's geometry:

`import os.pathimport osgeo.ogrdef getLineSegmentsFromGeometry(geometry):  segments = []  if geometry.GetPointCount() > 0:    segment = []    for i in range(geometry.GetPointCount()):     segment.append(geometry.GetPoint_2D(i))    segments.append(segment)  for i in range(geometry.GetGeometryCount()):   subGeometry = geometry.GetGeometryRef(i)   segments.extend(getLineSegmentsFromGeometry(subGeometry))   return segmentsfilename = os.path.join("common-border", "border.shp")shapefile = osgeo.ogr.Open(filename)layer = shapefile.GetLayer(0)feature = layer.GetFeature(0)geometry = feature.GetGeometryRef()segments = getLineSegmentsFromGeometry(geometry)print segments`

Don't forget to change the os.path.join() statement to match the location of your border.shp Shapefile.

Note that we use a recursive function, getLineSegmentsFromGeometry(), to pull the individual coordinates for each line segment out of the geometry. Because geometries are recursive data structures, we have to pull out the individual line segments before we can work with them.

Running this program produces a long list of points that make up the various line segments defining the border between these two countries:

`% python calcBorderLength.py[[(100.08132200000006, 20.348840999999936),(100.08943199999999, 20.347217999999941)],[(100.08943199999999, 20.347217999999941),(100.0913700000001, 20.348606000000075)], ...]`

Each line segment consists of a list of points—in this case, you'll notice that each segment has only two points—and if you look closely you will notice that each segment starts at the same point as the previous segment ended. There are a total of 459 segments defining the border between Thailand and Myanmar—that is, 459 point pairs that we can calculate the geodetic distance for.

A geodetic distance is a distance measured on the surface of the Earth.

Let's see how we can use pyproj to calculate the geodetic distance between any two points. We first create a Geod instance:

`geod = pyproj.Geod(ellps='WGS84')`

Geod is the pyproj class that performs geodetic calculations. Note that we have to provide it with details of the datum used to describe the shape of the Earth. Once our Geod instance has been set up, we can calculate the geodetic distance between any two points by calling geod.inv(), the inverse geodetic transformation method:

`angle1,angle2,distance = geod.inv(long1, lat1, long2, lat2)`

angle1 will be the angle from the first point to the second, measured in decimal degrees; angle2 will be the angle from the second point back to the first (again in degrees); and distance will be the Great Circle distance between the two points, in meters.

Using this, we can iterate over the line segments, calculate the distance from one point to another, and total up all the distances to obtain the total length of the border:

`geod = pyproj.Geod(ellps='WGS84')totLength = 0.0for segment in segments:  for i in range(len(segment)-1):    pt1 = segment[i]    pt2 = segment[i+1]    long1,lat1 = pt1    long2,lat2 = pt2    angle1,angle2,distance = geod.inv(long1, lat1,    long2, lat2)    totLength += distance`

Upon completion, totLength will be the total length of the border, in meters.

Putting all this together, we end up with a complete Python program to read the border.shp Shapefile, and calculate and then display the total length of the common border:

`# calcBorderLength.pyimport os.pathimport osgeo.ogrimport pyprojdef getLineSegmentsFromGeometry(geometry):  segments = []  if geometry.GetPointCount() > 0:   segment = []   for i in range(geometry.GetPointCount()):     segment.append(geometry.GetPoint_2D(i))    segments.append(segment)  for i in range(geometry.GetGeometryCount()):    subGeometry = geometry.GetGeometryRef(i)    segments.extend(      getLineSegmentsFromGeometry(subGeometry))  return segmentsfilename = os.path.join("common-border", "border.shp")shapefile = osgeo.ogr.Open(filename)layer = shapefile.GetLayer(0)feature = layer.GetFeature(0)geometry = feature.GetGeometryRef()segments = getLineSegmentsFromGeometry(geometry)geod = pyproj.Geod(ellps='WGS84')totLength = 0.0for segment in segments:  for i in range(len(segment)-1):    pt1 = segment[i]    pt2 = segment[i+1]    long1,lat1 = pt1    long2,lat2 = pt2    angle1,angle2,distance = geod.inv(long1, lat1,                   long2, lat2)    totLength += distanceprint "Total border length = %0.2f km" % (totLength/1000)`

Running this tells us the total calculated length of the Thai-Myanmar border:

`% python calcBorderLength.pyTotal border length = 1730.55 km`

In this program, we have assumed that the Shapefile is in geographic coordinates using the WGS84 ellipsoid, and only contains a single feature. Let's extend our program to deal with any supplied projection and datum, and at the same time process all the features in the Shapefile rather than just the first. This will make our program more flexible, and allow it to work with any arbitrary Shapefile rather than just the common-border Shapefile we created earlier.

Let's deal with the projection and datum first. We could change the projection and datum for our Shapefile before we process it, just as we did with the LULC and lkA02020 Shapefiles. That would work, but it would require us to create a temporary Shapefile just to calculate the length, which isn't very efficient. Instead, let's make use of pyproj directly to reproject the Shapefile's contents back into geographic coordinates if necessary. We can do this by querying the Shapefile's spatial reference:

`shapefile = ogr.Open(filename)layer = shapefile.GetLayer(0)spatialRef = layer.GetSpatialRef()if spatialRef == None:  print "Shapefile has no spatial reference, using WGS84."  spatialRef = osr.SpatialReference()  spatialRef.SetWellKnownGeogCS('WGS84')`

Once we have the spatial reference, we can see if the spatial reference is projected, and if so use pyproj to turn the projected coordinates back into lat/long values again, like this:

`if spatialRef.IsProjected():  # Convert projected coordinates back to lat/long values.srcProj = pyproj.Proj(spatialRef.ExportToProj4())dstProj = pyproj.Proj(proj='longlat', ellps='WGS84',  datum='WGS84')...long,lat = pyproj.transform(srcProj, dstProj, x, y)`

Using this, we can rewrite our program to accept data using any projection and datum. At the same time, we'll change it to calculate the overall length of every feature in the file, rather than just the first, and also to accept the name of the Shapefile from the command line. Finally, we'll add some error-checking. Let's call the results calcFeatureLengths.py.

We'll start by copying the getLineSegmentsFromGeometry() function we used earlier:

`import sysfrom osgeo import ogr, osrimport pyprojdef getLineSegmentsFromGeometry(geometry):  segments = []  if geometry.GetPointCount() > 0:    segment = []    for i in range(geometry.GetPointCount()):      segment.append(geometry.GetPoint_2D(i))    segments.append(segment)  for i in range(geometry.GetGeometryCount()):    subGeometry = geometry.GetGeometryRef(i)    segments.extend(      getLineSegmentsFromGeometry(subGeometry))  return segments`

Next, we'll get the name of the Shapefile to open from the command line:

`if len(sys.argv) != 2:  print "Usage: calcFeatureLengths.py <shapefile>"  sys.exit(1)filename = sys.argv[1]`

We'll then open the Shapefile and obtain its spatial reference, using the code we wrote earlier:

`shapefile = ogr.Open(filename)layer = shapefile.GetLayer(0)spatialRef = layer.GetSpatialRef()if spatialRef == None:print "Shapefile lacks a spatial reference, using WGS84."spatialRef = osr.SpatialReference()spatialRef.SetWellKnownGeogCS('WGS84')`

We'll then get the source and destination projections, again using the code we wrote earlier. Note that we only need to do this if we're using projected coordinates:

`if spatialRef.IsProjected():  srcProj = pyproj.Proj(spatialRef.ExportToProj4())  dstProj = pyproj.Proj(proj='longlat', ellps='WGS84',     datum='WGS84')`

We are now ready to start processing the Shapefile's features:

`for i in range(layer.GetFeatureCount()):  feature = layer.GetFeature(i)`

Now that we have the feature, we can borrow the code we used earlier to calculate the total length of that feature's line segments:

`geometry = feature.GetGeometryRef()segments = getLineSegmentsFromGeometry(geometry)geod = pyproj.Geod(ellps='WGS84')totLength = 0.0for segment in segments:  for j in range(len(segment)-1):    pt1 = segment[j]    pt2 = segment[j+1]    long1,lat1 = pt1    long2,lat2 = pt2`

The only difference is that we need to transform the coordinates back to WGS84 if we are using a projected coordinate system:

`if spatialRef.IsProjected():  long1,lat1 = pyproj.transform(srcProj,     dstProj,     long1, lat1)  long2,lat2 = pyproj.transform(srcProj,     dstProj,     long2, lat2)`

We can then use pyproj to calculate the distance between the two points, as we did in our earlier example. This time, though, we'll wrap it in a try...except statement so that any failure to calculate the distance won't crash the program:

`try:  angle1,angle2,distance = geod.inv(long1, lat1,except ValueError:  print "Unable to calculate distance from " \    + "%0.4f,%0.4f to %0.4f,%0.4f" \    % (long1, lat1, long2, lat2)  distance = 0.0totLength += distance`

The geod.inv() call can raise a ValueError if the two coordinates are in a place where an angle can't be calculated—for example, if the two points are at the poles.

And finally, we can print out the feature's total length, in kilometers:

`print "Total length of feature %d is %0.2f km" \   % (i, totLength/1000)`

This program can be run over any Shapefile. For example, you could use it to calculate the border length for every country in the world by running it over the World Borders Dataset:

`% python calcFeatureLengths.py TM_WORLD_BORDERS-0.3.shpTotal length of feature 0 is 127.28 kmTotal length of feature 1 is 7264.69 kmTotal length of feature 2 is 2514.76 kmTotal length of feature 3 is 968.86 kmTotal length of feature 4 is 1158.92 kmTotal length of feature 5 is 6549.53 kmTotal length of feature 6 is 119.27 km...`

This program is an example of converting geometry coordinates into distances. Let's take a look at the inverse calculation: using distances to calculate new geometry coordinates.

## Task: Find a point 132.7 kilometers west of Soshone, California

Using the CA_Features_XXX.txt file, it is possible to find the latitude and longitude of Shoshone, a small town in California east of Las Vegas:

`f = file("CA_Features_20100607.txt", "r")for line in f.readlines():  chunks = line.rstrip().split("|")  if chunks[1] == "Shoshone" and \    chunks[2] == "Populated Place":     latitude = float(chunks[9])     longitude = float(chunks[10])     ...`

Given this coordinate, we can use pyproj to calculate the coordinate of a point a given distance away, at a given angle:

`geod = pyproj.Geod(ellps="WGS84")newLong,newLat,invAngle = geod.fwd(latitude, longitude,      angle, distance)`

For this task, we are given the desired distance and we know that the angle we want is "due west". pyproj uses azimuth angles, which are measured clockwise from North. Thus, due west would correspond to an angle of 270 degrees.

Putting all this together, we can calculate the coordinates of the desired point:

`# findShoshone.pyimport pyprojdistance = 132.7 * 1000angle = 270.0f = file("CA_Features_20100607.txt", "r")for line in f.readlines(): chunks = line.rstrip().split("|") if chunks[1] == "Shoshone" and \  chunks[2] == "Populated Place":    latitude = float(chunks[9])    longitude = float(chunks[10])    geod = pyproj.Geod(ellps='WGS84')    newLong,newLat,invAngle = geod.fwd(longitude,        latitude,angle, distance)    print "Shoshone is at %0.4f,%0.4f" % (latitude,           longitude)    print "The point %0.2f km west of Shoshone " \      % (distance/1000.0) \      + "is at %0.4f, %0.4f" % (newLat, newLong)f.close()`

Running this program gives us the answer we want:

`% python findShoshone.pyShoshone is at 35.9730,-116.2711The point 132.70 km west of Shoshone is at 35.9640,-117.7423`

# Summary

• That you can use the Shapely library to perform various geo-spatial calculations on geometries, including distance calculations, dilation, and intersections.
• That you can use the pyproj.Proj class to convert coordinates from one projection and datum to another.
• That you can use the pyproj.Geod class to convert from geometry coordinates to distances, and vice versa.

Further resources on this subject:

## Python Geospatial Development

 Build a complete and sophisticated mapping application from scratch using Python tools for GIS development
Published: December 2010
eBook Price: €23.99
Book Price: €38.99
See more

## Erik Westra

Erik Westra has been a professional software developer for over 25 years, and has worked almost exclusively in Python for the past decade. Erik's early interest in graphical user-interface design led to the development of one of the most advanced urgent courier dispatch systems used by messenger and courier companies worldwide. In recent years, Erik has been involved in the design and implementation of systems matching seekers and providers of goods and services across a range of geographical areas. This work has included the creation of real-time geocoders and map-based views of constantly changing data. Erik is based in New Zealand, and works for companies worldwide