Learning Geospatial Analysis with Python — Save 50%
Master GIS and Remote Sensing analysis using Python with these easy to follow tutorials with this book and ebook
This article by Joel Lawhead, author of Learning Geospatial Analysis with Python, describes how to classify images and extract features from images with the help of pure Python.
(For more resources related to this topic, see here.)
Automated Remote Sensing ( ARS ) is rarely ever done in the visible spectrum. The most commonly available wavelengths outside of the visible spectrum are infrared and near-infrared. The following scene is a thermal image (band 10) from a fairly recent Landsat 8 flyover of the US Gulf Coast from New Orleans, Louisiana to Mobile, Alabama. Major natural features in the image are labeled so you can orient yourself:
Because every pixel in that image has a reflectance value, it is information. Python can "see" those values and pick out features the same way we intuitively do by grouping related pixel values. We can colorize pixels based on their relation to each other to simplify the image and view related features. This technique is called classification. Classifying can range from fairly simple groupings based only on some value distribution algorithm derived from the histogram to complex methods involving training data sets and even computer learning and artificial intelligence. The simplest forms are called unsupervised classifications, whereas methods involving some sort of training data to guide the computer are called supervised. It should be noted that classification techniques are used across many fields, from medical doctors trying to spot cancerous cells in a patient's body scan, to casinos using facial-recognition software on security videos to automatically spot known con-artists at blackjack tables.
To introduce remote sensing classification we'll just use the histogram to group pixels with similar colors and intensities and see what we get. First you'll need to download the Landsat 8 scene here:
Instead of our histogram() function from previous examples, we'll use the version included with NumPy that allows you to easily specify a number of bins and returns two arrays with the frequency as well as the ranges of the bin values. We'll use the second array with the ranges as our class definitions for the image. The lut or look-up table is an arbitrary color palette used to assign colors to classes. You can use any colors you want.
import gdalnumeric # Input file name (thermal image) src = "thermal.tif" # Output file name tgt = "classified.jpg" # Load the image into numpy using gdal srcArr = gdalnumeric.LoadFile(src) # Split the histogram into 20 bins as our classes classes = gdalnumeric.numpy.histogram(srcArr, bins=20) # Color look-up table (LUT) - must be len(classes)+1. # Specified as R,G,B tuples lut = [[255,0,0],[191,48,48],[166,0,0],[255,64,64], [255,115,115],[255,116,0],[191,113,48],[255,178,115], [0,153,153],[29,115,115],[0,99,99],[166,75,0], [0,204,0],[51,204,204],[255,150,64],[92,204,204],[38,153,38],
\ [0,133,0],[57,230,57],[103,230,103],[184,138,0]] # Starting value for classification start = 1 # Set up the RGB color JPEG output image rgb = gdalnumeric.numpy.zeros((3, srcArr.shape, srcArr.shape,), gdalnumeric.numpy.float32) # Process all classes and assign colors for i in range(len(classes)): mask = gdalnumeric.numpy.logical_and(start <= \ srcArr, srcArr <= classes[i]) for j in range(len(lut[i])): rgb[j] = gdalnumeric.numpy.choose(mask, (rgb[j], \ lut[i][j])) start = classes[i]+1 # Save the image gdalnumeric.SaveArray(rgb.astype(gdalnumeric.numpy.uint8), \ tgt, format="JPEG")
The following image is our classification output, which we just saved as a JPEG. We didn't specify the prototype argument when saving as an image, so it has no georeferencing information.
This result isn't bad for a very simple unsupervised classification. The islands and coastal flats show up as different shades of green. The clouds were isolated as shades of orange and dark blues. We did have some confusion inland where the land features were colored the same as the Gulf of Mexico. We could further refine this process by defining the class ranges manually instead of just using the histogram.
eBook Price: $29.99
Book Price: $49.99
Extracting features from images
The ability to classify an image leads us to another remote-sensing capability. Vector GIS data such as shapefiles are typically extracted from remotely-sensed images. Extraction normally involves an analyst clicking around each object in an image and drawing the feature to save it as data. But it is also possible with good remotely-sensed data and proper pre-processing to automatically extract features from an image.
For this example we'll take a subset of our Landsat 8 thermal image to isolate a group of barrier islands as seen in the following screenshot:
You can download this image here:
Our goal with this example is to automatically extract the three islands in the image as a shapefile. But before we can do that, we need to mask out any data we aren't interested in. For example, the water has a wide range of pixel values, as do the islands themselves. If we just want to extract the islands themselves, we need to push all pixel values into just two bins to make the image black and white. This technique is called thresholding . The islands in the image have enough contrast with the water in the background such that thresholding should isolate them nicely.
In the following script we will read the image into an array and then histogram the image using only two bins. We will then use the colors black and white to color the two bins. This script is simply a modified version of our classification script with a very limited output:
import gdalnumeric # Input file name (thermal image) src = "islands.tif" # Output file name tgt = "islands_classified.tiff" # Load the image into numpy using gdal srcArr = gdalnumeric.LoadFile(src) # Split the histogram into 20 bins as our classes classes = gdalnumeric.numpy.histogram(srcArr, bins=2) lut = [[255,0,0],[0,0,0],[255,255,255]] # Starting value for classification start = 1 # Set up the output image rgb = gdalnumeric.numpy.zeros((3, srcArr.shape, srcArr.shape,),
gdalnumeric.numpy.float32) # Process all classes and assign colors for i in range(len(classes)): mask = gdalnumeric.numpy.logical_and(start <= srcArr, srcArr <= classes[i]) for j in range(len(lut[i])): rgb[j] = gdalnumeric.numpy.choose(mask, (rgb[j], lut[i][j])) start = classes[i]+1 # Save the image gdalnumeric.SaveArray(rgb.astype(gdalnumeric.numpy.uint8),
tgt, format="GTIFF", prototype=src)
The output looks great as seen in the following screenshot:
The islands are clearly isolated so our extraction script will be able to identify them as polygons and save them to a shapefile. The GDAL library has a method called Polygonize() that does exactly that. It groups all sets of isolated pixels in an image and saves them out as a feature data set. One interesting technique we will use in this script is to use our input image as a mask. The Polygonize() method allows you to specify a mask that will use the color black as a filter that will prevent the water from being extracted as a polygon, so we'll end up with just the islands. Another area to note in the script is that we copy the georeferencing information from our source image to our shapefile to geolocate it properly:
import gdal import ogr, osr # Thresholded input raster name src = "islands_classified.tiff" # Output shapefile name tgt = "extract.shp" # OGR layer name tgtLayer = "extract" # Open the input raster srcDS = gdal.Open(src) # Grab the first band band = srcDS.GetRasterBand(1) # Force gdal to use the band as a mask mask = band # Set up the output shapefile driver = ogr.GetDriverByName("ESRI Shapefile") shp = driver.CreateDataSource(tgt) # Copy the spatial reference srs = osr.SpatialReference() srs.ImportFromWkt(srcDS.GetProjectionRef()) layer = shp.CreateLayer(tgtLayer, srs=srs) # Set up the dbf file fd = ogr.FieldDefn( "DN", ogr.OFTInteger ) layer.CreateField(fd) dst_field = 0 # Automatically extract features from an image! extract = gdal.Polygonize(band, mask, layer, dst_field, , None)
The output shapefile is simply called extract.shp. We'll bring that script back here to look at our shapefile, but we'll add something to it. The largest island has a small lagoon which shows up as a hole in the polygon. In order to properly render it, we have to deal with parts in a shapefile record. The previous example using that script did not do that, so we'll add that piece as we loop through the shapefile features. The code comments in the following code outline the technique:
import shapefile import pngcanvas # Open the extracted islands r = shapefile.Reader("extract.shp") # Setup the world to pixels conversion xdist = r.bbox - r.bbox ydist = r.bbox - r.bbox iwidth = 800 iheight = 600 xratio = iwidth/xdist yratio = iheight/ydist polygons =  # Loop through all shapes for shape in r.shapes(): # Loop through all parts to catch # polygon holes! for i in range(len(shape.parts)): pixels= pt = None if i<len(shape.parts)-1: pt = shape.points[shape.parts[i]:shape.parts[i+1]] else: pt = shape.points[shape.parts[i]:] for x,y in pt: px = int(iwidth - ((r.bbox - x) * xratio)) py = int((r.bbox - y) * yratio) pixels.append([px,py]) polygons.append(pixels) # Set up the output canvas c = pngcanvas.PNGCanvas(iwidth,iheight) # Loop through the polygons and draw them for p in polygons: c.polyline(p) # Save the image f = file("extract.png", "wb") f.write(c.dump()) f.close()
The following screenshot shows our automatically extracted island features! Commercial packages that do this kind of work can easily cost tens of thousands of dollars. While these packages are very robust, it is still fun to see how far you can get with simple Python scripts and a few open source packages. In many cases you can do everything you need to do.
The western-most island contains the polygon hole as shown in the following screenshot, which is zoomed to that area:
The lagoon is not easy to see, but you will find it is if you use the other script.
Automated feature extraction is a holy grail within geospatial analysis because of the cost and tedious effort required to manually extract features. The key to feature extraction is proper image classification. Automated feature extraction works well with water bodies (and islands), roads, farm fields, buildings, and other features that tend to have high-contrast pixel values with their background.
In this article we learned image classification and feature extraction with the help of pure Python.
Resources for Article:
- Python Libraries for Geospatial Development [Article]
- Working with Geo-Spatial Data in Python [Article]
- Web Frameworks for Python Geo-Spatial Development [Article]
eBook Price: $29.99
Book Price: $49.99
About the Author :
Joel Lawhead is a PMI-certified Project Management Professional (PMP) and the Chief Information Officer (CIO) for NVisionSolutions.com, an award-winning firm specializing in geospatial technology integration and sensor engineering.
He began using Python in 1997 and began combining it with geospatial software development in 2000. He has been published in two editions of the Python Cookbook by O'Reilly. He is also the developer of the widely used open source Python Shapefile Library (PyShp) and maintains the geospatial technical blog GeospatialPython.com and Twitter feed @SpatialPython discussing the use of the Python programming language within the geospatial industry.
In 2011, he reverse engineered and published the undocumented shapefile spatial indexing format and assisted fellow geospatial Python developer, Marc Pfister, in reversing the algorithm used, allowing developers around the world to create better-integrated and more robust geospatial applications involving shapefiles.
He has served as the lead architect, project manager, and co-developer for geospatial applications used by US government agencies including NASA, FEMA, NOAA, the US Navy, as well as many commercial and non-profit organizations. In 2002, he received the international "Esri Special Achievment in GIS" award for work on the Real-time Emergency Action Coordination Tool (REACT) for emergency management using geospatial analysis.