Image classification and feature extraction from images

Learning Geospatial Analysis with Python

\$29.99

If you know Python and would like to use it for Geospatial Analysis this book is exactly what you’ve been looking for. With an organized, user-friendly approach it covers all the bases to give you the necessary skills and know-how.

(For more resources related to this topic, see here.)

Classifying images

Automated Remote Sensing ( ARS ) is rarely ever done in the visible spectrum. The most commonly available wavelengths outside of the visible spectrum are infrared and near-infrared. The following scene is a thermal image (band 10) from a fairly recent Landsat 8 flyover of the US Gulf Coast from New Orleans, Louisiana to Mobile, Alabama. Major natural features in the image are labeled so you can orient yourself:

Because every pixel in that image has a reflectance value, it is information. Python can "see" those values and pick out features the same way we intuitively do by grouping related pixel values. We can colorize pixels based on their relation to each other to simplify the image and view related features. This technique is called classification. Classifying can range from fairly simple groupings based only on some value distribution algorithm derived from the histogram to complex methods involving training data sets and even computer learning and artificial intelligence. The simplest forms are called unsupervised classifications, whereas methods involving some sort of training data to guide the computer are called supervised. It should be noted that classification techniques are used across many fields, from medical doctors trying to spot cancerous cells in a patient's body scan, to casinos using facial-recognition software on security videos to automatically spot known con-artists at blackjack tables.

To introduce remote sensing classification we'll just use the histogram to group pixels with similar colors and intensities and see what we get. First you'll need to download the Landsat 8 scene here:

Instead of our histogram() function from previous examples, we'll use the version included with NumPy that allows you to easily specify a number of bins and returns two arrays with the frequency as well as the ranges of the bin values. We'll use the second array with the ranges as our class definitions for the image. The lut or look-up table is an arbitrary color palette used to assign colors to classes. You can use any colors you want.

```import gdalnumeric

# Input file name (thermal image)
src = "thermal.tif"

# Output file name
tgt = "classified.jpg"

# Load the image into numpy using gdal

# Split the histogram into 20 bins as our classes
classes = gdalnumeric.numpy.histogram(srcArr, bins=20)[1]

# Color look-up table (LUT) - must be len(classes)+1.
# Specified as R,G,B tuples
lut = [[255,0,0],[191,48,48],[166,0,0],[255,64,64],
[255,115,115],[255,116,0],[191,113,48],[255,178,115],
[0,153,153],[29,115,115],[0,99,99],[166,75,0],
[0,204,0],[51,204,204],[255,150,64],[92,204,204],[38,153,38],\ [0,133,0],[57,230,57],[103,230,103],[184,138,0]]

# Starting value for classification
start = 1

# Set up the RGB color JPEG output image
rgb = gdalnumeric.numpy.zeros((3, srcArr.shape[0],
srcArr.shape[1],), gdalnumeric.numpy.float32)

# Process all classes and assign colors
for i in range(len(classes)):
srcArr, srcArr <= classes[i])
for j in range(len(lut[i])):
rgb[j] = gdalnumeric.numpy.choose(mask, (rgb[j], \ lut[i][j]))
start = classes[i]+1

# Save the image
gdalnumeric.SaveArray(rgb.astype(gdalnumeric.numpy.uint8), \
tgt, format="JPEG")```

The following image is our classification output, which we just saved as a JPEG. We didn't specify the prototype argument when saving as an image, so it has no georeferencing information.

This result isn't bad for a very simple unsupervised classification. The islands and coastal flats show up as different shades of green. The clouds were isolated as shades of orange and dark blues. We did have some confusion inland where the land features were colored the same as the Gulf of Mexico. We could further refine this process by defining the class ranges manually instead of just using the histogram.

Extracting features from images

The ability to classify an image leads us to another remote-sensing capability. Vector GIS data such as shapefiles are typically extracted from remotely-sensed images. Extraction normally involves an analyst clicking around each object in an image and drawing the feature to save it as data. But it is also possible with good remotely-sensed data and proper pre-processing to automatically extract features from an image.

For this example we'll take a subset of our Landsat 8 thermal image to isolate a group of barrier islands as seen in the following screenshot:

Our goal with this example is to automatically extract the three islands in the image as a shapefile. But before we can do that, we need to mask out any data we aren't interested in. For example, the water has a wide range of pixel values, as do the islands themselves. If we just want to extract the islands themselves, we need to push all pixel values into just two bins to make the image black and white. This technique is called thresholding . The islands in the image have enough contrast with the water in the background such that thresholding should isolate them nicely.

In the following script we will read the image into an array and then histogram the image using only two bins. We will then use the colors black and white to color the two bins. This script is simply a modified version of our classification script with a very limited output:

```import gdalnumeric

# Input file name (thermal image)
src = "islands.tif"

# Output file name
tgt = "islands_classified.tiff"

# Load the image into numpy using gdal

# Split the histogram into 20 bins as our classes
classes = gdalnumeric.numpy.histogram(srcArr, bins=2)[1]

lut = [[255,0,0],[0,0,0],[255,255,255]]

# Starting value for classification
start = 1

# Set up the output image
rgb = gdalnumeric.numpy.zeros((3, srcArr.shape[0], srcArr.shape[1],), gdalnumeric.numpy.float32)

# Process all classes and assign colors
for i in range(len(classes)):
mask = gdalnumeric.numpy.logical_and(start <= srcArr, srcArr <= classes[i])
for j in range(len(lut[i])):
start = classes[i]+1

# Save the image
gdalnumeric.SaveArray(rgb.astype(gdalnumeric.numpy.uint8), tgt, format="GTIFF", prototype=src)```

The output looks great as seen in the following screenshot:

The islands are clearly isolated so our extraction script will be able to identify them as polygons and save them to a shapefile. The GDAL library has a method called Polygonize() that does exactly that. It groups all sets of isolated pixels in an image and saves them out as a feature data set. One interesting technique we will use in this script is to use our input image as a mask. The Polygonize() method allows you to specify a mask that will use the color black as a filter that will prevent the water from being extracted as a polygon, so we'll end up with just the islands. Another area to note in the script is that we copy the georeferencing information from our source image to our shapefile to geolocate it properly:

```import gdal
import ogr, osr

# Thresholded input raster name
src = "islands_classified.tiff"
# Output shapefile name
tgt = "extract.shp"
# OGR layer name
tgtLayer = "extract"
# Open the input raster
srcDS = gdal.Open(src)
# Grab the first band
band = srcDS.GetRasterBand(1)
# Force gdal to use the band as a mask
# Set up the output shapefile
driver = ogr.GetDriverByName("ESRI Shapefile")
shp = driver.CreateDataSource(tgt)
# Copy the spatial reference
srs = osr.SpatialReference()
srs.ImportFromWkt(srcDS.GetProjectionRef())
layer = shp.CreateLayer(tgtLayer, srs=srs)
# Set up the dbf file
fd = ogr.FieldDefn( "DN", ogr.OFTInteger )
layer.CreateField(fd)
dst_field = 0
# Automatically extract features from an image!
extract = gdal.Polygonize(band, mask, layer, dst_field, [], None)```

The output shapefile is simply called extract.shp. We'll bring that script back here to look at our shapefile, but we'll add something to it. The largest island has a small lagoon which shows up as a hole in the polygon. In order to properly render it, we have to deal with parts in a shapefile record. The previous example using that script did not do that, so we'll add that piece as we loop through the shapefile features. The code comments in the following code outline the technique:

```import shapefile
import pngcanvas
# Open the extracted islands
# Setup the world to pixels conversion
xdist = r.bbox[2] - r.bbox[0]
ydist = r.bbox[3] - r.bbox[1]
iwidth = 800
iheight = 600
xratio = iwidth/xdist
yratio = iheight/ydist
polygons = []
# Loop through all shapes
for shape in r.shapes():
# Loop through all parts to catch
# polygon holes!
for i in range(len(shape.parts)):
pixels=[]
pt = None
if i<len(shape.parts)-1:
pt = shape.points[shape.parts[i]:shape.parts[i+1]]
else:
pt = shape.points[shape.parts[i]:]
for x,y in pt:
px = int(iwidth - ((r.bbox[2] - x) * xratio))
py = int((r.bbox[3] - y) * yratio)
pixels.append([px,py])
polygons.append(pixels)
# Set up the output canvas
c = pngcanvas.PNGCanvas(iwidth,iheight)
# Loop through the polygons and draw them
for p in polygons:
c.polyline(p)
# Save the image
f = file("extract.png", "wb")
f.write(c.dump())
f.close()```

The following screenshot shows our automatically extracted island features! Commercial packages that do this kind of work can easily cost tens of thousands of dollars. While these packages are very robust, it is still fun to see how far you can get with simple Python scripts and a few open source packages. In many cases you can do everything you need to do.

The western-most island contains the polygon hole as shown in the following screenshot, which is zoomed to that area:

The lagoon is not easy to see, but you will find it is if you use the other script.

Automated feature extraction is a holy grail within geospatial analysis because of the cost and tedious effort required to manually extract features. The key to feature extraction is proper image classification. Automated feature extraction works well with water bodies (and islands), roads, farm fields, buildings, and other features that tend to have high-contrast pixel values with their background.

Summary

In this article we learned image classification and feature extraction with the help of pure Python.

Resources for Article:

Further resources on this subject:

Books to Consider

Oracle Identity and Access Manager 11g for Administrators
\$ 35.99
ImageMagick Tricks
\$ 10.50