Remote Sensing and Histogram

Joel Lawhead

January 2016

In this article by Joel Lawhead, the author of Learning GeoSpatial Analysis with Python - Second Edition, we will discuss remote sensing. This field grows more exciting every day as more satellites are launched and the distribution of data becomes easier. The high availability of satellite and aerial images as well as the interesting new types of sensors that are being launched each year is changing the role that remote sensing plays in understanding our world.

In this field, Python is quite capable. In remote sensing, we step through each pixel in an image and perform a type of query or mathematical process. An image can be thought of as a large numerical array and in remote sensing, these arrays can be as large as tens of megabytes to several gigabytes. While Python is fast, only C-based libraries can provide the speed that is needed to loop through the arrays at a tolerable speed.

(For more resources related to this topic, see here.)

In this article, whenever possible, we’ll use Python Imaging Library (PIL) for image processing and NumPy, which provides multidimensional array mathematics. While written in C for speed, these libraries are designed for Python and provide Python’s API.

In this article, we’ll start with basic image manipulation and build on each exercise all the way to automatic change detection. Here are the topics that we’ll cover:

  • Swapping image bands
  • Creating image histograms

Swapping image bands

Our eyes can only see colors in the visible spectrum as combinations of red, green, and blue (RGB). Airborne and spaceborne sensors can collect wavelengths of the energy that is outside the visible spectrum. In order to view this data, we move images representing different wavelengths of light reflectance in and out of the RGB channels in order to make color images. These images often end up as bizarre and alien color combinations that can make visual analysis difficult. An example of a typical satellite image is seen in the following Landsat 7 satellite scene near the NASA's Stennis Space Center in Mississippi along the Gulf of Mexico, which is a leading space center for remote sensing and geospatial analysis in general:

Most of the vegetation appears red and water appears almost black. This image is one type of false color image, meaning that the color of the image is not based on the RGB light. However, we can change the order of the bands or swap certain bands in order to create another type of false color image that looks more like the world we are used to seeing. In order to do so, you first need to download this image as a ZIP file from the following:

http://git.io/vqs41

We need to install the GDAL library with Python bindings. The Geospatial Data Abstraction Library(GDAL)includes a module called gdalnumeric that loads and saves remotely sensed images to and from NumPy arrays for easy manipulation. GDAL itself is a data access library and does not provide much in the name of processing. Therefore, in this article, we will rely heavily on NumPy to actually change images.

In this example, we’ll load the image in a NumPy array using gdalnumeric and then, we’ll immediately save it in a new .tiff file. However, upon saving, we’ll use NumPy’s advanced array slicing feature to change the order of the bands. Images in NumPy are multidimensional arrays in the order of band, height, and width. Therefore, an image with three bands will be an array of length three, containing an array for each band, the height, and width of the image. It’s important to note that NumPy references the array locations as y,x (row, column) instead of the usual column and row format that we work with in spreadsheets and other software:

from osgeo import gdalnumeric
src = "FalseColor.tif"
arr = gdalnumeric.LoadFile(src)
gdalnumeric.SaveArray(arr[[1, 0, 2], :], "swap.tif",format="GTiff", prototype=src)

Also in the SaveArray() method, the last argument is called prototype. This argument let’s you specify another image for GDAL from which we can copy spatial reference information and some other image parameters. Without this argument, we’d end up with an image without georeferencing information, which can not be used in a geographical information system (GIS). In this case, we specified our input image file name as the images are identical, except for the band order.

The result of this example produces the swap.tif image, which is a much more visually appealing image with green vegetation and blue water:

There’s only one problem with this image. It’s a little dark and difficult to see. Let’s see if we can figure out why.

Creating histograms

A histogram shows the statistical frequency of data distribution in a dataset. In the case of remote sensing, the dataset is an image, the data distribution is the frequency of the pixels in the range of 0 to 255, which is the range of the 8-byte numbers used to store image information on computers. In an RGB image, color is represented as a three-digit tuple with (0,0,0) being black and (255,255,255) being white. We can graph the histogram of an image with the frequency of each value along the y axis and the range of 255 possible pixel values along the x axis.

We can use the Turtle graphics engine that is included with Python to create a simple GIS. We can also use it to easily graph histograms. Histograms are usually a one-off product that makes a quick script, like this example, great. Also, histograms are typically displayed as a bar graph with the width of the bars representing the size of the grouped data bins. However, in an image, each bin is only one value, so we’ll create a line graph. We’ll use the histogram function in this example and create a red, green, and blue line for each band. The graphing portion of this example also defaults to scaling the y axis values to the maximum RGB frequency that is found in the image. Technically, the y axis represents the maximum frequency, that is, the number of pixels in the image, which would be the case if the image was of one color. We’ll use the turtle module again; however, this example could be easily converted to any graphical output module. However, this format makes the distribution harder to see. Let’s take a look at our swap.tif image:

from osgeo import gdalnumeric
import turtle as t

def histogram(a, bins=list(range(0, 256))):
   fa = a.flat
   n = gdalnumeric.numpy.searchsorted(gdalnumeric.numpy.sort(fa), bins)
   n = gdalnumeric.numpy.concatenate([n, [len(fa)]])
hist = n[1:]-n[:-1]
   return hist

defdraw_histogram(hist, scale=True):
t.color("black")
   axes = ((-355, -200), (355, -200), (-355, -200), (-355, 250))
t.up()
   for p in axes:
t.goto(p)
t.down()
t.up()
t.goto(0, -250)
t.write("VALUE", font=("Arial, ", 12, "bold"))
t.up()
t.goto(-400, 280)
t.write("FREQUENCY", font=("Arial, ", 12, "bold"))
   x = -355
   y = -200
t.up()
   for i in range(1, 11):
       x = x+65
t.goto(x, y)
t.down()
t.goto(x, y-10)
t.up()
t.goto(x, y-25)
t.write("{}".format((i*25)), align="center")
   x = -355
   y = -200
t.up()
   pixels = sum(hist[0])
   if scale:
       max = 0
       for h in hist:
hmax = h.max()
           if hmax> max:
               max = hmax
       pixels = max
   label = pixels/10
   for i in range(1, 11):
       y = y+45
t.goto(x, y)
t.down()
t.goto(x-10, y)
t.up()
t.goto(x-15, y-6)
t.write("{}" .format((i*label)), align="right")
x_ratio = 709.0 / 256
y_ratio = 450.0 / pixels
   colors = ["red", "green", "blue"]
   for j in range(len(hist)):
       h = hist[j]
       x = -354
       y = -199
t.up()
t.goto(x, y)
t.down()
t.color(colors[j])
   for i in range(256):
       x = i * x_ratio
       y = h[i] * y_ratio
       x = x - (709/2)
       y = y + -199
t.goto((x, y))
im = "swap.tif"
histograms = []
arr = gdalnumeric.LoadFile(im)
for b in arr:
histograms.append(histogram(b))
draw_histogram(histograms)
t.pen(shown=False)
t.done()

Here's what the histogram for swap.tif looks similar to after running the example:

As you can see, all the three bands are grouped closely towards the left-hand side of the graph and all have values that are less than 125. As these values approach zero, the image becomes darker, which is not surprising. Just for fun, let’s run the script again and when we call the draw_histogram() function, we’ll add the scale=False option to get an idea of the size of the image and provide an absolute scale. Therefore, we take the following line:

draw_histogram(histograms)

Change it to the following:

draw_histogram(histograms, scale=False)

This change will produce the following histogram graph:

As you can see, it’s harder to see the details of the value distribution. However, this absolute scale approach is useful if you are comparing multiple histograms from different products that are produced from the same source image.

Now that we understand the basics of looking at an image statistically using histograms, how do we make our image brighter?

Performing a histogram stretch

A histogram stretch operation does exactly what the name suggests. It distributes the pixel values across the whole scale. By doing so, we have more values at the higher-intensity level and the image becomes brighter. Therefore, in this example, we’ll use our histogram function; however, we’ll add another function called stretch() that takes an image array, creates the histogram, and then spreads out the range of values for each band. We’ll run these functions on swap.tif and save the result in an image called stretched.tif:

import gdalnumeric
import operator
from functools import reduce

def histogram(a, bins=list(range(0, 256))):
   fa = a.flat
   n = gdalnumeric.numpy.searchsorted(gdalnumeric.numpy.sort(fa), bins)
   n = gdalnumeric.numpy.concatenate([n, [len(fa)]])
hist = n[1:]-n[:-1]
   return hist

def stretch(a):
hist = histogram(a)
lut = []
   for b in range(0, len(hist), 256):
       step = reduce(operator.add, hist[b:b+256]) / 255
       n = 0
       for i in range(256):
lut.append(n / step)
          n = n + hist[i+b]
gdalnumeric.numpy.take(lut, a, out=a)
   return a
src = "swap.tif"
arr = gdalnumeric.LoadFile(src)
stretched = stretch(arr)
gdalnumeric.SaveArray(arr, "stretched.tif", format="GTiff", prototype=src)

The stretch algorithm will produce the following image. Look how much brighter and visually appealing it is:

We can run our turtle graphics histogram script on stretched.tif by changing the filename in the im variable to stretched.tif:

im = "stretched.tif"

This run will give us the following histogram:

As you can see, all the three bands are distributed evenly now. Their relative distribution to each other is the same; however, in the image, they are now spread across the spectrum.

Summary

In this article, we covered the foundations of remote sensing including band swapping and histograms.

The authors of GDAL have a set of Python examples, covering some advanced topics that may be of interest, available at https://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/.

Resources for Article:


Further resources on this subject:


You've been reading an excerpt of:

Learning Geospatial Analysis with Python - Second Edition

Explore Title