This chapter provides an overview of the Python programming language and geospatial development. Please note that this is not a tutorial on how to use the Python language; Python is easy to learn, but the details are beyond the scope of this book.
In this chapter, we will cover:
What the Python programming language is, and how it differs from other languages
An introduction to the Python Standard Library and the Python Package Index
What the terms "geospatial data" and "geospatial development" refer to
An overview of the process of accessing, manipulating, and displaying geospatial data
Some of the major applications for geospatial development
Some of the recent trends in the field of geospatial development
Python (http://python.org) is a modern, high-level language suitable for a wide variety of programming tasks. It is often used as a scripting language, automating and simplifying tasks at the operating system level, but it is equally suitable for building large and complex programs. Python has been used to write web-based systems, desktop applications, games, scientific software, and even utilities and other higher-level parts of various operating systems.
Python supports a wide range of programming idioms, from straightforward procedural programming to object-oriented programming and functional programming.
While Python is generally considered to be an "interpreted" language, and is occasionally criticized for being slow compared to "compiled" languages such as C, the use of byte-compilation and the fact that much of the heavy lifting is done by library code means that Python's performance is often surprisingly good.
Open-source versions of the Python interpreter are freely available for all major operating systems. Python is eminently suitable for all sorts of programming, from quick one-off scripts to building huge and complex systems. It can even be run in interactive (command-line) mode, allowing you to type in commands and immediately see the results. This is ideal for doing quick calculations or figuring out how a particular library works.
One of the first things a developer notices about Python compared with other languages such as Java or C++ is how expressive the language is: what may take 20 or 30 lines of code in Java can often be written in half a dozen lines of code in Python. For example, imagine that you wanted to print a sorted list of the words that occur in a given piece of text. In Python, this is trivial:
    words = set(text.split())
    for word in sorted(words):
        print word
Implementing this kind of task in other languages is often surprisingly difficult.
While the Python language itself makes programming quick and easy, allowing you to focus on the task at hand, the Python Standard Library makes programming even more efficient. Its modules make it easy to do things such as converting date and time values, manipulating strings, downloading data from websites, performing complex maths, working with e-mail messages, encoding and decoding data, XML parsing, data encryption, file manipulation, compressing and decompressing files, and working with databases; the list goes on. What you can do with the Python Standard Library is truly amazing.
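As a small illustration of two of the tasks listed above, the following sketch uses the standard datetime and zlib modules to manipulate a date and time value and to compress and decompress some data. The timestamp and the text being compressed are arbitrary sample values, and the function form of print is used so that the sketch should run under either Python 2.x or Python 3.

```python
import datetime
import zlib

# Convert a date/time string into a datetime object, shift it by an hour,
# and format it again. The timestamp is an arbitrary sample value.
stamp = datetime.datetime.strptime("2010-03-21 09:15:23", "%Y-%m-%d %H:%M:%S")
one_hour_later = stamp + datetime.timedelta(hours=1)
formatted = one_hour_later.isoformat()

# Compress a block of repetitive data, then restore it.
original = ("geospatial " * 100).encode("utf-8")
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(formatted)
print("%d bytes compressed to %d bytes" % (len(original), len(compressed)))
print("Round trip intact: %s" % (restored == original))
```

Nothing here needs to be downloaded or installed; both modules ship with every standard Python installation.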
As well as the built-in modules in the Python Standard Library, it is easy to download and install custom modules, which can be written in either Python or C.
The Python Package Index (http://pypi.python.org) provides thousands of additional modules which you can download and install. And if that isn't enough, many other systems provide Python "bindings" to allow you to access them directly from within your programs. We will be making heavy use of Python bindings in this book.
It should be pointed out that there are different versions of Python available. Python 2.x is the most common version in use today, while the Python developers have been working for the past several years on a completely new, non-backwards-compatible version called Python 3. Eventually, Python 3 will replace Python 2.x, but at this stage most of the third-party libraries (including all the GIS tools we will be using) only work with Python 2.x. For this reason, we won't be using Python 3 in this book.
Python is in many ways an ideal programming language. Once you are familiar with the language itself and have used it a few times, you'll find it incredibly easy to write programs to solve various tasks. Rather than getting buried in a morass of type-definitions and low-level string manipulation, you can simply concentrate on what you want to achieve. You end up almost thinking directly in Python code. Programming in Python is straightforward, efficient, and, dare I say it, fun.
The term "geospatial" refers to information that is located on the earth's surface using coordinates. This can include, for example, the position of a cell phone tower, the shape of a road, or the outline of a country:
Geospatial data often associates some piece of information with a particular location. For example, the following is an interactive map from the http://www.bbc.co.uk/ website, showing the percentage of people in each country with access to the Internet in 2008:
Internally, geospatial data is represented as a series of coordinates, often in the form of latitude and longitude values. Additional attributes such as temperature, soil type, height, or the name of a landmark are also often present. There can be many thousands (or even millions) of data points for a single set of geospatial data. For example, the following outline of New Zealand consists of almost 12,000 individual data points:
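To make this representation more concrete, here is one way a single piece of geospatial data might be held in memory: an ordered list of (latitude, longitude) pairs together with a dictionary of attributes. The layout, the attribute names, and the bounding_box() helper are all hypothetical and chosen only for illustration; the coordinate values are sample GPS readings of the kind shown later in this chapter.

```python
# A hypothetical in-memory layout for one geospatial feature: an ordered
# list of (latitude, longitude) pairs plus a dictionary of attributes.
feature = {
    "coordinates": [
        (-38.16614499, 176.2336626),
        (-38.16608632, 176.2335635),
        (-38.16604198, 176.2334771),
        (-38.16601507, 176.2333958),
    ],
    # Attribute names and values are made up for this example.
    "attributes": {"name": "Sample GPS track", "source": "GPS recorder"},
}

def bounding_box(coordinates):
    """Return (min_lat, min_long, max_lat, max_long) for a list of points."""
    lats = [lat for lat, lng in coordinates]
    lngs = [lng for lat, lng in coordinates]
    return (min(lats), min(lngs), max(lats), max(lngs))

box = bounding_box(feature["coordinates"])
print("%d points, bounding box %r" % (len(feature["coordinates"]), box))
```

A real data set would follow the same pattern, only with thousands of coordinate pairs rather than four.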
Because so much data is involved, it is common to store geospatial information within a database. A large part of this book will be concerned with how to store your geospatial information in a database, and how to access it efficiently.
Geospatial data comes in many different forms. Different Geographical Information System (GIS) vendors have produced their own file formats over the years, and various organizations have also defined their own standards. It is often necessary to use a Python library to read files in the correct format when importing geospatial data into your database.
Unfortunately, not all geospatial data points are compatible. Just like a distance value of 2.8 can have a very different meaning depending on whether you are using kilometers or miles, a given latitude and longitude value can represent any number of different points on the earth's surface, depending on which projection has been used.
A projection is a way of representing the curved surface of the earth in two dimensions. We will look at projections in more detail in Chapter 2, GIS, but for now just keep in mind that every piece of geospatial data has a projection associated with it. To compare or combine two sets of geospatial data, it is often necessary to convert the data from one projection to another.
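The following sketch illustrates the idea using two simple projections, equirectangular and Mercator, both computed with simplified spherical formulas. This is an assumption made to keep the example short: the earth radius constant and the function names are ours, and real applications would normally leave this work to a projection library rather than hand-written maths.

```python
import math

EARTH_RADIUS = 6378137.0  # meters; treating the earth as a sphere (assumption)

def to_equirectangular(lat, lng):
    """Project a lat/long point (decimal degrees) to equirectangular x, y."""
    x = EARTH_RADIUS * math.radians(lng)
    y = EARTH_RADIUS * math.radians(lat)
    return (x, y)

def to_mercator(lat, lng):
    """Project a lat/long point (decimal degrees) to spherical Mercator x, y."""
    x = EARTH_RADIUS * math.radians(lng)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return (x, y)

def mercator_to_lat_long(x, y):
    """Convert spherical Mercator x, y back to lat/long in decimal degrees."""
    lng = math.degrees(x / EARTH_RADIUS)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS)) - math.pi / 2)
    return (lat, lng)

lat, lng = 37.82, -122.42
flat = to_equirectangular(lat, lng)
merc = to_mercator(lat, lng)
back = mercator_to_lat_long(merc[0], merc[1])

print("Equirectangular: %.0f, %.0f" % flat)
print("Mercator:        %.0f, %.0f" % merc)
print("Round trip:      %.2f, %.2f" % back)
```

The same location produces different y values under the two projections, which is why coordinates from differently projected data sets cannot be compared directly. Converting between the two means going back to latitude and longitude first, then projecting again.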
In addition to the prosaic tasks of importing geospatial data from various external file formats and translating data from one projection to another, geospatial data can also be manipulated to solve various interesting problems. Obvious examples include the task of calculating the distance between two points, or calculating the length of a road, or finding all data points within a given radius of a selected point. We will be using Python libraries to solve all of these problems, and more.
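To give a feel for these three calculations, here is a minimal sketch that uses only the standard math module. It relies on the haversine formula and treats the earth as a perfect sphere, which is an assumption; dedicated geospatial libraries use a more accurate ellipsoidal model. The function names and the intermediate point in the sample road are hypothetical.

```python
import math

EARTH_RADIUS = 6371000.0  # mean earth radius in meters (spherical assumption)

def haversine(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2 +
         math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS * math.asin(math.sqrt(a))

def path_length(points):
    """Total length of a line (such as a road) given as (lat, long) points."""
    return sum(haversine(a[0], a[1], b[0], b[1])
               for a, b in zip(points, points[1:]))

def points_within(points, centre, radius):
    """Return the points lying within radius meters of centre."""
    return [p for p in points
            if haversine(centre[0], centre[1], p[0], p[1]) <= radius]

# A made-up three-point "road"; the middle point is hypothetical.
road = [(37.82, -122.42), (37.81, -122.43), (37.80, -122.44)]

direct = haversine(37.82, -122.42, 37.80, -122.44)
print("Direct distance: %0.0f meters" % direct)
print("Road length: %0.0f meters" % path_length(road))
print("Points within 2 km of the start: %d" % len(points_within(road, road[0], 2000)))
```

Because of the spherical assumption, the results will differ slightly from those produced by a library that models the earth as an ellipsoid.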
Finally, geospatial data by itself is not very interesting. A long list of coordinates tells you almost nothing; it isn't until those numbers are used to draw a picture that you can make sense of it. Drawing maps, placing data points onto a map, and allowing users to interact with maps are all important aspects of geospatial development. We will be looking at all of these in later chapters.
Let's take a brief look at some of the more common geospatial development tasks you might encounter.
Imagine that you have a database containing a range of geospatial data for San Francisco. This database might include geographical features, roads, the location of prominent buildings, and other man-made features such as bridges, airports, and so on.
Such a database can be a valuable resource for answering various questions. For example:
What's the longest road in Sausalito?
How many bridges are there in Oakland?
What is the total area of Golden Gate Park?
How far is it from Pier 39 to the Moscone Center?
Many of these types of problems can be solved using tools such as the PostGIS spatially-enabled database. For example, to calculate the total area of Golden Gate Park, you might use the following SQL query:
    select ST_Area(geometry) from features where name = 'Golden Gate Park';
To calculate the distance between two places, you first have to geocode the locations to obtain their latitude and longitude. There are various ways to do this; one simple approach is to use a free geocoding web service, such as this:
This returns a latitude value of 37.82 and a longitude value of -122.42.
These latitude and longitude values are in decimal degrees. If you don't know what these are, don't worry; we'll talk about decimal degrees in Chapter 2, GIS.
Similarly, we can find the location of the Moscone Center using this query:
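This returns a latitude value of 37.80 and a longitude value of -122.44.
With both locations geocoded, we can calculate the distance between them using the pyproj library: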
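    import pyproj

    # Latitude and longitude of the two locations, in decimal degrees.
    lat1,long1 = (37.82,-122.42)
    lat2,long2 = (37.80,-122.44)

    # Geod.inv() takes longitudes before latitudes; the distance is in meters.
    geod = pyproj.Geod(ellps="WGS84")
    angle1,angle2,distance = geod.inv(long1, lat1, long2, lat2)

    print "Distance is %0.2f meters" % distance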
This prints the distance between the two points:
Distance is 2833.64 meters
Don't worry about the "WGS84" reference at this stage; we'll look at what this means in Chapter 2, GIS.
Of course, you wouldn't normally do this sort of analysis on a one-off basis like this—it's much more common to create a Python program that will answer these sorts of questions for any desired set of data. You might, for example, create a web application that displays a menu of available calculations. One of the options in this menu might be to calculate the distance between two points; when this option is selected, the web application would prompt the user to enter the two locations, attempt to geocode them by calling an appropriate web service (and display an error message if a location couldn't be geocoded), then calculate the distance between the two points using Proj, and finally display the results to the user.
Alternatively, if you have a database containing useful geospatial data, you could let the user select the two locations from the database rather than typing in arbitrary location names or street addresses.
However you choose to structure it, performing calculations like this will usually be a major part of your geospatial application.
Imagine that you wanted to see which areas of a city are typically covered by a taxi during an average working day. You might place a GPS recorder into a taxi and leave it to record the taxi's position over several days. The results would be a series of timestamps, latitude and longitude values as follows:
    2010-03-21 9:15:23 -38.16614499 176.2336626
    2010-03-21 9:15:27 -38.16608632 176.2335635
    2010-03-21 9:15:34 -38.16604198 176.2334771
    2010-03-21 9:15:39 -38.16601507 176.2333958
    ...
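Before data like this can be analyzed or displayed, it has to be read into a usable form. The following sketch parses records in the layout shown above into timestamps and floating-point coordinates. It assumes the fields are separated by whitespace, which may not match the output of any particular GPS recorder, and the parse_gps_log() function name is hypothetical.

```python
import datetime

# Sample records in the layout shown above: date, time, latitude, longitude.
raw_log = """\
2010-03-21 9:15:23 -38.16614499 176.2336626
2010-03-21 9:15:27 -38.16608632 176.2335635
2010-03-21 9:15:34 -38.16604198 176.2334771
2010-03-21 9:15:39 -38.16601507 176.2333958
"""

def parse_gps_log(text):
    """Return a list of (timestamp, latitude, longitude) tuples."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        date_part, time_part, lat, lng = fields
        timestamp = datetime.datetime.strptime(
            date_part + " " + time_part, "%Y-%m-%d %H:%M:%S")
        records.append((timestamp, float(lat), float(lng)))
    return records

records = parse_gps_log(raw_log)
elapsed = records[-1][0] - records[0][0]
print("%d readings over %d seconds" % (len(records), elapsed.seconds))
```

Once the readings are held as Python values, they can be sorted, filtered, or handed to a mapping tool.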
By themselves, these raw numbers tell you almost nothing. But when you display this data visually, the numbers start to make sense:
(Street map courtesy of http://openstreetmap.org).
While this is a very simple example, visualization is a crucial aspect of working with geospatial data. How data is displayed visually, how different data sets are overlaid, and how the user can manipulate data directly in a visual format are all going to be major topics of this book.
The concept of a "mash-up" has become popular in recent years. Mash-ups are applications that combine data and functionality from more than one source. For example, a typical mash-up may combine details of houses for rent in a given city, and plot the location of each rental on a map, as follows:
This example comes from http://housingmaps.com.
The Google Maps API has been immensely popular in creating these types of mash-ups. However, Google Maps has some serious licensing and other limitations, as does Google's main competitor, Bing. Fortunately, these are not the only options; tools such as Mapnik, OpenLayers, and MapServer, to name a few, also allow you to create mash-ups that overlay your own data onto a map.
Most of these mash-ups run as web applications across the Internet, running on a server that can be accessed by anyone who has a web browser. Sometimes the mash-ups are private, requiring password access, but usually they are publicly available and can be used by anyone. Indeed, many businesses (such as the housing maps site shown in the previous image) are based on freely-available geospatial mash-ups.
A decade ago, geospatial development was vastly more limited than it is today. Professional (and hugely expensive) Geographical Information Systems were the norm for working with and visualizing geospatial data. Open source tools, where they were available, were obscure and hard to use. What is more, everything ran on the desktop—the concept of working with geospatial data across the Internet was no more than a distant dream.
In 2005, Google released two products that completely changed the face of geospatial development. Google Maps and Google Earth made it possible for anyone with a web browser or a desktop computer to view and work with geospatial data. Instead of requiring expert knowledge and years of practice, even a four-year-old could instantly view and manipulate interactive maps of the world.
Google's products are not perfect: the map projections are deliberately simplified, leading to errors and problems with displaying overlays; these products are only free for non-commercial use; and they include almost no ability to perform geospatial analysis. Despite these limitations, they have had a huge effect on the field of geospatial development. People became aware of what was possible, and the use of maps and their underlying geospatial data has become so prevalent that even cell phones now commonly include built-in mapping tools.
The Global Positioning System (GPS) has also had a major influence on geospatial development. Geospatial data for streets and other man-made and natural features used to be an expensive and tightly controlled resource, often created by scanning aerial photographs and then manually drawing an outline of a street or coastline over the top to digitize the required features. With the advent of cheap and readily-available portable GPS units, anyone who wishes to can now capture their own geospatial data. Indeed, many people have made a hobby of recording, editing, and improving the accuracy of street and topological data, which are then freely shared across the Internet. All this means that you're not limited to recording your own data, or purchasing data from a commercial organization; volunteered information is now often as accurate and useful as commercially-available data, and may well be suitable for your geospatial application.
The open source software movement has also had a major influence on geospatial development. Instead of relying on commercial toolsets, it is now possible to build complex geospatial applications entirely out of freely-available tools and libraries. Because the source code for these tools is often available, developers can improve and extend these toolkits, fixing problems and adding new features for the benefit of everyone. Tools such as PROJ.4, PostGIS, OGR, and GDAL are all excellent geospatial toolkits which are beneficiaries of the open source movement. We will be making use of all these tools throughout this book.
As well as standalone tools and libraries, a number of geospatial Application Programming Interfaces (APIs) have become available. Google have provided a number of APIs, which can be used to include maps and perform limited geospatial analysis within a website. Other services, such as the OpenStreetMap geocoder we used earlier, allow you to perform various geospatial tasks that would be difficult to do if you were limited to using your own data and programming resources.
As more and more geospatial data becomes available, from an increasing number of sources, and as the number of tools and systems which can work with this data also increases, it has become increasingly important to define standards for geospatial data. The Open Geospatial Consortium, often abbreviated to OGC (http://www.opengeospatial.org), is an international standards organization which aims to do precisely this: to provide a set of standard formats and protocols for sharing and storing geospatial data. These standards, including GML, KML, GeoRSS, WMS, WFS, and WCS, provide a shared "language" in which geospatial data can be expressed. Tools such as commercial and open source GIS systems, Google Earth, web-based APIs, and specialized geospatial toolkits such as OGR are all able to work with these standards. Indeed, an important aspect of a geospatial toolkit is the ability to understand and translate data between these various formats.
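As a rough illustration of what expressing data in one of these formats involves, the following sketch writes a single point as a KML placemark using the standard xml.etree.ElementTree module. It is a hand-built minimal example rather than the way a toolkit such as OGR does the job, and the point_to_kml() function name is hypothetical. The placemark name and coordinates are sample values.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # XML namespace for KML 2.2

def point_to_kml(name, lat, lng):
    """Return a KML document (as a string) containing a single placemark."""
    kml = ET.Element("{%s}kml" % KML_NS)
    placemark = ET.SubElement(kml, "{%s}Placemark" % KML_NS)
    ET.SubElement(placemark, "{%s}name" % KML_NS).text = name
    point = ET.SubElement(placemark, "{%s}Point" % KML_NS)
    # KML lists the longitude first, then the latitude.
    coordinates = ET.SubElement(point, "{%s}coordinates" % KML_NS)
    coordinates.text = "%s,%s" % (lng, lat)
    return ET.tostring(kml).decode("utf-8")

document = point_to_kml("Pier 39", 37.82, -122.42)
print(document)
```

Note the coordinate order: a program translating between formats has to account for details like this, which is one reason a well-tested toolkit is usually preferable to hand-written conversion code.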
As GPS units have become more ubiquitous, it has become possible to record your location data as you are performing another task. Geolocation, the act of recording your location as you are doing something, is becoming increasingly common. The Twitter social networking service, for example, now allows you to record and display your current location as you enter a status update. As you approach your office, sophisticated to-do list software can now automatically hide any tasks which can't be done at that location. Your phone can also tell you which of your friends are nearby, and search results can be filtered to only show nearby businesses.
All of this is simply the continuation of a trend that started when GIS systems were housed on mainframe computers and operated by specialists who spent years learning about them.
Geospatial data and applications have been "democratized" over the years, making them available in more places, to more people. What was possible only in a large organization can now be done by anyone using a handheld device. As technology continues to improve, and the tools become more powerful, this trend is sure to continue.
In this chapter, we briefly introduced the Python programming language and the main concepts behind geospatial development. We have seen:
That Python is a very high-level language eminently suited to the task of geospatial development.
That there are a number of libraries which can be downloaded to make it easier to perform geospatial development work in Python.
That the term "geospatial data" refers to information that is located on the earth's surface using coordinates.
That the term "geospatial development" refers to the process of writing computer programs that can access, manipulate, and display geospatial data.
That the process of accessing geospatial data is non-trivial, thanks to differing file formats and data standards.
What types of questions can be answered by analyzing geospatial data.
How geospatial data can be used for visualization.
How mash-ups can be used to combine data (often geospatial data) in useful and interesting ways.
How Google Maps, Google Earth, and the development of cheap and portable GPS units have "democratized" geospatial development.
The influence the open source software movement has had on the availability of high quality, freely-available tools for geospatial development.
How various standards organizations have defined formats and protocols for sharing and storing geospatial data.
The increasing use of geolocation to capture and work with geospatial data in surprising and useful ways.
In the next chapter, we will look in more detail at traditional GIS, including a number of important concepts which you need to understand in order to work with geospatial data. Different geospatial formats will be examined, and we will finish by using Python to perform various calculations using geospatial data.