Reader small image

You're reading from  Mastering Geospatial Analysis with Python

Product typeBook
Published inApr 2018
Reading LevelBeginner
PublisherPackt
ISBN-139781788293334
Edition1st Edition
Languages
Right arrow
Authors (3):
Silas Toms
Silas Toms
author image
Silas Toms

Silas Toms is a long-time geospatial professional and author who has previously published ArcPy and ArcGIS and Mastering Geospatial Analysis with Python. His career highlights include developing the real-time common operational picture used at Super Bowl 50, building geospatial software for autonomous cars, designing computer vision for next-gen insurance, and developing mapping systems for Zillow. He now works at Volta Charging, predicting the future of electric vehicle adoption and electric charging infrastructure.
Read more about Silas Toms

Paul Crickard
Paul Crickard
author image
Paul Crickard

Paul Crickard authored a book on the Leaflet JavaScript module. He has been programming for over 15 years and has focused on GIS and geospatial programming for 7 years. He spent 3 years working as a planner at an architecture firm, where he combined GIS with Building Information Modeling (BIM) and CAD. Currently, he is the CIO at the 2nd Judicial District Attorney's Office in New Mexico.
Read more about Paul Crickard

Eric van Rees
Eric van Rees
author image
Eric van Rees

Eric van Rees was first introduced to Geographical Information Systems (GIS) when studying Human Geography in the Netherlands. For 9 years, he was the editor-in-chief of GeoInformatics, an international GIS, surveying, and mapping publication and a contributing editor of GIS Magazine. During that tenure, he visited many geospatial user conferences, trade fairs, and industry meetings. He focuses on producing technical content, such as software tutorials, tech blogs, and innovative new use cases in the mapping industry.
Read more about Eric van Rees

View More author details
Right arrow

Chapter 2. Introduction to Geospatial Code Libraries

This chapter will introduce the major code libraries used to process and analyze geospatial data. You will learn the characteristics of each library, how they are related to each other, how to install them, where to find additional documentation, and typical use cases. These instructions assume that the user has a recent (2.7 or later) version of Python on their machine, and do not cover installing Python. Next, we'll discuss how all of these packages fit together and how they are covered in the rest of this book.

The following libraries will be covered in this chapter:

  • GDAL/OGR
  • GEOS
  • Shapely
  • Fiona
  • Python Shapefile Library (pyshp)
  • pyproj
  • Rasterio
  • GeoPandas

Geospatial Data Abstraction Library (GDAL) and the OGR Simple Features Library


The Geospatial Data Abstraction Library (GDAL)/OGR Simple Features Library combines two separate libraries that are generally downloaded together as a GDAL. This means that installing the GDAL package also gives access to OGR functionality, which is why they're covered together here. The reason GDAL is covered first is that other packages were written after GDAL, so chronologically, it comes first. As you will notice, some of the packages covered in this chapter extend GDAL's functionality or use it under the hood.

GDAL was created in the 1990s by Frank Warmerdam and saw its first release in June 2000. Later, the development of GDAL was transferred to the Open Source Geospatial Foundation (OSGeo). Technically, GDAL is a little different than your average Python package as the GDAL package itself was written in C and C++, meaning that in order to be able to use it in Python, you need to compile GDAL and its associated...

GEOS


The Geometry Engine Open Source (GEOS) is the C/C++ port of a subset of the Java Topology Suite (JTS) and selected functions. GEOS aims to contain the complete functionality of JTS in C++. It can be compiled on many platforms, including Python. As you will see later on, the Shapely library uses functions from the GEOS library. In fact, there are many applications using GEOS, including PostGIS and QGIS. GeoDjango, covered in Chapter 12GeoDjango, also uses GEOS, as well as GDAL, among other geospatial libraries. GEOS can also be compiled with GDAL, giving OGR all of its capabilities.

The JTS is an open source geospatial computational geometry library written in Java. It provides various functionalities, including a geometry model, geometric functions, spatial structures and algorithms, and i/o capabilities. Using GEOS, you have access to the following capabilities—geospatial functions (such as within and contains), geospatial operations (union, intersection, and many more), spatial indexing...

Shapely


Shapely is a Python package for manipulation and analysis of planar features, using functions from the GEOS library (the engine of PostGIS) and a port of the JTS. Shapely is not concerned with data formats or coordinate systems but can be readily integrated with packages that are. Shapely only deals with analyzing geometries and offers no capabilities for reading and writing geospatial files. It was developed by Sean Gillies, who was also the person behind Fiona and Rasterio.

Shapely supports eight fundamental geometry types that are implemented as a class in the shapely.geometry module—points, multipoints, linestrings, multilinestrings, linearrings, multipolygons, polygons, and geometrycollections. Apart from representing these geometries, Shapely can be used to manipulate and analyze geometries through a number of methods and attributes.

Shapely has mainly the same classes and functions as OGR while dealing with geometries. The difference between Shapely and OGR is that Shapely has...

Fiona


Fiona is the API of OGR. It can be used for reading and writing data formats. The main reason for using it instead of OGR is that it's closer to Python than OGR as well as more dependable and less error-prone. It makes use of two markup languages, WKT and WKB, for representing spatial information with regards to vector data. As such, it can be combined well with other Python libraries such as Shapely, you would use Fiona for input and output, and Shapely for creating and manipulating geospatial data. 

While Fiona is Python compatible and our recommendation, users should also be aware of some of the disadvantages. It is more dependable than OGR because it uses Python objects for copying vector data instead of C pointers, which also means that they use more memory, which affects the performance. 

Installing Fiona

You can use pip install, conda, or Anaconda3 to install Fiona:

>> conda install -c conda-forge fiona
>> conda install -c conda-forge/label/broken fiona
>>pip install...

Python shapefile library (pyshp)


 The Python shapefile library (pyshp) is a pure Python library and is used to read and write shapefiles. The pyshp library's sole purpose is to work with shapefiles—it only uses the Python standard library. You cannot use it for geometric operations. If you're only working with shapefiles, this one-file-only library is simpler than using GDAL.

Installing pyshp

You can use pip install, conda, and Anaconda3 to install pyshp:

>> pip install pyshp
>>conda install pyshp

More documentation is available at PyPi: https://pypi.python.org/pypi/pyshp/1.2.3

The source code for pyshp is available at https://github.com/GeospatialPython/pyshp.

pyproj


The pyproj is a Python package that performs cartographic transformations and geodetic computations. It is a Cython wrapper to provide Python interfaces to PROJ.4 functions, meaning you can access an existing library of C code in Python.

PROJ.4 is a projection library that transforms data among many coordinate systems and is also available through GDAL and OGR. The reason that PROJ.4 is still popular and widely used is two-fold:

  • Firstly, because it supports so many different coordinate systems
  • Secondly, because of the routes it provides to do this—Rasterio and GeoPandas, two Python libraries covered next, both use pyproj and thus PROJ.4 functionality under the hood

The difference between using PROJ.4 separately instead of using it with a package such as GDAL is that it enables you to re-project individual points, and packages using PROJ.4 do not offer this functionality.

The pyproj package offers two classes—the Proj class and the Geod class. The Proj class performs cartographic computations...

Rasterio


Rasterio is a GDAL and NumPy-based Python library for raster data, written with the Python developer in mind instead of C, using Python language types, protocols, and idioms. Rasterio aims to make GIS data more accessible to Python programmers and helps GIS analysts learn important Python standards. Rasterio relies on concepts of Python rather than GIS.

Rasterio is an open source project from the satellite team of Mapbox, a provider of custom online maps for websites and applications. The name of this library should be pronounced as raster-i-o rather than ras-te-rio. Rasterio came into being as a result of a project called the Mapbox Cloudless Atlas, which aimed to create a pretty-looking basemap from satellite imagery.

One of the software requirements was to use open source software and a high-level language with handy multi-dimensional array syntax. Although GDAL offers proven algorithms and drivers, developing with GDAL's Python bindings feels a lot like C++.

Therefore, Rasterio...

GeoPandas


GeoPandas is a Python library for working with vector data. It is based on the pandas library that is part of the SciPy stack. SciPy is a popular library for data inspection and analysis, but unfortunately, it cannot read spatial data. GeoPandas was created to fill this gap, taking pandas data objects as a starting point. The library also adds functionality from geographical Python packages. 

GeoPandas offers two data objects—a GeoSeries object that is based on a pandas Series object and a GeoDataFrame, based on a pandas DataFrame object, but adding a geometry column for each row. Both GeoSeries and GeoDataFrame objects can be used for spatial data processing, similar to spatial databases. Read and write functionality is provided for almost every vector data format. Also, because both Series and DataFrame objects are subclasses from pandas data objects, you can use the same properties to select or subset data, for example .loc or .iloc.

GeoPandas is a library that employs the capabilities...

How it all works together


We've provided an overview of the most important open source packages for processing and analyzing geospatial data. The question then becomes when to use a certain package and why. GDAL, OGR, and GEOS are indispensable for geospatial processing and analyzing, but were not written in Python, and so they require Python binaries for Python developers. Fiona, Shapely, and pyproj were written to solve these problems, as well as the newer Rasterio library. For a more Pythonic approach, these newer packages are preferable to the older C++ packages with Python binaries (although they're used under the hood).

However, it's good to know the origins and history of all of these packages as they're all so widely used (and for good reason). The next chapter, which will be discussing geospatial databases, will build on information from this chapter. Chapter 5Vector Data Analysis, and Chapter 6Raster Data Processing, will specifically deal with the libraries discussed here,...

Summary


In this chapter, we introduced the major code libraries used to process and analyze geospatial data. You learned the characteristics of each library, how they are related or are distinct to each other, how to install them, where to find additional documentation, and typical use cases. GDAL is a major library that includes two separate libraries, OGR and GDAL. Many other libraries and software applications use GDAL functionality under the hood, examples are Fiona and Rasterio, which were both covered in this chapter. These were created to make it easier to work with GDAL and OGR in a more Pythonic way.

The next chapter will introduce spatial databases. These are used for data storage and analysis, with examples being SpatiaLite and PostGIS. You will also learn how to use different Python libraries to connect to these databases. 

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Mastering Geospatial Analysis with Python
Published in: Apr 2018Publisher: PacktISBN-13: 9781788293334
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (3)

author image
Silas Toms

Silas Toms is a long-time geospatial professional and author who has previously published ArcPy and ArcGIS and Mastering Geospatial Analysis with Python. His career highlights include developing the real-time common operational picture used at Super Bowl 50, building geospatial software for autonomous cars, designing computer vision for next-gen insurance, and developing mapping systems for Zillow. He now works at Volta Charging, predicting the future of electric vehicle adoption and electric charging infrastructure.
Read more about Silas Toms

author image
Paul Crickard

Paul Crickard authored a book on the Leaflet JavaScript module. He has been programming for over 15 years and has focused on GIS and geospatial programming for 7 years. He spent 3 years working as a planner at an architecture firm, where he combined GIS with Building Information Modeling (BIM) and CAD. Currently, he is the CIO at the 2nd Judicial District Attorney's Office in New Mexico.
Read more about Paul Crickard

author image
Eric van Rees

Eric van Rees was first introduced to Geographical Information Systems (GIS) when studying Human Geography in the Netherlands. For 9 years, he was the editor-in-chief of GeoInformatics, an international GIS, surveying, and mapping publication and a contributing editor of GIS Magazine. During that tenure, he visited many geospatial user conferences, trade fairs, and industry meetings. He focuses on producing technical content, such as software tutorials, tech blogs, and innovative new use cases in the mapping industry.
Read more about Eric van Rees