Mastering Geospatial Analysis with Python

By Paul Crickard , Eric van Rees , Silas Toms

About this book

Python comes with a host of open source libraries and tools that help you work on professional geoprocessing tasks without investing in expensive tools. This book will introduce Python developers, both new and experienced, to a variety of new code libraries that have been developed to perform geospatial analysis, statistical analysis, and data management. This book will use examples and code snippets that will help explain how Python 3 differs from Python 2, and how these new code libraries can be used to solve age-old problems in geospatial analysis.

You will begin by understanding what geoprocessing is and explore the tools and libraries that Python 3 offers. You will then learn to use Python code libraries to read and write geospatial data. You will then learn to perform geospatial queries within databases and learn PyQGIS to automate analysis within the QGIS mapping suite. Moving forward, you will explore the newly released ArcGIS API for Python and ArcGIS Online to perform geospatial analysis and create ArcGIS Online web maps. Further, you will deep dive into Python Geospatial web frameworks and learn to create a geospatial REST API.

Publication date:
April 2018
Publisher
Packt
Pages
440
ISBN
9781788293334

 

Chapter 1. Package Installation and Management

This book focuses on important code libraries for geospatial data management and analysis for Python 3. The reason for this is simple—as Python 2 is near the end of its life cycle, it is quickly being replaced by Python 3. This new Python version comes with key differences in organization and syntax, meaning that developers need to adjust their legacy code and apply new syntax in their code. Fields such as machine learning, data science, and big data have changed the way geospatial data is managed, analyzed, and presented today. In all these areas, Python 3 has quickly become the new standard, which is another reason for the geospatial community to start using Python 3.

The geospatial community has been relying on Python 2 for a long time, as many dependencies either weren't available for Python 3 or did not work correctly. But now that Python 3 is mature and stable, the geospatial community has taken advantage of its capabilities, resulting in many new libraries and tools. This book aims to help developers understand open source and commercial modules for geospatial programs written in Python 3, offering a selection of major geospatial libraries and tools for doing geospatial data management and data analysis.

This chapter will explain how to install and manage the code libraries that will be used in this book. It will cover the following topics:

  • Installing Anaconda
  • Managing Python packages using Anaconda Navigator, Anaconda Cloud, conda, and pip
  • Managing virtual environments using Anaconda, conda, and virtualenv
  • Running a Jupyter Notebook
 

Introducing Anaconda


Anaconda is a freemium open source distribution of the Python programming language for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment. It is also the world's most popular Python data science platform, with over 4.5 million users and 1,000 data science packages. It is not to be confused with conda, a package manager that is installed with Anaconda.

For this book, we recommend installing and using Anaconda as it provides you with everything you need: Python itself, Python libraries, the tools to manage these libraries, a Python environment manager, and the Jupyter Notebook application to write, edit, and run your code. You can also choose to use an alternative to Anaconda or install Python through www.python.org/downloads and use any IDE of your choice combined with a package manager such as pip (covered later in this chapter). We recommend using Python version 3.6.

Installing Python using Anaconda

The latest version of Anaconda is a free download for Windows, macOS, and Linux from the homepage of Continuum Analytics. At the time of writing, the latest version is Anaconda 5.0.1, released in October 2017 and available in 32- and 64-bit versions from https://www.continuum.io/downloads. This page also offers extensive download instructions for each operating system, a 30-minute tutorial that explains how to use Anaconda, a cheat sheet on how to get started, and an FAQ section. There's also a slimmed-down version of Anaconda called Miniconda that only installs Python and the conda package manager, leaving out the 1,000+ software packages that come with the standard installation of Anaconda: https://conda.io/miniconda.html. If you decide to use this, make sure you download the Python 3.6 version.

Anaconda will install Python 3.6.2 as the default Python version on your machine. The Python version that is used in all chapters of this book is Python 3.6, so you're good with any version that starts with 3.6 or higher. With Anaconda, you get more than 1,000 Python packages, as well as a number of applications, such as Jupyter Notebook, and a variety of Python consoles and IDEs.
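Once the installation has finished, you may want to confirm that the interpreter you are running satisfies the 3.6 requirement. The following is a minimal sketch using only the standard library; the function name meets_minimum and the MINIMUM_VERSION constant are our own illustrative names, not part of Anaconda.

```python
import sys

# Minimum version assumed from this chapter's recommendation (Python 3.6).
MINIMUM_VERSION = (3, 6)


def meets_minimum(version_info=None, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the major and minor components.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    print("Meets the 3.6 minimum:", meets_minimum())
```

Passing a tuple such as (2, 7, 14) to meets_minimum lets you check a version other than the one that is currently running.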

Please note that you are not forced to always use Python version 3.6 after installing it—using Anaconda Navigator (a GUI for managing local environments and installing packages), you can also choose to use Python 3.5 or 2.7 in a virtual environment. This gives you more flexibility in switching between different Python versions for various projects.

To begin the installation, download the 32- or 64-bit Anaconda installer, depending on your system capabilities. Run the installer and follow the setup guide to install Anaconda on your local system.

 

Running a Jupyter Notebook


Jupyter Notebooks are a novel idea that has been adopted by many companies (including Esri, for the new ArcGIS API for Python). The open source project behind them, Project Jupyter (which is based on IPython, an earlier interactive code environment), has produced a fantastic tool for both learning and production environments. While the code can also be run as a script, as seen in other chapters, using Jupyter Notebooks will make coding even more fun.

The idea of the code Notebooks is to make coding interactive. By combining a Python terminal with direct output that results from the code being run, the Notebooks (which are saveable) become a tool for sharing and comparing code. Each section can be edited later or can be saved as a separate component for demonstration purposes.
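Because a saved Notebook is a JSON document, its code sections can be pulled out again with a few lines of Python. The sketch below assumes the version 4 notebook layout (a top-level cells list whose entries carry cell_type and source keys); the helper name code_cells and the sample notebook are made up for illustration.

```python
import json


def code_cells(notebook_text):
    """Return the source of each code cell in a notebook given as JSON text."""
    notebook = json.loads(notebook_text)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # The source may be stored as one string or as a list of lines.
            sources.append("".join(source) if isinstance(source, list) else source)
    return sources


# A tiny hand-written notebook, used here instead of a real .ipynb file.
SAMPLE_NOTEBOOK = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["import sys\n", "print(sys.version)"]},
    ],
})

if __name__ == "__main__":
    for number, source in enumerate(code_cells(SAMPLE_NOTEBOOK), start=1):
        print("Code cell", number)
        print(source)
```

To use it on a real Notebook, read the .ipynb file as text and pass the contents to code_cells.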

Note

Check out the documentation for Jupyter Notebooks here: http://jupyter.org/documentation.

Running a Notebook

To start the local server that powers the Notebooks, activate the virtual environment and pass the jupyter notebook command:

C:\PythonGeospatial3>cartoenv\Scripts\activate
(cartoenv) C:\PythonGeospatial3>jupyter notebook
[I 17:30:46.338 NotebookApp] Serving notebooks from local directory: C:\PythonGeospatial3
[I 17:30:46.338 NotebookApp] 0 active kernels
[I 17:30:46.339 NotebookApp] The Jupyter Notebook is running at:
[I 17:30:46.339 NotebookApp] http://localhost:8888/?token=5376ed8c704d0ead295a3c0464e52664e367094a9e74f70e
[I 17:30:46.339 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:30:46.344 NotebookApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=5376ed8c704d0ead295a3c0464e52664e367094a9e74f70e
[I 17:30:46.450 NotebookApp] Accepting one-time-token-authenticated connection from ::1
[I 17:30:49.490 NotebookApp] Kernel started: 802159ef-3215-4b23-b77f-4715e574f09b
[I 17:30:50.532 NotebookApp] Adapting to protocol v5.1 for kernel 802159ef-3215-4b23-b77f-4715e574f09b

This will start running the server that will power the Notebooks. This local server can be accessed on port 8888, using a browser, by navigating to: http://localhost:8888. It should automatically open a browser tab when started.

If you log out, use the token provided in the text generated when the jupyter notebook command is passed to log back in, as in this example:

http://localhost:8888/?token=5376ed8c704d0ead295a3c0464e52664e367094a9e74f70e
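If you want to pick the token out of such a URL programmatically, for example to store it for a later login, the standard library can parse it. This is a small sketch; extract_token is a hypothetical helper name, and the URL is the one from the example output above.

```python
from urllib.parse import parse_qs, urlparse


def extract_token(notebook_url):
    """Return the value of the `token` query parameter, or None if absent."""
    query = parse_qs(urlparse(notebook_url).query)
    values = query.get("token")
    return values[0] if values else None


if __name__ == "__main__":
    url = ("http://localhost:8888/"
           "?token=5376ed8c704d0ead295a3c0464e52664e367094a9e74f70e")
    print(extract_token(url))
```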

Creating a new Notebook

To create a new Notebook, click on the New button in the upper-right, and select Python 3 from the Notebook section. The Notebook will open in a new tab.

Adding code

In Jupyter Notebooks, code is added in the In sections. The code can be added line by line, as the code variables and imported modules will be saved in memory, or it can be added in blocks/multiple lines, like a script. The In sections can be edited and run over and over, or they can be left alone, and a new section can be started. This creates a record of the scripting efforts, along with the interactive output.
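As an illustration of how names stay in memory between In sections, the two commented parts below stand for two separate sections run one after the other. The variable names and the radius value are invented for this sketch.

```python
# First In section: import a module and define a variable.
# Both remain in memory after the section has run.
import math

radius_m = 250.0

# Second In section, run later: it reuses the names defined above.
area_m2 = math.pi * radius_m ** 2
print(round(area_m2, 1))  # prints 196349.5
```

Editing radius_m in the first section and rerunning both sections would update the result, which is the record-keeping behavior described above.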

Note

Here is a Gist explaining lots of useful keyboard shortcuts for Jupyter Notebooks: https://gist.github.com/kidpixo/f4318f8c8143adee5b40

 

Managing Python packages


After installing Anaconda, it's time to discuss how to manage different Python packages. Anaconda offers several options to do this—Anaconda Navigator, Anaconda Cloud, and the conda package manager.

Managing packages with Anaconda Navigator

After installing Anaconda, you will notice a working folder with various applications inside of it. One of these is Anaconda Navigator, which provides a Graphical User Interface (GUI). You can compare it to Windows File Explorer, that is, an environment to manage projects, packages, and environments. The term environment refers to a collection of packages and a Python install. Notice that this is similar to how you would use virtualenv, but this time using a GUI instead of a command prompt to create one (virtualenv is covered in more detail later in this chapter).

After opening Anaconda Navigator, click the Environments tab on the left of the screen and Anaconda Navigator will provide an overview of existing environments and the packages they contain. There's one pre-defined environment available, a so-called root environment that provides you with 150+ pre-installed Python packages. New environments can be made by clicking the Create button at the bottom of the screen. This will automatically install five default Python packages, including pip, which means you're free to use that too for package management.

What's interesting about Anaconda Navigator is that, with every new environment, you can choose a preferred Python version and install from a list of 1,000+ packages that are available locally if you installed the default Anaconda version and not Miniconda. This list is available by selecting the option Not Installed from the drop-down menu next to the Channels button. You can easily search and select the packages of your choice by using the Search Packages field and hitting Enter. Mark the packages and install them for the environment of your choice. After installation, the package will be listed by name in the environment. If you click the green box with a checkmark next to the package name, you can choose to mark a package for an upgrade, removal, or specific version installation.

After installing the packages, you can start working with an environment by opening up a terminal, Jupyter Notebook, or another Anaconda application with one mouse click on the arrow button inside of the environment of your choice. If you wish to use an IDE instead of one of the options that Anaconda Navigator offers you, be sure to redirect your IDE to the right python.exe file that is used by Anaconda. This file can usually be found at the following path, which is the default installation path of Anaconda:

C:\Users\<UserName>\Anaconda3\python.exe
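To check that an IDE really is pointing at the interpreter you intended, you can ask Python itself where it is running from. The following is a minimal sketch using only the sys module; interpreter_summary is an illustrative name of our own.

```python
import sys


def interpreter_summary():
    """Describe the interpreter that is running this code."""
    return {
        # Full path to the running python executable.
        "executable": sys.executable,
        # Root folder of the installation or environment in use.
        "prefix": sys.prefix,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print(key, "->", value)
```

If the executable entry does not sit inside your Anaconda3 folder, the IDE is using a different Python installation.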

Online searching for packages using Anaconda Cloud

If you are searching for a Python package that is not found in the local list of available Python packages, you can use Anaconda Cloud. This application is also part of Anaconda3 and you can use the Anaconda Cloud application for sharing packages, Notebooks, and environments with others. After clicking on the Anaconda Cloud desktop icon, an internet page will open where you can sign up to become a registered user. Anaconda Cloud is similar to GitHub, as it lets you create a private online repository for your own work. These repositories are called channels.

If you create a user account, you can use Anaconda Cloud from inside Anaconda Navigator. After creating a user account for Anaconda Cloud, open Anaconda Navigator and use your login details to sign into Anaconda Cloud in the upper-right corner of the screen where it says Sign in to Anaconda Cloud. Now, you can upload your own packages and files to a private package repository and search for existing files or packages. 

Managing Python packages with conda

Apart from using Anaconda Navigator and Cloud for package management, you can use conda, a binary package manager, as a command-line tool to manage your package installations. conda quickly installs, runs, and updates packages and their dependencies. conda easily creates, saves, loads, and switches between environments on your local computer. The best ways to install conda are through installing either Anaconda or Miniconda. A third option is a separate installation through the Python Package Index (PyPI), but that version may not be up to date, so this option is not recommended.

Installing packages with conda is straightforward, as it resembles the syntax of pip. However, it is good to know that conda cannot install packages directly from a Git server. This means that the latest version of many packages under development cannot be downloaded with conda. Also, conda doesn't cover all the packages available on PyPI as pip does itself, which is why you always have access to pip when creating a new environment with Anaconda Navigator (more on pip as we proceed further).

You can verify if conda is installed by typing the following command in a terminal:

>> conda --version

If installed, conda will display the number of the version that you have installed. Installing the package of your choice can be done with the following command in a terminal:

>> conda install <package-name>

Updating an already installed package to its latest available version can be done as follows:

>> conda update <package-name>

You can also install a particular version of a package by pointing out the version number:

>> conda install <package-name>=1.2.0

You can update all the available packages simply by using the --all argument:

>> conda update --all

You can uninstall packages too:

>> conda remove <package-name>

Extensive conda documentation is available at: https://conda.io/docs/index.html.
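If you script your setup, the conda commands shown above can be assembled as argument lists and handed to subprocess.run. The sketch below only builds and prints the lists, so it is safe to run on a machine without conda; the helper name conda_command and its parameters are our own invention, not part of conda.

```python
def conda_command(action, package=None, version=None, update_all=False):
    """Build the argument list for one of the conda commands shown above."""
    if action not in {"install", "update", "remove"}:
        raise ValueError("unsupported action: {}".format(action))
    args = ["conda", action]
    if update_all:
        if action != "update":
            raise ValueError("--all is only handled for 'update' in this sketch")
        return args + ["--all"]
    if package is None:
        raise ValueError("a package name is required")
    # conda pins a version with a single '=', for example pandas=0.21.0
    args.append("{}={}".format(package, version) if version else package)
    return args


if __name__ == "__main__":
    print(conda_command("install", "pandas"))
    print(conda_command("install", "pandas", version="0.21.0"))
    print(conda_command("update", update_all=True))
    print(conda_command("remove", "pandas"))
```

Keeping the command construction separate from its execution makes it easy to review what will be run before anything is installed or removed.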

Managing Python packages using pip

As stated earlier, Anaconda users always have pip available in every new environment, as well as the root folder: it comes pre-installed with every version of Anaconda, including Miniconda. pip is a Python package manager used to install and manage software packages written in Python. Unlike Anaconda Navigator and Cloud, it runs from the command line. If you decide not to use Anaconda or anything similar to it, and use a default Python installation from python.org, you can either use easy_install or pip as a package manager. As pip is seen as an improvement over easy_install and the preferred Python package manager for Python 3, we will only discuss pip here. It is recommended to use either pip, conda, Anaconda Navigator, or Cloud for Python package management in the upcoming chapters.

Optionally, as you install Anaconda, three environment variables will be added to your list of user variables. This enables you to access commands such as pip from any system location if you open a terminal. To check if pip is installed on your system, open a terminal and enter:

>> pip

If you don't receive an error message, pip is installed correctly and you can use it to install any package of your choice from PyPI by using:

>> pip install <package-name>

For Anaconda users, the pip command file should be stored at the following path:

C:\Users\<UserName>\Anaconda3\Scripts\pip.exe

If pip is not available on your system, you can install pip by following the instructions given at: https://pip.pypa.io/en/latest/installing.
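The same check can be made from inside Python, which tells you whether pip is importable for the interpreter you are actually running rather than for whichever pip happens to be first on the system path. This is a minimal sketch; pip_available is a hypothetical helper name.

```python
import importlib.util


def pip_available():
    """Return True if the pip package can be found by this interpreter."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    if pip_available():
        print("pip is available for this interpreter")
    else:
        print("pip was not found; see the installation instructions above")
```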

Upgrading and uninstalling the package with pip

Anaconda Navigator lists the version number of each installed package. If you use a default Python installation instead, you can display the version from a Python interpreter session:

>>> import pandas
>>> pandas.__version__  # output will be a version string, for example: '0.18.1'

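A package's version can also be looked up without importing the package, which is handy when you only want to know whether something is installed. The sketch below relies on importlib.metadata, which is part of the standard library from Python 3.8 onward, so it assumes a newer interpreter than the 3.6 release used in this book; installed_version is our own illustrative name.

```python
from importlib import metadata  # standard library in Python 3.8 and later


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("pandas", "pip"):
        print(name, "->", installed_version(name))
```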
Upgrading a package to a specific version, for example when there's a new release you'd like to use, can be done as follows:

>> pip install -U pandas==0.21.0

Upgrading it to the latest available version can be done as follows:

>> pip install -U pandas

Uninstalling a package can be done with the following command:

>> pip uninstall <package-name>
 

Python virtual environments


The recommended approach to using Python, in general, is a project-based one. This means that each project uses its own Python version, along with the packages required and their mutual dependencies. This approach gives you the flexibility to switch between different Python versions and installed package versions. Not following this approach would mean that, every time you update a package or install a new one, its dependencies will be updated too, resulting in a different setup. This may cause problems, for example, code that won't run correctly because of changes under the hood, or packages that do not communicate correctly with each other. Because this book focuses on Python 3, there won't be any need to switch to a different Python version, but you may well want to use different versions of the same packages for different projects.

Before Anaconda, this project-based approach would require using virtualenv, a tool for creating isolated Python environments. This approach has gotten a lot easier with Anaconda, which offers the same approach but in a more simplified way. Both options are covered in detail as we proceed further.
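When you work with several isolated environments, it helps to be able to tell from inside Python which kind of environment is active. The following sketch rests on two assumptions about layout: conda environments contain a conda-meta folder, and environments created with virtualenv or the standard venv module change sys.prefix (or set sys.real_prefix in older virtualenv releases). The function name environment_kind is hypothetical.

```python
import os
import sys


def environment_kind():
    """Classify the running interpreter as 'conda', 'virtualenv/venv', or 'system'."""
    # conda environments, including the root one, contain a conda-meta folder.
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        return "conda"
    # venv and recent virtualenv releases make sys.prefix differ from
    # sys.base_prefix; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    if base_prefix != sys.prefix or hasattr(sys, "real_prefix"):
        return "virtualenv/venv"
    return "system"


if __name__ == "__main__":
    print("Environment type:", environment_kind())
    print("Environment root:", sys.prefix)
```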

Virtual environments using Anaconda

As stated before, Anaconda Navigator has a tab called Environments that, when clicked, displays an overview of all local environments created by the user on the local file system. You can easily create, import, clone, or remove environments, specify the preferred Python version, and install packages by version number inside such an environment. Any new environment will automatically install a number of Python packages, such as pip. From there, you are free to install more packages. These environments serve the same purpose as the virtual environments that you would create by using the virtualenv tool. You can start working with them by opening a terminal or by running Python, which opens a terminal and runs python.exe.

Anaconda stores all environments in a separate root folder, keeping all your virtual environments in one place. Note that each environment in Anaconda Navigator is treated as a virtual environment, even the root environment.

Managing environments with conda 

Both Anaconda and Miniconda offer the conda package manager, which can also be used to manage virtual environments. Open a terminal and use the following command to list all available environments on your system:

>> conda info -e

Use the following command for creating a virtual environment based on Python version 3.6:

>> conda create -n python3packt python=3.6

Activate the environment next as follows:

>> activate python3packt

Multiple additional packages can now be installed with a single command:

>> conda install -n python3packt <package-name1> <package-name2>

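Because the -n argument names the target environment, this command calls conda directly and works whether or not that environment is currently active.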

Deactivate the environment you've been working in as follows:

>> deactivate

More on managing environments with conda can be found at: https://conda.io/docs/user-guide/tasks/manage-environments.html
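For scripting, the list of environments is easier to consume as JSON than as the plain text printed by conda info -e. The sketch below assumes that your conda version supports conda env list --json and that its output contains an envs list of folder paths; verify both against your installation. The helper name environment_names and the sample paths are made up for illustration.

```python
import json
import re


def environment_names(conda_json_text):
    """Map environment folder names to their paths.

    Expects the text printed by `conda env list --json` (assumed format:
    an object with an "envs" list of paths).
    """
    paths = json.loads(conda_json_text).get("envs", [])
    names = {}
    for path in paths:
        # Split on both separators so Windows and POSIX paths are handled.
        name = re.split(r"[\\/]", path.rstrip("\\/"))[-1]
        names[name] = path
    return names


# Hypothetical sample output, used here instead of calling conda.
SAMPLE_OUTPUT = json.dumps({
    "envs": [
        "/home/user/anaconda3",
        "/home/user/anaconda3/envs/python3packt",
    ]
})

if __name__ == "__main__":
    for name, path in environment_names(SAMPLE_OUTPUT).items():
        print(name, "->", path)
```

Note that the root environment shows up under the name of its installation folder (anaconda3 in the sample), because it has no folder of its own under envs.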

Virtual environments using virtualenv

If you don't want to use Anaconda, virtualenv needs to be installed first. Use the following command to install it locally:

>> pip install virtualenv

Next, a virtual environment can be created with the virtualenv command followed by the name of the new environment, for example:

>> virtualenv python3packt

Navigate to the directory with the same name:

>> cd python3packt

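Next, activate the virtual environment by running the activate script in its Scripts folder (the Windows layout, matching the paths used elsewhere in this chapter):

>> Scripts\activate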

Your virtual environment is now ready for use. Use pip install to install packages exclusively to this environment and use them in your code. Use the deactivate command to stop the virtual environment from working:

>> deactivate

If you have multiple Python versions installed, pass the -p argument to the virtualenv command when creating the environment, followed by the desired Python version or the path to the python.exe file of your choice, for example:

>> virtualenv -p python2.7 <environment-name>

You can also do it as follows:

>> virtualenv -p c:\python34\python.exe <environment-name>

Because the interpreter is fixed when the virtual environment is created, this choice has to be made at creation time, before the environment is activated and the required packages are installed. For more information on virtualenv, see: http://virtualenv.readthedocs.io/en/stable
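Python 3 also ships with a venv module in its standard library that creates comparable environments without installing anything extra. Note that this is a different tool from virtualenv, shown here only as a sketch of the same idea; create_environment is our own helper name, and the environment is created in a temporary folder so that the example cleans up after itself.

```python
import os
import tempfile
import venv


def create_environment(parent_dir, name="python3packt"):
    """Create a virtual environment with the standard-library venv module."""
    env_dir = os.path.join(parent_dir, name)
    # with_pip=False keeps this sketch fast and offline;
    # pass with_pip=True to have pip bootstrapped into the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        path = create_environment(scratch)
        # Every environment created by venv contains a pyvenv.cfg file.
        print("Created:", os.path.isfile(os.path.join(path, "pyvenv.cfg")))
```

For a real project, you would pass a permanent folder instead of a temporary one and then activate the environment from a terminal as described above.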

 

Summary


This introductory chapter discussed how to install and manage the code libraries that will be used in this book. We'll be working mainly with Anaconda, a freemium open source distribution of the Python programming language that aims to simplify package management and deployment. We discussed how to install Anaconda, and the options for Python package management using Anaconda Navigator, Anaconda Cloud, conda, and pip. Finally, we discussed virtual environments and how to manage these using Anaconda, conda, and virtualenv.

The recommended installation for this book is the Anaconda3 version, which installs not only a working Python environment, but also a large repository of local Python packages, the Jupyter Notebook application, the conda package manager, Anaconda Navigator, and Anaconda Cloud. In the next chapter, we will introduce the major code libraries used to process and analyze geospatial data.

About the Authors

  • Paul Crickard

    Paul Crickard authored a book on the Leaflet JavaScript module. He has been programming for over 15 years and has focused on GIS and geospatial programming for 7 years. He spent 3 years working as a planner at an architecture firm, where he combined GIS with Building Information Modeling (BIM) and CAD. Currently, he is the CIO at the 2nd Judicial District Attorney's Office in New Mexico.

  • Eric van Rees

    Eric van Rees was first introduced to Geographical Information Systems (GIS) when studying Human Geography in the Netherlands. For 9 years, he was the editor-in-chief of GeoInformatics, an international GIS, surveying, and mapping publication and a contributing editor of GIS Magazine. During that tenure, he visited many geospatial user conferences, trade fairs, and industry meetings. He focuses on producing technical content, such as software tutorials, tech blogs, and innovative new use cases in the mapping industry.

  • Silas Toms

    Silas Toms is a geographer and geospatial developer from California. Over the last decade, Silas has become an expert in the use of Python programming for geospatial analysis, publishing two books on the use of ArcPy. Now, as President of Loki Intelligent Corporation, Silas develops ETL automation tools, interactive web maps, enterprise GIS, and location data for businesses and governments. Silas teaches classes on programming for GIS with BayGeo, and co-hosts The Mappyist Hour podcast.

