This book focuses on important code libraries for geospatial data management and analysis for Python 3. The reason for this is simple—as Python 2 is near the end of its life cycle, it is quickly being replaced by Python 3. This new Python version comes with key differences in organization and syntax, meaning that developers need to adjust their legacy code and apply new syntax in their code. Fields such as machine learning, data science, and big data have changed the way geospatial data is managed, analyzed, and presented today. In all these areas, Python 3 has quickly become the new standard, which is another reason for the geospatial community to start using Python 3.
The geospatial community relied on Python 2 for a long time, because many dependencies weren't available for Python 3 or didn't work correctly with it. Now that Python 3 is mature and stable, the geospatial community has taken advantage of its capabilities, which has produced many new libraries and tools. This book aims to help developers understand open source and commercial modules for geospatial programs written in Python 3. It offers a selection of major geospatial libraries and tools for geospatial data management and data analysis.
This chapter will explain how to install and manage the code libraries that will be used in this book. It will cover the following topics:
- Installing Anaconda
- Managing Python packages using Anaconda Navigator, Anaconda Cloud, conda, and pip
- Managing virtual environments using Anaconda, conda, and virtualenv
- Running a Jupyter Notebook
Anaconda is a freemium open source distribution of the Python programming language for large-scale data processing, predictive analytics, and scientific computing, and it aims to simplify package management and deployment. It is also the world's most popular Python data science platform, with over 4.5 million users and 1,000 data science packages. It is not to be confused with conda, a package manager that is installed with Anaconda.
For this book, we recommend installing and using Anaconda, because it provides everything you need: Python itself, Python libraries, the tools to manage these libraries, a Python environment manager, and the Jupyter Notebook application to write, edit, and run your code. Alternatively, you can use another distribution, or install Python from www.python.org/downloads and combine any IDE of your choice with a package manager such as pip (covered later in this chapter). We recommend using Python version 3.6.
The latest version of Anaconda is a free download for Windows, macOS, and Linux from the Continuum Analytics homepage. At the time of writing, the latest version is Anaconda 5.0.1. It was released in October 2017 and is available in 32-bit and 64-bit versions from https://www.continuum.io/downloads. This page also offers extensive installation instructions for each operating system, a 30-minute tutorial that explains how to use Anaconda, a cheat sheet on how to get started, and an FAQ section. There's also a slimmed-down version of Anaconda called Miniconda, available at https://conda.io/miniconda.html. It installs only Python and the conda package manager and leaves out the 1,000+ software packages that come with the standard Anaconda installation. If you decide to use Miniconda, make sure you download the Python 3.6 version.
Anaconda will install Python 3.6.2 as the default Python version on your machine. All chapters of this book use Python 3.6, so any version from 3.6 onward will work. With Anaconda, you get more than 1,000 Python packages, along with a number of applications such as Jupyter Notebook and a variety of Python consoles and IDEs.
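You can confirm which version your interpreter actually is by checking sys.version_info from Python. The sketch below uses 3.6 as the minimum, following the text above. The helper name meets_minimum is hypothetical and is used here only for illustration.

```python
import sys

# Minimum version used in this book (assumption taken from the text: 3.6).
MINIMUM = (3, 6)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the given interpreter version is at least `minimum`."""
    # version_info behaves like a tuple, so comparing (major, minor) is enough.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print(sys.version)
    print("OK" if meets_minimum() else "Please install Python 3.6 or later")
```

Because the comparison is done on tuples of integers, 3.10 correctly counts as newer than 3.6.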
Note that installing Python 3.6 does not tie you to that version. In Anaconda Navigator (a GUI for managing local environments and installing packages), you can also choose Python 3.5 or 2.7 for a virtual environment. This gives you the flexibility to switch between Python versions for different projects.
To begin the installation, download the 32-bit or 64-bit Anaconda installer, depending on your system. Run the installer and follow the setup guide to install Anaconda on your local system.
Jupyter Notebooks are a novel idea that many companies have adopted, including Esri in its new ArcGIS API for Python. The Notebooks are managed by Project Jupyter, an open source project that grew out of IPython, an earlier interactive code environment. They are a fantastic tool for both learning and production environments. Although the code can also be run as a script, as seen in other chapters, Jupyter Notebooks make coding even more fun.
The idea behind code Notebooks is to make coding interactive. A Notebook combines a Python terminal with the direct output of the code being run. Because Notebooks can be saved, they become a tool for sharing and comparing code. Each section can be edited later or saved as a separate component for demonstration purposes.
Check out the documentation for Jupyter Notebooks here: http://jupyter.org/documentation.
To start a Notebook server, open a terminal, activate the environment you want to work in (here, a virtual environment called cartoenv), and run the jupyter notebook command. The output will look similar to this:
C:\PythonGeospatial3>cartoenv\Scripts\activate
(cartoenv) C:\PythonGeospatial3>jupyter notebook
[I 17:30:46.338 NotebookApp] Serving notebooks from local directory: C:\PythonGeospatial3
[I 17:30:46.338 NotebookApp] 0 active kernels
[I 17:30:46.339 NotebookApp] The Jupyter Notebook is running at:
[I 17:30:46.339 NotebookApp] http://localhost:8888/?token=5376ed8c704d0ead295a3c0464e52664e367094a9e74f70e
[I 17:30:46.339 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:30:46.344 NotebookApp] Copy/paste this URL into your browser when you connect for the first time, to login with a token: http://localhost:8888/?token=5376ed8c704d0ead295a3c0464e52664e367094a9e74f70e
[I 17:30:46.450 NotebookApp] Accepting one-time-token-authenticated connection from ::1
[I 17:30:49.490 NotebookApp] Kernel started: 802159ef-3215-4b23-b77f-4715e574f09b
[I 17:30:50.532 NotebookApp] Adapting to protocol v5.1 for kernel 802159ef-3215-4b23-b77f-4715e574f09b
This starts the local server that powers the Notebooks. You can reach it with a browser on port 8888 by navigating to http://localhost:8888. A browser tab pointing to this address should open automatically when the server starts.
If you log out, you can log back in with the token from the output of the jupyter notebook command. This is the value after token= in the URL printed in the terminal.
In Jupyter Notebooks, you add code in the In sections (the input cells). Because variables and imported modules are kept in memory, you can add code line by line, or you can add it in blocks of multiple lines, like a script. You can edit and rerun an In section as often as you like, or leave it alone and start a new section. This creates a record of your scripting efforts, along with the interactive output.
Here is a Gist explaining many useful keyboard shortcuts for Jupyter Notebooks: https://gist.github.com/kidpixo/f4318f8c8143adee5b40
After installing Anaconda, you will notice a working folder with various applications inside it. One of these is Anaconda Navigator, which provides a Graphical User Interface (GUI). You can compare it to Windows File Explorer: it is an environment for managing projects, packages, and environments. The term environment refers to a collection of packages together with a Python installation. This is similar to how you would use virtualenv, except that you create the environment with a GUI instead of a command prompt. virtualenv is covered in more detail later in this chapter.
After opening Anaconda Navigator, click the Environments tab on the left of the screen. Anaconda Navigator will show an overview of existing environments and the packages they contain. One environment is predefined: the so-called root environment, which provides 150+ preinstalled Python packages. To make a new environment, click the Create button at the bottom of the screen. This automatically installs five default Python packages, including pip, so you are free to use pip for package management too. A useful feature of Anaconda Navigator is that you can choose a preferred Python version for every new environment. If you installed the default Anaconda version rather than Miniconda, you can also install from a list of 1,000+ packages that are available locally. To see this list, select the Not Installed option from the drop-down menu next to the Channels button. To find a package, type its name in the Search Packages field and press Enter. Mark the packages you want and install them into the environment of your choice. After installation, each package is listed by name in the environment. Clicking the green box with a checkmark next to a package name lets you mark that package for an upgrade, removal, or specific version installation.
After installing the packages, you can start working with an environment by clicking the arrow button inside it. This opens a terminal, Jupyter Notebook, or another Anaconda application in that environment. If you prefer an IDE to the options that Anaconda Navigator offers, make sure you point your IDE to the correct python.exe file used by Anaconda. This file can usually be found at the following path, which is the default installation path of Anaconda:
If a Python package you are looking for is not in the local list of available packages, you can use Anaconda Cloud. This application is also part of Anaconda3, and you can use it to share packages, Notebooks, and environments with others. Clicking the Anaconda Cloud desktop icon opens a web page where you can sign up as a registered user. Anaconda Cloud is similar to GitHub, because it lets you create a private online repository for your own work. These repositories are called channels.
Once you have an Anaconda Cloud user account, you can use Anaconda Cloud from inside Anaconda Navigator. Open Anaconda Navigator and sign in with your login details in the upper-right corner of the screen, where it says Sign in to Anaconda Cloud. You can then upload your own packages and files to a private package repository and search for existing files or packages.
Besides Anaconda Navigator and Cloud, you can manage your package installations with conda, a binary package manager that you use from the command line. conda quickly installs, runs, and updates packages and their dependencies. It also makes it easy to create, save, load, and switch between environments on your local computer. The best way to install conda is to install either Anaconda or Miniconda. A third option is a separate installation from the Python Package Index (PyPI). This option is not recommended, because that version may not be up to date.
Installing packages with conda is straightforward, because its syntax resembles that of pip. However, conda cannot install packages directly from a Git server. This means you cannot download the latest version of many packages under development with conda. conda also doesn't cover all the packages available on PyPI the way pip does. That is why pip is always available when you create a new environment with Anaconda Navigator (more on pip later in this chapter).
You can verify that conda is installed by typing the following command in a terminal:
>> conda --version
conda will display the version number you have installed. To install a package of your choice, use the following command in a terminal:
>> conda install <package-name>
To update an installed package, use:
>> conda update <package-name>
You can also install a particular version of a package by specifying the version number:
>> conda install <package-name>=1.2.0
You can update all the installed packages in the current environment with the --all flag:
>> conda update --all
You can uninstall packages too:
>> conda remove <package-name>
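You may want to drive these commands from a Python script, for example to set up the same packages on several machines. In that case, you can wrap them with the standard library's subprocess and shutil modules. This is a minimal sketch that only assembles the same command lines shown above and does not use any conda Python API. The helper names conda_available, conda_command, and run_conda are hypothetical.

```python
import shutil
import subprocess


def conda_available():
    """Return True if a conda executable can be found on the PATH."""
    return shutil.which("conda") is not None


def conda_command(action, *packages, env=None, yes=False):
    """Build a conda command line as a list of arguments.

    action   -- a conda subcommand, e.g. "install", "update", "remove" or "create"
    packages -- package specs or flags, e.g. "pandas", "pandas=1.2.0" or "--all"
    env      -- optional environment name, passed with -n
    yes      -- if True, add -y so conda does not ask for confirmation
    """
    cmd = ["conda", action]
    if env is not None:
        cmd += ["-n", env]
    if yes:
        cmd.append("-y")
    cmd.extend(packages)
    return cmd


def run_conda(action, *packages, env=None, yes=False):
    """Run a conda command and return its standard output as text."""
    cmd = conda_command(action, *packages, env=env, yes=yes)
    # Use the full path found on the PATH (on Windows this may be a .bat file).
    cmd[0] = shutil.which("conda") or "conda"
    # check=True raises CalledProcessError if conda exits with an error.
    # stdout=PIPE/universal_newlines work on Python 3.6, unlike
    # capture_output/text, which were added in Python 3.7.
    result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout
```

For example, run_conda("install", "pandas", env="python3packt", yes=True) would run conda install -n python3packt -y pandas. Similarly, conda_command("create", "python=2.7", env="python3packt") reproduces the conda create command used later in this chapter.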
The conda documentation is available at: https://conda.io/docs/index.html.
As stated earlier, Anaconda users always have pip available, both in every new environment and in the root environment. It comes preinstalled with every version of Anaconda, including Miniconda. pip is a Python package manager for installing and managing software packages written in Python. Unlike Anaconda Navigator and Cloud, it runs from the command line. If you decide not to use Anaconda or anything similar to it, and use a default Python installation from python.org instead, you can use either easy_install or pip as a package manager. Because pip is seen as an improvement over easy_install and is the preferred Python package manager for Python 3, we will only discuss pip here. For the upcoming chapters, however, we recommend conda, Anaconda Navigator, or Cloud for Python package management.
During installation, Anaconda can optionally add three environment variables to your list of user variables. This lets you run commands such as pip from any location when you open a terminal. To check whether pip is installed on your system, open a terminal and enter:
>> pip --version
If pip is available, you can install packages with:
>> pip install <package-name>
For Anaconda users, the pip command file should be stored at the following path:
If pip is not available on your system, you can install it by following the instructions at: https://pip.pypa.io/en/latest/installing.
Anaconda Navigator automatically displays the version number of each installed package. Users of a default Python installation can check it from a Python interpreter, or with the pip show <package-name> command:
>> import pandas
>> pandas.__version__
# output will be a version number, for example: '0.18.1'
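You can wrap the same check in a small function that also handles packages that are missing or that do not define __version__ (not every package does). This is a sketch rather than a complete package inspector, and the helper name installed_version is hypothetical.

```python
import importlib


def installed_version(name):
    """Return the __version__ string of an importable package, or None.

    None is returned both when the package is not installed and when it
    does not define a __version__ attribute (not every package does).
    """
    try:
        module = importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError (Python 3.6+)
        return None
    return getattr(module, "__version__", None)
```

If pandas is installed, installed_version("pandas") returns its version string. If it is not installed, the function returns None instead of raising an error.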
To upgrade a package to a specific version, for example when there's a new version you'd like to use, run:
>> pip install -U pandas==0.21.0
Upgrading it to the latest available version can be done as follows:
>> pip install -U pandas
Uninstalling a package can be done with the following command:
>> pip uninstall <package-name>
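When several Python installations exist on one machine, the first pip on the PATH may belong to a different interpreter than the one you are running. A common safeguard is to invoke pip as a module of the current interpreter (python -m pip), as sketched below. The helper names pip_command and run_pip are hypothetical, and the arguments are the same ones used in the commands above.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command line tied to the interpreter running this code."""
    # sys.executable is the full path of the running python executable, so
    # "python -m pip" installs into this interpreter's (or environment's)
    # site-packages rather than into whichever pip comes first on the PATH.
    return [sys.executable, "-m", "pip"] + list(args)


def run_pip(*args):
    """Run pip with the given arguments; raises CalledProcessError on failure."""
    subprocess.run(pip_command(*args), check=True)
```

For example, run_pip("install", "-U", "pandas==0.21.0") corresponds to the upgrade command above. run_pip("uninstall", "-y", "pandas") removes pandas without asking for confirmation.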
The recommended approach to using Python, in general, is a project-based one. Each project uses its own Python version, along with the packages it requires and their mutual dependencies. This gives you the flexibility to switch between different Python versions and installed package versions. Without this approach, every time you update a package or install a new one, its dependencies are updated too, and your setup changes. This may cause problems, such as code that no longer runs correctly because of changes under the hood, or packages that do not communicate correctly with each other. Because this book focuses on Python 3, you won't need to switch to a different Python version, but you may well need different versions of the same packages for different projects.
Before Anaconda, this project-based approach required virtualenv, a tool for creating isolated Python environments. Anaconda offers the same approach in a simpler way. Both options are covered in detail later in this chapter.
As stated before, Anaconda Navigator has a tab called Environments. Clicking it displays an overview of all environments the user has created on the local file system. You can easily create, import, clone, or remove environments, specify the preferred Python version, and install packages by version number inside each environment. Every new environment automatically installs a number of Python packages, such as pip. From there, you are free to install more packages. These environments serve the same purpose as the virtual environments you would create with the virtualenv tool. You can start working with them by opening a terminal or by choosing to run Python, which opens a terminal and starts the Python interpreter.
Anaconda stores all environments in a separate root folder, which keeps all your virtual environments in one place. Note that each environment in Anaconda Navigator is treated as a virtual environment, even the root environment.
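Because each environment has its own interpreter, it can be useful to check from code which one is active, for example when pointing an IDE at the right python.exe. The sketch below relies on documented standard library attributes (sys.executable, sys.prefix, sys.base_prefix). It also reads the CONDA_DEFAULT_ENV variable, which conda sets when it activates an environment. The helper name environment_info is hypothetical. Conda environments are separate Python installations rather than venv-style environments, so they are detected through that variable rather than through sys.base_prefix.

```python
import os
import sys


def environment_info():
    """Describe the interpreter and environment this code is running in."""
    return {
        # Full path to the running python executable (useful for IDE setup).
        "executable": sys.executable,
        # sys.prefix points inside the active environment.
        "prefix": sys.prefix,
        # In a venv/virtualenv, sys.prefix differs from sys.base_prefix;
        # older virtualenv releases set sys.real_prefix instead.
        "in_venv": (sys.prefix != getattr(sys, "base_prefix", sys.prefix)
                    or hasattr(sys, "real_prefix")),
        # Name of the activated conda environment, or None if none is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in environment_info().items():
        print(key, "=", value)
```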
Both Anaconda and Miniconda offer the conda package manager, which can also be used to manage virtual environments. Open a terminal and use the following command to list all available environments on your system:
>> conda info -e
Use the following command for creating a virtual environment based on Python version 2.7:
>> conda create -n python3packt python=2.7
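Next, activate the environment as follows. This is the Windows syntax; on macOS and Linux, use source activate python3packt instead. Newer conda releases also accept conda activate python3packt on all platforms.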
>> activate python3packt
Multiple additional packages can now be installed with a single command:
>> conda install -n python3packt <package-name1> <package-name2>
This command calls
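Deactivate the environment you've been working in as follows. On macOS and Linux, use source deactivate instead; newer conda releases also accept conda deactivate.
>> deactivate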
More on managing environments with conda can be found at: https://conda.io/docs/user-guide/tasks/manage-environments.html
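If you don't use Anaconda, you can create virtual environments with virtualenv instead. First, install it with pip:
>> pip install virtualenv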
Next, create a virtual environment by running the virtualenv command followed by the name of the new environment, for example:
>> virtualenv python3packt
Navigate to the directory with the same name:
>> cd python3packt
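Next, activate the virtual environment with its activate script. On Windows, run:
>> Scripts\activate
On macOS and Linux, use source bin/activate instead.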
Your virtual environment is now ready for use. Use pip install to install packages exclusively into this environment and use them in your code. When you are done, use the deactivate command to stop working in the virtual environment:
>> deactivate
If you have multiple Python versions installed, pass the -p argument together with the desired Python version or the path to the python.exe file of your choice, for example:
>> virtualenv -p python2.7 python3packt
You can also do it as follows:
>> virtualenv -p c:\python34\python.exe python3packt
You pass the -p argument when you create the virtual environment, before you install the required packages. For more information on virtualenv, see: http://virtualenv.readthedocs.io/en/stable
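Since Python 3.3, the standard library has also included a venv module that creates virtual environments in much the same way. This is a swapped-in alternative, not the virtualenv tool described above. The sketch below creates an environment from Python and works out where its activation script lives on each platform. The helper names create_env and activate_script are hypothetical, and python3packt is just the example name used above.

```python
import os
import venv


def create_env(path, with_pip=True):
    """Create a virtual environment at `path` using the stdlib venv module."""
    # with_pip=True bootstraps pip into the new environment via ensurepip.
    venv.create(path, with_pip=with_pip)
    return path


def activate_script(path):
    """Return the path of the environment's activation script for this OS."""
    if os.name == "nt":
        # Windows: run it directly, e.g. python3packt\Scripts\activate
        return os.path.join(path, "Scripts", "activate")
    # macOS/Linux: load it with `source python3packt/bin/activate`
    return os.path.join(path, "bin", "activate")
```

For example, create_env("python3packt") followed by activate_script("python3packt") gives python3packt\Scripts\activate on Windows and python3packt/bin/activate on macOS and Linux. From a terminal, python -m venv python3packt creates the same kind of environment.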
This introductory chapter discussed how to install and manage the code libraries that will be used in this book. We'll be working mainly with Anaconda, a freemium open source distribution of the Python programming language that aims to simplify package management and deployment. We discussed how to install Anaconda and covered the options for Python package management: Anaconda Navigator, Anaconda Cloud, conda, and pip. Finally, we discussed virtual environments and how to manage them using Anaconda, conda, and virtualenv.
The recommended installation for this book is Anaconda3. It installs a working Python environment along with a large repository of local Python packages, the Jupyter Notebook application, the conda package manager, Anaconda Navigator, and Anaconda Cloud. In the next chapter, we will introduce the major code libraries used to process and analyze geospatial data.