Learn Python by Building Data Science Applications

By Philipp Kats, David Katz

About this book

Python is the most widely used programming language for building data science applications. Complete with step-by-step instructions, this book contains easy-to-follow tutorials to help you learn Python and develop real-world data science projects. The “secret sauce” of the book is its curated list of topics and solutions, put together using a range of real-world projects, covering initial data collection, data analysis, and production.

This Python book starts by taking you through the basics of programming, right from variables and data types to classes and functions. You’ll learn how to write idiomatic code and test and debug it, and discover how you can create packages or use the range of built-in ones. You’ll also be introduced to the extensive ecosystem of Python data science packages, including NumPy, Pandas, scikit-learn, Altair, and Datashader. Furthermore, you’ll be able to perform data analysis, train models, and interpret and communicate the results. Finally, you’ll get to grips with structuring and scheduling scripts using Luigi and sharing your machine learning models with the world as a microservice.

By the end of the book, you’ll have learned not only how to implement Python in data science projects, but also how to maintain and design them to meet high programming standards.

Publication date:
August 2019
Publisher
Packt
Pages
482
ISBN
9781789535365

 

Preparing the Workspace

Welcome! We're very excited to start learning and building things with you! However, we need to get ourselves ready first.

In this chapter, we'll learn how to download and install everything you'll need throughout the book, including Python itself, all the Python packages that we'll need, and two development tools we will be using extensively: Jupyter and Visual Studio Code (VS Code). After that, we'll go through a brief overview of Jupyter and VS Code interfaces. Finally, you will run your very first line of Python, so we need to ensure that everything is ready before we dive in.

In this chapter, we'll cover the following:

  • The minimum computer configuration required
  • How to install the Anaconda distribution
  • How to download the code for this book
  • Setting up and getting familiar with VS Code and Jupyter
  • Running your first line of code to ensure everything runs smoothly

By the end of this chapter, you will have learned about the hardware requirements for Python and this book, and what you can do if you don't have a sufficiently powerful computer. You will also learn how to install Python 3.7.2 and all required packages and tools using the open source Anaconda distribution. 

 

Technical requirements

Python is undemanding and does not require an advanced computer. In fact, you can run Python on a $10 Raspberry Pi or, in its reduced MicroPython form, on some Arduino boards! The code and data we use in this book do not require any special computational power: any laptop or desktop computer made after 2008 should suffice, provided it has at least 2 GB of RAM, 20 GB of disk space, and an internet connection. Your operating system (OS) shouldn't be a problem either, as Python and all the tools we will use are cross-platform and work on Windows, macOS, and Linux.
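Once Python is installed (see the next section), the disk-space figure above can be checked with a few lines of standard-library code. This is only a rough sketch: the function name has_enough_disk and its 20 GB default are illustrative choices, and RAM is not checked because the standard library offers no portable way to query it.

```python
import platform
import shutil


def has_enough_disk(path=".", required_gb=20):
    """Return True if the drive holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    # platform.system() reports 'Windows', 'Darwin' (macOS), or 'Linux'
    print("Operating system:", platform.system())
    print("At least 20 GB free:", has_enough_disk())
```

Pass a different folder as path to check another drive.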

Throughout the book, we'll use two main tools to write the code: Jupyter and VS Code. Both of them are free and aren't demanding.

All the code for the book is publicly available and free to access at https://github.com/PacktPublishing/Learn-Python-by-Building-Data-Science-Applications.

 

Installing Python

There are multiple Python distributions, starting with the original, vanilla Python, which is accessible at https://www.python.org/. Data analysis, however, adds unique requirements for packaging (https://www.youtube.com/watch?v=QjXJLVINsSA&feature=youtu.be&t=3555). In this book, we use Anaconda, which is a free and open source Python distribution designed for data science and machine learning. Anaconda's main features include a smooth installation of data science packages (many of which rely on C and Fortran code under the hood) and conda, which is a package and environment manager (we will talk more about environments and conda in Chapter 9, Shell, Git, Conda, and More – at Your Command). Conveniently, the Anaconda distribution installs most of the packages (https://docs.anaconda.com/anaconda/packages/pkg-docs/) we need in this book and many more!

In order to install Anaconda, follow these steps:

  1. First, go to the Anaconda distribution web page at https://www.anaconda.com/distribution/.
  2. Select the Python 3.7 graphical installer for your platform and download it (at the time of writing, there is no graphical installer for Linux, so you'll have to use the one for the command line). The following screenshot shows what the interface looks like—we've marked the link we're interested in with dotted lines:

  3. Run the installation. Keep all settings as default. When you're asked if you want to install PyCharm, select no (unless you personally want to, of course; we won't use PyCharm in this book):

Voila! Now we have Python up and running! Next, let's download all the materials for this book.

We use Anaconda build 3-2018.12, which is the most recent version at the time of writing this book. Until a new version is released, this build will be accessible at https://repo.anaconda.com/archive/.
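To confirm that the installation worked, a short script such as the following can be saved to a file and run with the new interpreter. It is a minimal sketch under two assumptions: that any Python from 3.7 onward is acceptable (the book itself uses 3.7.2), and that a conda-managed installation can be recognized by the conda-meta folder inside its prefix. Both function names are illustrative.

```python
import sys
from pathlib import Path


def is_supported_python(version_info=sys.version_info):
    """Return True for Python 3.7 or newer (assumed threshold)."""
    return tuple(version_info[:2]) >= (3, 7)


def looks_like_conda(prefix=sys.prefix):
    """Heuristic: conda-managed environments contain a `conda-meta` folder."""
    return (Path(prefix) / "conda-meta").is_dir()


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("3.7 or newer:", is_supported_python())
    print("Conda-managed installation:", looks_like_conda())
```

If the last line reports False, the script was probably run with a different Python than the one Anaconda installed.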
 

Downloading materials for running the code

All code in this book is also available as a separate archive of files—either Python scripts or Jupyter notebooks. You can download the full archive and follow along with the book using the relevant code from GitHub (https://github.com/PacktPublishing/Learn-Python-by-Building-Data-Science-Applications). Everything is stored on GitHub, which is an online service for code storage with version control. We will discuss both Git and GitHub in Chapter 9, Shell, Git, Conda, and More – at Your Command, but in this case, you won't need version control, so it is easier to download everything as an archive. Just use the Clone or download button on the right side (1), and select Download ZIP (2):

Once the download is complete, unzip the file and move it to a convenient location. This folder will be our main workspace throughout the book.
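If you prefer to unpack the archive from Python rather than through your file manager, the standard zipfile module can do it. The sketch below is one possible approach; the function name unzip_archive and the file and folder names in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_archive(archive_path, target_dir):
    """Extract `archive_path` into `target_dir`; return the top-level names found there."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return sorted(entry.name for entry in target.iterdir())


# Example usage (hypothetical file and folder names):
# unzip_archive("book-code.zip", "workspace")
```

The returned list shows what ended up in the target folder, which is a quick way to confirm that the extraction succeeded.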

Installing Python packages

Many of the chapters in this book teach you how to make use of specific packages. Most of them are included in the standard Anaconda distribution, so if you installed Python using the Anaconda distribution, then you will have them already. Some packages might not be installed though, so we'll have to install them separately as per our requirements for every chapter. This is totally fine, and we'll specify which packages will be used at the beginning of each chapter.

In order to install a specific package, you have two options:

  • Installing via conda by running one of the following commands. Specifying a channel is required if a package is not available on the channels that conda searches by default:
> conda install <mypackage>
> conda install -c <mychannel> <mypackage>

Some packages are not present in conda at all. You can search for packages through the channels at https://anaconda.org/.

  • Most packages can be installed using pip:
> pip install <mypackage>

Generally speaking, we recommend using conda over pip for installation.

Alternatively, there is a single specification file in the root of the repository that you can use to install everything at once. To do so, open your Terminal and navigate to the repository's root (we will explain how to do that in Chapter 9, Shell, Git, Conda, and More – at Your Command, but VS Code's Terminal will open in the root of the given folder automatically). Once there, run the following command:

conda env update --name root -f environment.yml

Then, follow the instructions. Here, conda uses the environment.yml specification file as a list of packages to install.
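To see which packages are still missing before a chapter, you can ask Python whether each one is importable. The following is a minimal sketch: find_missing is an illustrative name, and the list of modules is taken from the packages named in the book's description, not from environment.yml. Note that the import name can differ from the install name (scikit-learn is imported as sklearn).

```python
import importlib.util


def find_missing(module_names):
    """Return the names from `module_names` that cannot be imported here."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names, not install names; adjust the list per chapter.
    wanted = ["numpy", "pandas", "sklearn", "altair", "datashader", "luigi"]
    missing = find_missing(wanted)
    print("Missing packages:", missing or "none")
```

Anything reported as missing can then be installed with conda or pip as described above.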

Now, let's install our main development tools: VS Code and Jupyter.

 

Working with VS Code

VS Code is invaluable for Python development and experimentation. VS Code—not to be confused with Visual Studio, which is a commercial product—is a sophisticated, completely free, and open source text editor created by Microsoft. It is language-agnostic and will work perfectly with Python, JavaScript, Java, or any other language. VS Code has hundreds of built-in features and thousands of great plugins to expand its capabilities.

In order to install VS Code, head to its main web page, https://code.visualstudio.com/, and download the package for your OS. The installation is straightforward; there is no need to change any of the default settings. Once VS Code is installed, open the application. Next, switch to the plugin marketplace menu (as shown in the following screenshot), type Python, and install the plugin. The Python extension for VS Code provides plenty of Python-specific features and will prove very useful throughout the book.

In the following screenshot, 1 represents the plugin marketplace. Once switched, type Python in the search form (2), select the plugin (3), and hit install (Python was already installed in this screenshot, hence it offers to uninstall it instead):

Once that's done, let's briefly review the interface of the tool.

The VS Code interface

Let's go over the VS Code interface. In the following screenshot, you can see five distinct sections:

Section 1 of VS Code has six icons (more will appear after installing certain plugins). The last one at the bottom of the toolbar, which is a gear symbol, represents the settings. All the others represent different modes, from top to bottom:

  1. Explorer mode, which allows us to browse the files in the given workspace
  2. Search mode, which allows us to look for a particular text element throughout the whole workspace
  3. A built-in Git client (more on that in Chapter 9, Shell, Git, Conda, and More – at Your Command)
  4. Debugger mode, which halts and inspects code in the middle of the execution in order to understand what's happening under the hood
  5. VS Code's plugin marketplace

Every mode changes the content of section 2. This area is dedicated to working with the workspace as a whole, which includes adding new files, removing existing ones, or traversing through variables in debugging sessions.

Section 3 is the main one. Here, we actually write and read the code. You can have multiple tabs or even split this window into many: vertically, horizontally, or both. Most of the time, each tab represents one file in the workspace.

If you don't have section 4 open, then go to View | Terminal or use the Ctrl + ` shortcut. You can also drag this section out from the upper edge of section 5 using your mouse, if you prefer.

Section 4 has four tabs. In PROBLEMS, VS Code will point you to some potential issues in the code. The roles of the OUTPUT and DEBUG CONSOLE tabs are self-explanatory, and we won't use them much. The most important tab here is Terminal: it gives you access to the Terminal built into your OS (hence, it does not directly relate to VS Code itself). Terminals allow us to run system-wide commands, create folders, write to files, execute Python scripts, and run any software; essentially, everything you can do via your OS's graphical interface can be done here using commands alone. We will cover the Terminal in more depth in Chapter 9, Shell, Git, Conda, and More – at Your Command. Conveniently, VS Code's Terminals open in the root directory of the workspace, which is a feature we will constantly utilize throughout the book.

Lastly, section 5 is an information bar that shows the current properties of the workspace, including the interpreter's name, Git repository and branch names (more on that in Chapter 9, Shell, Git, Conda, and More – at Your Command), and cursor position. Most of those elements are interactive!

One feature of VS Code that newcomers often overlook, but that is extremely powerful, is its command palette, as shown in the following screenshot:

You can open the command palette using the Ctrl (command on macOS) + Shift + P shortcut. The command palette allows you to type in, select, and execute practically any feature of the application, from switching the color theme to searching for a word, to almost anything else. This feature allows programmers to avoid using a mouse or trackpad, and once mastered, it drastically increases productivity.

For example, let's create a new file (Ctrl/command + N) and type Hello Python!. Now, in order to switch that text to uppercase, all we need is to do the following:

  1. Select all of the text by using Ctrl/command + A.
  2. Open the command palette (Ctrl/command + Shift + P) and type Upper. Select the Transform to Uppercase command (note that the command palette also shows shortcuts).

Spend some time learning VS Code's features! One great place for that is the Interactive Playground: you can jump straight into it by typing the name into the command palette.

Another great feature of VS Code is that it can use the key bindings that you use in other editors, including Vim, Sublime, and Atom. If you're used to their bindings, then switch to them, as they will save you a lot of time and frustration.
 

Beginning with Jupyter

Another development environment we'll use is Jupyter. If you have installed Anaconda, then Jupyter is already on your machine, as it is one of the tools that come with Anaconda. To start using Jupyter, we need to run it from the Terminal (you might need to open a new Terminal to update the paths). The following command starts JupyterLab, the newer version of the tool's frontend, which is what we'll use:

$ jupyter lab

Alternatively, Jupyter also offers an older frontend, Jupyter Notebook. The two have their differences, but we'll stick with JupyterLab.

The app's behavior depends on the folder from which it was started; it is more convenient to run it directly from the project's root folder. That's why it is so handy that VS Code's Terminal opens in a workspace folder by itself, as we don't need to navigate there every time. But why do we need another developer tool, anyway? That's what the next section is all about.

Notebooks

Jupyter is designed around a different approach to programming than VS Code. Its central concept is so-called notebooks: files that allow the mixing of actual code, text (including markdown and LaTeX equations), as well as plots, images, videos, and interactive visualizations. In notebooks, you execute code interactively, one cell after another. This way, you can experiment easily: write some code, run it, see the outcomes, and then tweak it again.
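Under the hood, a notebook is a JSON file with the .ipynb extension that stores a list of cells. The sketch below builds a minimal notebook structure by hand to show how text and code sit side by side; make_notebook is an illustrative helper, not part of Jupyter, and in practice Jupyter creates and maintains these files for you.

```python
import json


def make_notebook(markdown_text, code_text):
    """Build a minimal notebook (format version 4) with one markdown and one code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,  # filled in once the cell has been run
                "outputs": [],            # results are stored next to the code
                "source": code_text,
            },
        ],
    }


if __name__ == "__main__":
    notebook = make_notebook("# My first notebook", "print('Hello world')")
    print(json.dumps(notebook, indent=1))
```

Because outputs are stored in the same file as the code, a notebook can be read without being executed.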

The outcomes are shown along with the code so that you can open and read the notebook, even without executing it. Because of that, notebooks are especially useful in scientific/analytical contexts, as on the one hand, they allow us to describe what we're doing with text and illustrations, and on the other hand, they keep the actual code tied to the narrative so that anyone can inspect and confirm that your analysis is valid. One great example of that is LIGO notebooks, which represent the actual code that was used to discover gravitational waves in the universe (this research won the Nobel Prize in 2017).

Notebooks are also great for teaching (as in the case of this book), as students can interact with each and every part of the code by themselves. However, while Jupyter is good for exploration, it feels less convenient when your code base starts to grow and mature. Because of this, we will switch back and forth between Jupyter and VS Code throughout the course of this book, picking the right tool for each particular job.

Let's now look at Jupyter's interface.

The Jupyter interface

Let's get familiar with Jupyter's interface. This software works differently from VS Code: Jupyter runs as a web server that is accessible through a browser. To start it, just type jupyter lab in VS Code's Terminal window and hit Enter. This will start the server. Depending on your OS, either a link will be printed in the Terminal (starting with http://localhost...), or your default web browser will just open the page automatically. You can stop the Jupyter server by hitting Ctrl + C within the Terminal and typing yes, if prompted, or by closing the Terminal window.
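Since Jupyter is a local web server, you can check whether it is running by testing whether something accepts connections on its port. The following is a rough sketch that assumes Jupyter's default port, 8888; if that port was busy when the server started, Jupyter will have picked another one, so use the port from the link printed in the Terminal. The name is_port_open is illustrative.

```python
import socket


def is_port_open(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts
        return False


if __name__ == "__main__":
    print("Server reachable on port 8888:", is_port_open())
```

A True result only means that some program is listening on that port, not necessarily Jupyter.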

Jupyter's layout, as shown in the following screenshot, is somewhat similar to that of VS Code:

Here, again, the tabs in section 1 show all the modes available for section 2, including a file browser, a list of running notebooks, a list of available commands, and tabs. The second section represents one of the modes previously described. Finally, the main section, section 3, shows all open tabs, similar to section 3 in VS Code. The default tab is Launcher. From here, we can create new notebooks, text files (such as classic code or data files), Terminals, and consoles.

Note that the launcher explicitly states Python 3 for both notebooks and consoles. This is because Jupyter is also language-agnostic. In fact, the name Jupyter comes from the Julia-Python-R triad of analytical languages, but the application supports many others, including C, Java, and Rust. In this book, we'll only use Python.

If everything went smoothly with Jupyter, then we're ready to go! But before we dive into coding, let's do one last pre-flight check.

 

Pre-flight check

Before we proceed to the content of this book, let's ensure our code can actually be executed by running the simplest possible code in Jupyter. To do this, let's create a test notebook and run some code to ensure everything works as intended. Click on the Python 3 square in the Notebook section. A new tab called Untitled.ipynb should open.

The blue highlighted line marks the selected cell in the notebook. Each cell holds a separate snippet of code, which is executed as a whole in one step. Let's write our very first line of code in this cell:

print('Hello world')

Now, hit Shift + Enter. This shortcut executes the selected cells in Python and outputs the result on the next line. It also automatically creates a new input cell if there are none, as shown in the following screenshot. The number on the left gives a hint as to the order in which cells are executed, so the first cell to be executed will be marked with 1. An asterisk in place of the number means the cell is still being executed:

If everything worked properly, and you see Hello world in the output, then congratulations—you are ready for the following chapters!

Cells can also include markdown, which is useful for including explanations, images, or equations. For that, just switch from Code to Markdown by using the dropdown at the top.
 

Summary

In this chapter, we prepared our working environment for the journey ahead. In particular, we installed the Anaconda distribution with Python 3.7.2, which includes most of the packages we'll need throughout the course of this book. We also installed and learned the basics of VS Code, a sophisticated development environment that will be our primary tool for writing code, and Jupyter, a coding environment that is well suited to prototyping, experimentation, analysis, and teaching. Finally, we ran our first line of code in Jupyter!

In the next chapter, we'll begin our introduction to Python, learning about variables, variable assignment, and Python's basic data types.

 

Questions

  1. What version of Python do we use?
  2. Will it work on a Windows PC?
  3. Do I need to install any additional packages?
  4. What is a Jupyter Notebook?
  5. When and why should I use Jupyter Notebooks?
  6. When should I switch to VS Code?
  7. Can I run the code from this book on my smartphone/tablet?
 

About the Authors

  • Philipp Kats

    Philipp Kats is a researcher at the Urban Complexity Lab, NYU CUSP, a research fellow at Kazan Federal University, and a data scientist at StreetEasy, with many years of experience in software development. His interests include data analysis, urban studies, data journalism, and visualization. Having a bachelor's degree in architectural design and having followed the (at first) rocky path of a self-taught developer, Philipp knows the pain points of learning programming and is eager to share his experience.

  • David Katz

    David Katz is a researcher and holds a Ph.D. in mathematics. As a mathematician at heart, he sees code as a tool to express his questions. David believes that code literacy is essential as it applies to most disciplines and professions. David is passionate about sharing his knowledge and has 6 years of experience teaching college and high school students.

