Home Data Python Data Analysis - Second Edition

Python Data Analysis - Second Edition

By Armando Fandango , Ivan Idris
books-svg-icon Book
Subscription FREE
eBook + Subscription $15.99
eBook $43.99
Print + eBook $54.99
READ FOR FREE Free Trial for 7 days. $15.99 p/m after trial. Cancel Anytime! BUY NOW BUY NOW BUY NOW
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
READ FOR FREE Free Trial for 7 days. $15.99 p/m after trial. Cancel Anytime! BUY NOW BUY NOW BUY NOW
Subscription FREE
eBook + Subscription $15.99
eBook $43.99
Print + eBook $54.99
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
  1. Free Chapter
    Getting Started with Python Libraries
About this book
Data analysis techniques generate useful insights from small and large volumes of data. Python, with its strong set of libraries, has become a popular platform to conduct various data analysis and predictive modeling tasks. With this book, you will learn how to process and manipulate data with Python for complex analysis and modeling. We learn data manipulations such as aggregating, concatenating, appending, cleaning, and handling missing values, with NumPy and Pandas. The book covers how to store and retrieve data from various data sources such as SQL and NoSQL, CSV fies, and HDF5. We learn how to visualize data using visualization libraries, along with advanced topics such as signal processing, time series, textual data analysis, machine learning, and social media analysis. The book covers a plethora of Python modules, such as matplotlib, statsmodels, scikit-learn, and NLTK. It also covers using Python with external environments such as R, Fortran, C/C++, and Boost libraries.
Publication date:
March 2017
Publisher
Packt
Pages
330
ISBN
9781787127487

 

Chapter 1. Getting Started with Python Libraries

Welcome! Let's get started. Python has become one of the de facto standard language and platform for data analysis and data science. The mind map that you will see shortly depicts some of the numerous libraries available in the Python ecosystem that are used by data analysts and data scientists. NumPy, SciPy, Pandas, and Matplotlib libraries lay the foundation of Python data analysis and are now part of SciPy Stack 1.0 (http://www.scipy.org/stackspec.html). We will learn how to install SciPy Stack 1.0 and Jupyter Notebook, and write some simple data analysis code as a warm-up exercise. 

The following are the libraries available in the Python ecosystem that are used by data analysts and data scientists:

  • NumPy: This is a general-purpose library that provides numerical arrays, and functions to manipulate the arrays efficiently.

  • SciPy: This is a scientific computing library that provides science and engineering related functions. SciPy supplements and slightly overlaps NumPy. NumPy and SciPy historically shared their code base but were later separated.

  • Pandas: This is a data-manipulation library that provides data structures and operations for manipulating tables and time series data.

  • Matplotlib: This is a 2D plotting library that provides support for producing plots, graphs, and figures. Matplotlib is used by SciPy and supports NumPy.

  • IPython: This provides a powerful interactive shell for Python, kernel for Jupyter, and support for interactive data visualization. We will cover the IPython shell later in this chapter.

  • Jupyter Notebook: This provides a web-based interactive shell for creating and sharing documents with live code and visualizations. Jupyter Notebook supports multiple versions of Python through the kernel provided by IPython. We will cover the Jupyter Notebook later in this chapter.

Installation instructions for the other required software will be given throughout the book at the appropriate time. At the end of this chapter, you will find pointers on how to find additional information online if you get stuck or are uncertain about the best way of solving problems:

In this chapter, we will cover the following topics:

  • Installing Python 3

  • Using IPython as a shell

  • Reading manual pages

  • Jupyter Notebook

  • NumPy arrays

  • A simple application

  • Where to find help and references

  • Listing modules inside the Python libraries

  • Visualizing data using matplotlib

 

Installing Python 3


The software used in this book is based on Python 3, so you need to have Python 3 installed. On some operating systems, Python 3 is already installed. There are many implementations of Python, including commercial implementations and distributions. In this book, we will focus on the standard Python implementation, which is guaranteed to be compatible with NumPy.

Note

You can download Python 3.5.x from https://www.python.org/downloads/. On this web page, you can find installers for Windows and Mac OS X, as well as source archives for Linux, Unix, and Mac OS X. You can find instructions for installing and using Python for various operating systems at https://docs.python.org/3/using/index.html.

The software we will install in this chapter has binary installers for Windows, various Linux distributions, and Mac OS X. There are also source distributions, if you prefer. You need to have Python 3.5.x or above installed on your system. The sunset date for Python 2.7 was moved from 2015 to 2020, thus Python 2.7 will be supported and maintained until 2020. For these reasons, we have updated this book for Python 3.

Installing data analysis libraries

We will learn how to install and set up NumPy, SciPy, Pandas, Matplotlib, IPython, and Jupyter Notebook on Windows, Linux, and Mac OS X. Let's look at the process in detail. We shall use pip3 to install the libraries. From version 3.4 onwards, pip3 has been included by default with the Python installation.

On Linux or Mac OS X

To install the foundational libraries, run the following command line instruction:

$ pip3 install numpy scipy pandas matplotlib jupyter notebook 

It may be necessary to prepend sudo to this command if your current user doesn't have sufficient rights on your system.

On Windows

At the time of writing this book, we had the following software installed as a prerequisite on our Windows 10 virtual machine:

Download and install the appropriate prebuilt NumPy and Scipy binaries for your Windows platform from http://www.lfd.uci.edu/~gohlke/pythonlibs/:

  • We downloaded numpy-1.12.0+mkl-cp36-cp36m-win_amd64.whl and scipy-0.18.1-cp36-cp36m-win_amd64.whl

  • After downloading, we executed the pip3 install Downloads\numpy-1.12.0+mkl-cp36-cp36m-win_amd64.whl and pip3 install Downloads\scipy-0.18.1-cp36-cp36m-win_amd64.whl commands

After these prerequisites are installed, to install the rest of the foundational libraries, run the following command line instruction:

$ pip3 install pandas matplotlib jupyter

Tip

Installing Jupyter using these commands, installs all the required packages, such as Notebook and IPython.

 

Using IPython as a shell


Data analysts, data scientists, and engineers are used to experimenting. IPython was created by scientists with experimentation in mind. The interactive environment that IPython provides is comparable to an interactive computing environment provided by Matlab, Mathematica, and Maple.

The following is a list of features of the IPython shell:

  • Tab completion, which helps you find a command

  • History mechanism

  • Inline editing

  • Ability to call external Python scripts with %run

  • Access to system commands

  • Access to the Python debugger and profiler

The following list describes how to use the IPython shell:

  • Starting a session: To start a session with IPython,enter the following instruction on the command line:

    $ ipython3
    Python 3.5.2 (default, Sep 28 2016, 18:08:09) 
    Type "copyright", "credits" or "license" for more information.
            IPython 5.1.0 -- An enhanced Interactive Python.
    ?         -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help      -> Python's own help system.
    object?   -> Details about 'object', use 'object??' for extra 
                         details.
    In [1]: quit()
    

    Tip

    The quit() function or Ctrl + D quits the IPython shell.

  • Saving a session: We might want to be able to go back to our experiments. In IPython, it is easy to save a session for later use with the following command:

    In [1]: %logstart
    Activating auto-logging. Current session state plus future 
             input saved:
             Filename : ipython_log.py
             Mode : rotate
             Output logging : False
             Raw input log : False
             Timestamping : False
    State : active
    

    Logging can be switched off as follows:

    In [9]: %logoff
    Switching logging OFF
    
  • Executing a system shell command: Execute a system shell command in the default IPython profile by prefixing the command with the ! symbol. For instance, the following input will get the current date:

    In [1]: !date
    

    In fact, any line prefixed with ! is sent to the system shell. We can also store the command output, as shown here:

    In [2]: thedate = !date
    In [3]: thedate
    
  • Displaying history: We can show the history of our commands with the %hist command. For example:

    In [1]: a = 2 + 2
    In [2]: a
    Out[2]: 4
    In [3]: %hist
    a = 2 + 2
    a
    %hist
    

    This is a common feature in command line interface (CLI) environments. We can also search through the history with the -g switch as follows:

    In [5]: %hist -g a = 2
          1: a = 2 + 2
    

We saw a number of so-called magic functions in action. These functions start with the % character. If the magic function is used on a line by itself, the % prefix is optional.

               
About the Authors
  • Armando Fandango

    Dr. Armando creates AI-empowered products by leveraging reinforcement learning, deep learning, and distributed computing. Armando has provided thought leadership in diverse roles at small and large enterprises, including Accenture, Nike, Sonobi, and IBM, along with advising high-tech AI-based start-ups. Armando has authored several books, including Mastering TensorFlow, TensorFlow Machine Learning Projects, and Python Data Analysis, and has published research in international journals and presented his research at conferences. Dr. Armando’s current research and product development interests lie in the areas of reinforcement learning, deep learning, edge AI, and AI in simulated and real environments (VR/XR/AR).

    Browse publications by this author
  • Ivan Idris

    Ivan Idris has an MSc in experimental physics. His graduation thesis had a strong emphasis on applied computer science. After graduating, he worked for several companies as a Java developer, data warehouse developer, and QA analyst. His main professional interests are business intelligence, big data, and cloud computing. Ivan Idris enjoys writing clean, testable code and interesting technical articles. Ivan Idris is the author of NumPy 1.5. Beginner's Guide and NumPy Cookbook by Packt Publishing.

    Browse publications by this author
Latest Reviews (4 reviews total)
Nice. I’m reading this book for my job. It helps me a lot. Thanks.
Clear and concise - only just started it though.
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Python Data Analysis - Second Edition
Unlock this book and the full library FREE for 7 days
Start now