You're reading from  Applied Data Science with Python and Jupyter

Product type: Book
Published in: Oct 2018
Reading Level: Beginner
ISBN-13: 9781789958171
Edition: 1st Edition
Author: Alex Galea

Alex Galea has been professionally practicing data analytics since graduating with a master's degree in physics from the University of Guelph, Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. Alex is currently doing web data analytics, where Python continues to play a key role in his work. He is a frequent blogger about data-centric projects that involve Python and Jupyter Notebooks.

Preface

Note

About

This section briefly introduces the author, the coverage of this book, the technical skills you'll need to get started, and the hardware and software required to complete all of the included activities and exercises.

About the Book

Applied Data Science with Python and Jupyter teaches you the skills you need for entry-level data science. You'll learn about some of the most commonly used libraries that are part of the Anaconda distribution, and then explore machine learning models with real datasets to give you the skills and exposure you need for the real world. You'll finish up by learning how easy it can be to scrape and gather your own data from the open web so that you can apply your new skills in an actionable context.

About the Author

Alex Galea has been doing data analysis professionally since graduating with a master's in physics from the University of Guelph in Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. More recently, Alex has been doing web data analytics, where Python continues to play a large part in his work. He frequently blogs about work and personal projects, which are generally data-centric and usually involve Python and Jupyter Notebooks.

Objectives

  • Get up and running with the Jupyter ecosystem

  • Identify potential areas of investigation and perform exploratory data analysis

  • Plan a machine learning classification strategy and train classification models

  • Use validation curves and dimensionality reduction to tune and enhance your models

  • Scrape tabular data from web pages and transform it into Pandas DataFrames

  • Create interactive, web-friendly visualizations to clearly communicate your findings

Audience

Applied Data Science with Python and Jupyter is ideal for professionals with a variety of job descriptions across a large range of industries, given the rising popularity and accessibility of data science. You'll need some prior experience with Python; any prior work with libraries such as Pandas and Matplotlib will give you a useful head start.

Approach

Applied Data Science with Python and Jupyter covers every aspect of the standard data workflow process with a blend of theory, practical hands-on coding, and relatable illustrations. Each chapter is designed to build on what you learned in the previous one. The book contains multiple activities that use real-life business scenarios so that you can practice and apply your new skills in a highly relevant context.

Minimum Hardware Requirements

The minimum hardware requirements are as follows:

  • Processor: Intel i5 (or equivalent)

  • Memory: 8 GB RAM

  • Hard disk: 10 GB

  • An internet connection

Software Requirements

You'll also need the following software installed in advance:

  • Python 3.5+

  • Anaconda 4.3+

  • Python libraries included with Anaconda installation:

      • matplotlib 2.1.0+

      • ipython 6.1.0+

      • requests 2.18.4+

      • beautifulsoup4 4.6.0+

      • numpy 1.13.1+

      • pandas 0.20.3+

      • scikit-learn 0.19.0+

      • seaborn 0.8.0+

      • bokeh 0.12.10+

  • Python libraries that require manual installation:

      • mlxtend

      • version_information

      • ipython-sql

      • pdir2

      • graphviz
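
Once everything is installed, it can be handy to confirm that your environment meets the minimum versions listed above. The following is a minimal sketch of such a check, not part of the book's code bundle. The helper names (version_tuple, meets_minimum, check_environment) are hypothetical, and the sketch assumes that each library exposes a __version__ attribute and that a simple numeric comparison of the first three version components is good enough. Note that some packages are imported under a different name than they are installed under (beautifulsoup4 as bs4, scikit-learn as sklearn, ipython as IPython).

```python
import importlib

# Minimum versions copied from the list above, keyed by import name
# (assumption: these import names match your installation).
MINIMUM_VERSIONS = {
    "matplotlib": "2.1.0",
    "IPython": "6.1.0",
    "requests": "2.18.4",
    "bs4": "4.6.0",
    "numpy": "1.13.1",
    "pandas": "0.20.3",
    "sklearn": "0.19.0",
    "seaborn": "0.8.0",
    "bokeh": "0.12.10",
}


def version_tuple(version):
    """Convert a string such as '0.12.10' into (0, 12, 10).

    Only the leading digits of the first three components are used,
    so a suffix such as 'rc1' is ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '0.8' to three components
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment(minimums=MINIMUM_VERSIONS):
    """Try to import each library and report its status as a string."""
    report = {}
    for module_name, minimum in minimums.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "missing"
            continue
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[module_name] = "unknown version"
        elif meets_minimum(installed, minimum):
            report[module_name] = "ok ({})".format(installed)
        else:
            report[module_name] = "too old ({} < {})".format(installed, minimum)
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print("{:<12} {}".format(name, status))
```

Run the script from Anaconda Prompt or paste it into a notebook cell. Any library reported as missing or too old can then be installed or updated before you start the exercises.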

Installation and Setup

Before you start with this book, we'll install the Anaconda environment, which includes Python and Jupyter Notebook.

Installing Anaconda

  1. Visit https://www.anaconda.com/download/ in your browser.

  2. Click on Windows, Mac, or Linux, depending on the OS you are working on.

  3. Next, click on the Download option. Make sure you download the latest version.

  4. Open the installer after download.

  5. Follow the steps in the installer and that's it! Your Anaconda distribution is ready.

Updating Jupyter and Installing Dependencies

  1. Search for Anaconda Prompt and open it.

  2. Type the following commands to update conda and Jupyter and to install the required packages:

    # Update conda
    conda update conda

    # Update Jupyter
    conda update jupyter

    # Install packages
    conda install numpy
    conda install pandas
    conda install statsmodels
    conda install matplotlib
    conda install seaborn

    # Install or upgrade scikit-learn
    pip install -U scikit-learn

  3. To open Jupyter Notebook from Anaconda Prompt, use the following command:

    jupyter notebook

Additional Resources

The code bundle for this book is also hosted on GitHub at https://github.com/TrainingByPackt/Applied-Data-Science-with-Python-and-Jupyter.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions

Code words in text, database table names, folder names, filenames, file extensions, path names, dummy URLs, user input, and Twitter handles are shown as follows:

"The final figure is then saved as a high resolution PNG to the figures folder."

A block of code is set as follows:

y = df['MEDV'].copy()
del df['MEDV']
df = pd.concat((y, df), axis=1)

Any command-line input or output is written as follows:

jupyter notebook

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Click on New in the upper-right corner and select a kernel from the drop-down menu."
