You're reading from Jupyter for Data Science

Product type: Book
Published in: Oct 2017
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781785880070
Edition: 1st Edition
Languages: Python
Tools: Jupyter
Concepts: Data Analysis
Author: Dan Toomey

Chapter 2. Working with Analytical Data on Jupyter

Jupyter does none of the heavy lifting for analyzing data: all the work is done by programs written in a selected language. Jupyter provides the framework to run modules in a variety of programming languages, so we have a choice in how we analyze data in Jupyter.

A popular choice for data analysis programming is Python, and Jupyter has complete support for Python programming. We will look at a variety of programming solutions that might tax such a support system and see how Jupyter fares.

Data scraping with a Python notebook

A common task in data analysis is gathering data from a public source such as a website. Python is adept at scraping websites for data. Here, we look at an example that loads stock price information from Google Finance.

In particular, given a stock symbol, we want to retrieve the last year of price ranges for that symbol.

One of the pages on the Google Finance site gives the last year's worth of price data for a security. For example, if we were interested in the price points for Advanced Micro Devices (AMD), we would enter the following URL:

https://www.google.com/finance/historical?q=NASDAQ:AMD

Here, NASDAQ is the stock exchange that carries the AMD security. On the resultant Google page, there is a table of data points of interest, as seen in the following partial screenshot.

As with many sites that you will attempt to access, the page holds a lot of other information as well, such as headers, footers, and ads, as you can see...
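Picking the price table out of the surrounding page clutter can be sketched with the standard library alone. The example below is a minimal illustration, not the chapter's own script: the SAMPLE_HTML fragment, its class names, and its placeholder values are all hypothetical, and a real page would first need to be downloaded and would likely have a different layout.

```python
from html.parser import HTMLParser

# Hypothetical page fragment with placeholder values (not real quotes).
SAMPLE_HTML = """
<html><body>
<div class="header">Site navigation</div>
<table class="prices">
<tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th></tr>
<tr><td>Day 1</td><td>10.0</td><td>11.0</td><td>9.5</td><td>10.5</td></tr>
<tr><td>Day 2</td><td>10.5</td><td>12.0</td><td>10.0</td><td>11.5</td></tr>
</table>
<div class="footer">Ads and footer text</div>
</body></html>
"""


class TableParser(HTMLParser):
    """Collect the text of every table row, ignoring the rest of the page."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row currently being built, or None
        self._in_cell = False   # True while inside a <td> or <th>
        self._text = []         # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._text = []

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._row.append("".join(self._text).strip())
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = TableParser()
parser.feed(SAMPLE_HTML)

# First row is the header; the rest are data rows.
header, *records = parser.rows

# Keep (date, low, high) as the price range for each row.
price_ranges = [(r[0], float(r[3]), float(r[2])) for r in records]
print(header)
print(price_ranges)
```

Because the parser only records text inside table cells, the header and footer blocks are skipped without any special handling.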

Using heavy-duty data processing functions in Jupyter

Python has several groups of processing functions that can tax computer system power. Let us use some of these in Jupyter and determine if the functionality performs as expected.

Using NumPy functions in Jupyter

NumPy is a Python package providing multidimensional arrays and routines for array processing. We bring in the NumPy package using the from numpy import * statement. In particular, the NumPy package defines the array function, which creates a NumPy array object with extensive functionality.

The NumPy array processing functions run from the mundane, such as the min() and max() functions (which provide the minimum and maximum values over the array dimensions provided), to more interesting utility functions for producing histograms and calculating correlations using the elements of an array.

With NumPy, you can manipulate arrays in many ways. For example, we will go over some of these functions with the following scripts, where we will use NumPy...
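As a rough illustration of the functions just mentioned, the following sketch applies min(), max(), histogram(), and corrcoef() to small made-up arrays. The array contents and variable names are assumptions chosen for the example, and it uses an explicit import numpy as np rather than a star import so that each call shows where it comes from.

```python
import numpy as np

# Small made-up 2 x 3 array for illustration.
values = np.array([[1, 2, 3],
                   [4, 5, 6]])

overall_min = values.min()        # minimum over the whole array
column_max = values.max(axis=0)   # maximum of each column

# Histogram of all elements, split into three equal-width bins.
counts, edges = np.histogram(values, bins=3)

# Correlation between two perfectly linearly related series.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1
r = np.corrcoef(x, y)[0, 1]

print(overall_min, column_max, counts, r)
```

Passing axis=0 is what the text means by "over the array dimensions provided": the reduction runs down each column instead of over every element.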

Using SciPy in Jupyter

SciPy is an open source library for mathematics, science, and engineering. With such a wide scope, there are many areas we can explore using SciPy:

Integration
Optimization
Interpolation
Fourier transforms
Linear algebra

There are several other intensive sets of functionality as well, such as signal processing.

Using SciPy integration in Jupyter

A standard mathematical task is integrating a function. SciPy accomplishes this by taking your function as a callback and evaluating it repeatedly to compute the integral numerically. For example, suppose that we wanted to determine the following integral, where a and b are constants:

$$\int_0^1 (a\pi + b)\,dx$$

We would use a script like the following, which takes the definition of pi from the standard math package.

from scipy.integrate import quad
import math
def integrand(x, a, b):
    return a*math.pi + b
a = 2
b = 1
quad(integrand, 0, 1, args=(a,b))

Again, this coding is very clean and simple, yet it would take considerably more effort in many other languages. Running this script in Jupyter we see the results...
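To make the callback idea concrete without requiring SciPy, here is a minimal sketch of composite Simpson's rule that accepts an integrand and extra arguments in the same style as the script above. The simpson helper is hypothetical and written only for this illustration. It is not how quad works internally: quad relies on adaptive routines from the QUADPACK library.

```python
import math


def simpson(f, lower, upper, args=(), n=100):
    """Approximate the integral of f over [lower, upper] with composite Simpson's rule.

    f is called as f(x, *args), mirroring the callback style used with quad.
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (upper - lower) / n
    total = f(lower, *args) + f(upper, *args)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # odd interior points weigh 4, even ones 2
        total += weight * f(lower + i * h, *args)
    return total * h / 3


def integrand(x, a, b):
    # Same integrand as the script above: constant in x.
    return a * math.pi + b


approx = simpson(integrand, 0, 1, args=(2, 1))
exact = 2 * math.pi + 1  # a constant integrated over an interval of length 1
print(approx, exact)
```

Since the integrand does not depend on x, the integral over [0, 1] is simply a*pi + b, which gives a quick sanity check on any numerical result.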

Expanding on pandas data frames in Jupyter

There are more built-in functions for working with data frames than we have used so far. If we were to take one of the data frames from a prior example in this chapter, the Titanic dataset from an Excel file, we could use additional functions to help portray and work with the dataset.

As a repeat, we load the dataset using the script:

import pandas as pd
df = pd.read_excel('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.xls')

We can then inspect the data frame using the info function, which displays the characteristics of the data frame:

df.info()

Some of the interesting points are as follows:

There are 1309 entries
There are 14 columns
Few entries have valid data in the body column (most were lost)
The output gives a good overview of the types of data involved

We can also use the describe function, which gives us a statistical breakdown of the numeric columns in the data frame.

df.describe()

This produces the following tabular display:

For each numerical column we have...
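The same two calls can be tried offline on a small data frame. The sketch below assumes a hypothetical three-column frame with made-up values (these are not rows from the Titanic dataset) and shows that describe() summarizes only the numeric columns and excludes missing values from its counts.

```python
import pandas as pd

# Hypothetical sample rows, not taken from the Titanic dataset.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "age": [22.0, 38.0, None, 35.0],   # one missing value
    "fare": [7.25, 71.28, 8.05, 53.10],
})

# Prints the column names, non-null counts, and data types.
df.info()

# Statistical breakdown of the numeric columns only.
summary = df.describe()
print(summary)
```

The count row is a quick way to spot sparse columns: here age reports one fewer value than fare because of the missing entry.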

Summary

In this chapter, we looked at some of the more compute-intensive tasks that might be performed in Jupyter. We used Python to scrape a website to gather data for analysis. We used the Python NumPy, pandas, and SciPy packages for in-depth computation of results. We went further into pandas and explored manipulating data frames. Lastly, we saw examples of sorting and filtering data frames.

In the next chapter, we will make some predictions and use visualization to validate our predictions.


Author (1)

Dan Toomey

Dan Toomey has been developing application software for over 20 years. He has worked in a variety of industries and companies, in roles from sole contributor to VP/CTO-level. For the last few years, he has been contracting for companies in the eastern Massachusetts area under Dan Toomey Software Corp. Dan has also written R for Data Science, Jupyter for Data Science, and the Jupyter Cookbook, all with Packt.