Packt+ | Advance your knowledge in tech

You're reading from Learning Jupyter

Product typeBook

Published inNov 2016

Reading LevelIntermediate

PublisherPackt

ISBN-139781785884870

Edition1st Edition

Languages

Python

Tools

Jupyter

Concepts

Data Analysis

Author (1)

Dan Toomey

Chapter 2. Jupyter Python Scripting

Jupyter was originally IPython-an interactive version of Python to be used as a development environment. As such, most of the features of Python are available to you when developing your notebook.

In this chapter, we will cover the following topics:

Basic Python scripting
Python dataset access (from a library)
Python pandas
Python graphics
Python random numbers

Basic Python in Jupyter

In this chapter, we will be using Python scripts in a Jupyter Notebook. Jupyter does not interact with your scripts as much as it executes your script and records results. I think this is how Jupyter Notebooks have been extended to use other languages besides Python-the notebook just takes a script, runs it against a language engine, and records the output from the engine-all the while not really knowing what kind of script is being executed.

Similarly, I have not noticed any particular limitations when using Python in Jupyter. Some of the scripts I have run have taken a lot of time to run, used a lot of memory, opened new windows, and so on, all without failing. There are known issues running Python scripts that contain a __main__ execution loop and multithreaded applications.

We must open a Python section to our notebook to use Python coding. So, start your notebook, then, in the upper-right menu, select Python 2.

Note

I installed Jupyter in the Spring of 2016 on a...

Python data access in Jupyter

Now that we have seen how Python works in Jupyter, including the underlying encoding, then how does Python accessing a large dataset work in Jupyter?

I started another view for pandas using Python Data Access as the name. From here, we will read in a large dataset and compute some standard statistics on the data. We are interested in seeing how we use pandas in Jupyter, how well the script performs, and what information is stored in the metadata (especially if it is a larger dataset).

Our script accesses the iris dataset that's built into one of the Python packages. All we are looking to do is to read in a slightly large number of items and calculate some basic operations on the dataset. We are really interested to see how much of the data is cached in the IPYNB file

The Python code is as follows:

# import the datasets package
from sklearn import datasets
# pull in the iris data
iris_dataset = datasets.load_iris()
# grab the first two columns of data
X = iris_dataset...

Python pandas in Jupyter

One of the most widely used features of Python is pandas. It is a third-party library of data analysis packages that can be used freely. In this example, we will develop a Python script that uses pandas to see if there is any effect to using it in Jupyter.

I am using the Titanic dataset from http://www.kaggle.com/c/titanic-gettingStarted/download/train.csv. I am sure the same data is available from a variety of sources.

Here is the Python script that we want to run in Jupyter:

from pandas import *
training_set = read_csv('train.csv')
training_set.head()
male = training_set[training_set.sex == 'male']
female = training_set[training_set.sex =='female']
womens_survival_rate = float(sum(female.survived))/len(female)
mens_survival_rate = float(sum(male.survived))/len(male)

The result is we calculate the survival rates of the Titanic's passengers based on their sex.

We create a new notebook, enter the script into appropriate cells, include adding displays of calculated data...

Python graphics in Jupyter

How does Python graphics work in Jupyter?

I started another view for this named Python Graphics so as to distinguish the work from the previous work.

If we were to build a sample dataset of baby names and the number of births in a year of that name, we could then plot the data.

The Python coding is simple:

import pandas
import matplotlib
%matplotlib inline
baby_name = ['Alice','Charles','Diane','Edward']
number_births = [96, 155, 66, 272]
dataset = list(zip(baby_name,number_births))
df = pandas.DataFrame(data = dataset, columns=['Name', 'Number'])
df['Number'].plot()

The steps of the script are as follows:

Import the graphics library (and data library) that we need.
Define our data.
Convert the data into a format that allows easy graphical display.
Plot the data.

We would expect a graph of the number of births by baby name.

If we take the preceding script and place it into cells of our Jupyter Notebook, we get something that looks like the following screenshot:

I have...

Python random numbers in Jupyter

For many analyses, we are interested in calculating repeatable results. However, a lot of analysis relies on random numbers being used. In Python, you can set the seed for the random number generator to achieve repeatable results with the random_seed() function.

In this example, we simulate rolling a pair of dice and looking at the outcome.

The script we are using is this:

import pylab
import random
random.seed(113)
samples = 1000
dice = []
for i in range(samples):
    total = random.randint(1,6) + random.randint(1,6)
    dice.append(total)
pylab.hist(dice, bins= pylab.arange(1.5,12.6,1.0))
pylab.show()

Once we have the script in Jupyter and execute it, we have this result:

I had added some more statistics. I'm not sure I would have counted on such a high standard deviation. If we increased the number of samples, this would increase.

The resulting graph was opened in a new window, much as it would be if you ran this script in another Python development environment...

Summary

In this chapter, we walked through a simple notebook and the underlying structure. Then, we saw an example of using pandas. We looked at a graphics example. Finally, we looked at an example using random numbers in a Python script.

In the next chapter, we will learn all about R scripting in a Jupyter Notebook.

The rest of the chapter is locked

You have been reading a chapter from

Learning Jupyter

Published in: Nov 2016Publisher: PacktISBN-13: 9781785884870

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Author (1)

Dan Toomey

Dan Toomey has been developing application software for over 20 years. He has worked in a variety of industries and companies, in roles from sole contributor to VP/CTO-level. For the last few years, he has been contracting for companies in the eastern Massachusetts area. Dan has been contracting under Dan Toomey Software Corp. Dan has also written R for Data Science, Jupyter for Data Sciences, and the Jupyter Cookbook, all with Packt.
Read more about Dan Toomey

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages