The Jupyter product was derived from the IPython project. The IPython project was used to provide interactive online access to Python. Over time it became useful to interact with other programming languages, such as R, in the same manner. With this split from only Python, the tool grew into its current manifestation of Jupyter. IPython is still an active tool available for use.
Jupyter is available as a web application for a wide variety of platforms. It can also be used on your desktop/laptop over a wide variety of installations. In this book, we will be exploring using Jupyter from a Windows PC and over the internet for other providers.
Jupyter is organized around a few basic concepts:
We can jump right in and see what Jupyter has to offer. A Jupyter screen looks like this:
So, Jupyter is deployed as a website that can be accessed on your machine (or can be accessed like any other website across the internet).
We see the URL of the page, http://localhost:8888/tree. localhost is an alias for your own machine, so the URL points at a web server running locally. The website we are accessing on the web server is in a tree display. This is the default display, and it matches the way Jupyter organizes projects: objects are displayed in a tree layout much like Windows File Explorer. The main page lists a number of projects; each project is its own subdirectory and contains its own further breakdown of content. Depending on where you start Jupyter, the existing contents of the current directory will be included in the display as well.
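The parts of that URL can be pulled apart with Python's standard library. This is only a small sketch to show which piece is the host, which is the port, and which is the page being requested; the variable names are illustrative.

```python
from urllib.parse import urlsplit

# The address shown in the browser when Jupyter runs locally.
url = "http://localhost:8888/tree"
parts = urlsplit(url)

print("host:", parts.hostname)  # the machine serving the page
print("port:", parts.port)      # the port the web server listens on
print("path:", parts.path)      # the page being requested (the tree display)
```

If Jupyter were hosted elsewhere, only the host (and possibly the port) would change; the path would still name the display.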
On the web page, we have the soon to be familiar Jupyter logo and three tabs:
Files
Running
Clusters
The Files tab lists the objects available to Jupyter. The files used by Jupyter are stored as regular files on your disk. Jupyter provides context managers that know how to process the different types of files and programs you are using. You can see the Jupyter files when you use Windows Explorer to view your file contents (they have an .ipynb file extension). You can see non-Jupyter files listed in the Jupyter window as well.
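Because notebooks are regular files, they can be inspected with ordinary tools. The following is a minimal sketch, assuming only that an .ipynb file is JSON text; the notebook content, file names, and temporary directory are made up for illustration, and real notebooks carry more metadata than this.

```python
import json
import tempfile
from pathlib import Path

# A deliberately tiny notebook structure (illustrative, not a complete notebook).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ['print("hello")'],
        }
    ],
}

# Write one notebook and one non-Jupyter file into a scratch directory.
workdir = Path(tempfile.mkdtemp())
(workdir / "example.ipynb").write_text(json.dumps(notebook), encoding="utf-8")
(workdir / "data.csv").write_text("a,b\n1,2\n", encoding="utf-8")

# Pick out the notebook files by extension, as a file browser would.
notebook_files = sorted(p.name for p in workdir.glob("*.ipynb"))

# Read the notebook back as plain JSON and list its cell types.
loaded = json.loads((workdir / "example.ipynb").read_text(encoding="utf-8"))
cell_types = [cell["cell_type"] for cell in loaded["cells"]]

print(notebook_files, cell_types)
```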
The Running tab lists the notebooks that have been started. Jupyter keeps track of which notebooks are running. This tab allows you to control which notebooks are running at any time.
The Clusters tab is for environments where several machines are in use for running Jupyter.
Next, we see:
Select items to perform action prompt
Upload button
New pull-down menu
The prompt tells you that you can select multiple items and then perform the same action on all of them. Most of the following actions (in the menus) can be performed over a single item or a selected set of items.
The Upload button will present a prompt to select a file to upload to Jupyter. This would typically be used to move a data file into the project when Jupyter is running as a website in a remote location, where you can't just copy the file to the disk where Jupyter is running.
The New pull-down menu presents a list of choices of the different kinds of Jupyter projects (kernels) that are available:
We can see the list of objects that Jupyter knows how to create:
Text File: Create a text file for use in this folder. For example, if the notebook were to import a file, you may create the file using this feature.
Folder: Yes, just like in Windows File Explorer.
Terminals Unavailable: Grayed out; this feature can be used in a *nix (Unix-like) environment.
Notebooks: Grayed out; this is not really a file type, but a heading for the different types of notebooks that this installation knows how to create.
Julia 0.4.5: Creates a Julia notebook where the coding is in the Julia language.
Python 3: Creates a notebook where the coding is in the Python language. This is the default.
R: Creates a notebook where the coding is in the R language.
If we started one of the notebooks (it would automatically be selected in the Jupyter object list) and now looked at the pull-down of actions against the objects selected, we would see a display like the following:
We see that the menu action has changed to Rename, as that is the most likely action to be taken on one file, and we have an icon to delete the project as well (the trashcan icon).
The item count is now 1 (we have one object selected in the list), the icon for the one item is a filled-in blue square (denoting that it is a running project), and there is a familiar Home icon to bring us back to the Jupyter home page display in the previous screenshot.
The object's menu has choices for:
Folders: select the folders available
All Notebooks: select the Jupyter Notebooks
Running: select the running Jupyter Notebooks
Files: select the files in the directory
If we scroll down in the object display, we see slightly different information in the list of objects available. Each of the objects listed has a type (denoted by the associated icon shape) and a name assigned by the user when it was created.
Each of the objects is a Jupyter project that can be accessed, shared, and moved on its own. Every project has a full name, as entered by the user creating the project, and an icon that portrays this entry as a project. We will see other Jupyter icons corresponding to other project components, as follows:
If we pull down the New menu and select Python 3, Jupyter would create a new Python notebook and move to display its contents. We would see a display like the following:
We have created a new Jupyter Notebook and are in its display. The logo is there. The title defaults to Untitled, which we can change by clicking on it. There is an (autosaved) marker that tells you Jupyter has automatically stored your notebook to disk (and will continue to do so regularly as you work on it).
We now have a menu bar and an indication that this notebook is using Python 3 as its source language. The menu choices are:
File: Standard file operations
Edit: For editing cell contents (more to come)
View: To change the display of the notebook
Insert: To insert a cell in the notebook
Cell: To change the format and usage of a cell
Kernel: To adjust the kernel used for the notebook
Help: To bring up the help system for Jupyter
The File menu has the following choices:
New Notebook: Similar to the pull-down from the home page.
Open...: Open a notebook.
Make a Copy...: Copy a notebook.
Rename...: Rename a notebook.
Save and Checkpoint: Save the current notebook at a checkpoint. Checkpoints are specific points in a notebook's history that you want to keep so that you can return to them if you change your mind about a recent set of changes.
Print Preview: Similar to any print preview that you have used otherwise.
Download as: Allows you to store the notebook in a variety of formats. The most notable formats would be PDF or HTML, which would allow you to share the notebook with users that do not have access to Jupyter.
Trusted Notebook: (The feature is grayed out.) When a notebook is opened by a user, the server computes a signature with the user's key and compares it with the signature stored in the notebook's metadata. If the signature matches, HTML and JavaScript output in the notebook will be trusted at load; otherwise it will be untrusted.
Close and Halt: Close the current notebook and stop it running in the Jupyter system.
The Edit menu has the following choices:
Cut Cells: Typical cut operation.
Copy Cells: Copies the selected cells to a memory buffer so that they can later be pasted into another location in the notebook, as in other GUI applications.
Paste Cells Above: If you have selected a cell and if you have copied a cell, this option will not be grayed out and will paste the buffered cell above the current cell.
Paste Cells Below: Similar to the previous option.
Delete Cells: Will delete the selected cells.
Undo Delete Cells.
Split Cell: There is a style issue here, regarding how many statements you put into a cell. Many times, you will start with one cell containing a number of statements and split that cell up many times to break off individual statements or groups of statements into their own cells.
Merge Cell Above: Combine the current cell with the one above it.
Merge Cell Below: Similar to the previous option.
Move Cell Up: Move the current cell before the one above it.
Move Cell Down.
Edit Notebook Metadata: For advanced users to modify the internal metadata (such as the programming language) that Jupyter stores for your notebook.
Find and Replace: Locate specific text within cells and possibly replace it.
The View menu has the following choices:
Toggle Header: Toggle the display of the Jupyter header
Toggle Toolbar: Toggle the display of the Jupyter toolbar
Cell Toolbar: Change the displayed items for the cell being edited:
  None: Don't display a cell toolbar
  Edit Metadata: Edit a cell's metadata directly
  Raw Cell Format: Edit the cell raw format as used by Jupyter
  Slideshow: Walk through the cells in a slideshow manner
The Insert menu has the following choices:
Insert Cell Above: Insert a new, empty cell above the current cell
Insert Cell Below: Same as the previous one, but below the current cell
The Cell menu has the following choices:
Run Cells: Runs the selected cell(s)
Run Cells and Select Below: Runs the selected cell(s) and moves the selection to the next cell
Run Cells and Insert Below: Runs the selected cell(s) and adds a blank cell below
Run All: Runs all of the cells
Run All Above: Runs all of the cells above the current
Run All Below: Runs all of the cells below the current
Cell Type: Changes the type of the selected cell(s) to:
  Code: This is the default; the cell is expected to contain language statements
  Markdown: The cell contains Markdown (which may include HTML), typically used to present the notebook in the best manner (as it is a website, it has all of HTML available to it)
  Raw NBConvert: This is an internal Jupyter format, basically plain text
Current Outputs: Whether to clear or continue the outputs from the cells
All Output
The Kernel menu is used to control the underlying language engine used by the notebook. I think many of the choices in this menu are used very little. The menu choices are as follows:
Interrupt: Interrupts whatever the underlying language engine is currently executing; the engine itself keeps running
Restart: Restarts the underlying language engine
Restart & Clear Output
Restart & Run All
Reconnect: Re-establishes the connection between the notebook and its kernel if that connection has been lost
Change kernel: Changes the language used in this notebook to one available in your installation
The Help menu displays the help options for Jupyter and language context choices. For example, in our Python notebook we see choices for common Python libraries that may be used:
Just below the regular menu is an icon toolbar with many of the commonly used menu items for faster use, as seen in this view:
The icons correspond to the previous menu choices (listed in order of appearance):
If we were to provide a name for the notebook, enter a simple Python script, and execute the notebook cells, we would see a display like this:
The script is:
name = "Dan Toomey"
state = "MA"
print(name + " lives in " + state)
We assign a value to the name and state variables and then print them out.
If you notice, I have placed the statements into two different cells. This is just for readability. They could all be in the same cell or three different cells.
Each cell is labeled with a number. The numbering starts at 1 for the first cell executed and grows each time a cell is run, so the labels need not match the order of the cells on the page (as you can see, the first cell is labeled cell 2 in the display).
Below the second cell, we have non-editable display results. Jupyter always displays any corresponding output of a cell just below. This could include error information as well.
This book is about Jupyter and data science. We have the introduction to Jupyter. Now, we can look at data science practices and then see how the two concepts work together.
Data science is used in many industries. It is interesting to note the predominant technologies involved and algorithms used by industry. We can see the same technologies available within Jupyter.
Some of the industries that are larger users of data science include:
Industry | Larger data science use | Technology/algorithms |
Finance | Hedge funds | Python |
Gambling | Establish odds | R |
Insurance | Measure and price risk | Domino (R) |
Retail banking | Risk, customer analytics, product analytics | R |
Mining | Smart exploration, yield optimization | Python |
Consumer products | Pricing and distribution | R |
Healthcare | Drug discovery and trials | Python |
In this section we see several examples taken from current industry focus and apply them in Jupyter to ensure its utility.
There is an example of this at https://www.safaribooksonline.com/library/view/python-for-finance/9781491945360/ch03.html, which is taken from the book Python for Finance by Yves Hilpisch. The model used is fairly standard for finance work.
We want to arrive at the theoretical value of a call option. A call option is the right to buy a security, such as IBM stock, at a specific (strike) price within a certain time frame. The option is priced based on the riskiness or volatility of the security in relation to the strike price and current price. The example uses a European option, which can only be exercised at maturity; this simplifies the problem set.
The example uses the Black-Scholes model for option valuation, where we have:
These elements make up the following formula:
The algorithm used is as follows:
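Reading the quantities off the script that follows (S0 the current price, K the strike price, T the time to maturity in years, r the risk-free rate, sigma the volatility, I the number of samples), the simulation it performs can be written as below. This is a sketch inferred from the code rather than a reproduction of the book's typeset formula; the index i over samples is notation introduced here.

```latex
\begin{aligned}
S_T(i) &= S_0 \exp\!\left(\left(r - \tfrac{1}{2}\sigma^2\right) T + \sigma \sqrt{T}\, z_i\right),
  \qquad z_i \sim \mathcal{N}(0, 1), \quad i = 1, \dots, I \\
h_T(i) &= \max\bigl(S_T(i) - K,\ 0\bigr) \\
C_0 &\approx e^{-rT}\, \frac{1}{I} \sum_{i=1}^{I} h_T(i)
\end{aligned}
```

In words: draw I standard normal numbers, turn each into a simulated price at maturity, take the payoff of the option for each simulated price, and discount the average payoff back to today.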
The script is as follows. We use numpy for the intense mathematics used. The rest of the coding is typical:
from numpy import *

# set parameters
S0 = 100.
K = 105.
T = 1.0
r = 0.05
sigma = 0.2

# how many samples we are using
I = 100000

random.seed(103)
z = random.standard_normal(I)
ST = S0 * exp((r - 0.5 * sigma ** 2) * T + sigma * sqrt(T) * z)
hT = maximum(ST - K, 0)
C0 = exp(-r * T) * sum(hT) / I

# tell user results
print ("Value of the European Call Option %5.3f" % C0)
The results under Jupyter are as shown in the following screenshot:
The value 8.071 is close to the published expected value of 8.019; the difference is due to variance in the random numbers used. (I am seeding the random number generator to have reproducible results.)
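For a European call, the Black-Scholes model also has a well-known closed-form solution, which gives a reference point that does not depend on random numbers. The following is a minimal sketch using only the standard library and the same parameters as the script; the function names norm_cdf and bs_call are illustrative and are not taken from the book being quoted.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S0, K, T, r, sigma):
    """Closed-form Black-Scholes value of a European call option."""
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# Same parameters as the simulation script.
analytic = bs_call(100.0, 105.0, 1.0, 0.05, 0.2)
print("Closed-form value of the European Call Option %5.3f" % analytic)
```

Both simulated values should scatter around this figure, with the scatter shrinking as the number of samples grows.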
Another algorithm in popular use is Monte Carlo simulation. In Monte Carlo, as the name of the gambling resort implies, we simulate a number of chances taken in a scenario where we know the percentage outcomes of the different results, but do not know exactly what will happen in the next N chances. We can see this model being used at http://www.codeandfinance.com/pricing-options-monte-carlo.html. In this example, we are using Black-Scholes again, but in a different direct method where we see individual steps.
The coding is as follows. The coding style is slightly different from the original script, as you can see by the changed imports near the top of the code. Rather than just pulling in the functions you want from a library, you pull in the entire library and qualify each call with the library name:
import datetime
import random
import math

random.seed(103)

def generate_asset_price(S,v,r,T):
    # one simulated price at maturity
    return S * math.exp((r - 0.5 * v**2) * T + v * math.sqrt(T) * random.gauss(0,1.0))

def call_payoff(S_T,K):
    return max(0.0,S_T-K)

S = 857.29 # underlying price
v = 0.2076 # vol of 20.76%
r = 0.0014 # rate of 0.14%
T = (datetime.date(2013,9,21) - datetime.date(2013,9,3)).days / 365.0
K = 860.

simulations = 90000
payoffs = []
discount_factor = math.exp(-r * T)

for i in range(simulations):
    S_T = generate_asset_price(S,v,r,T)
    payoffs.append(
        call_payoff(S_T, K)
    )

price = discount_factor * (sum(payoffs) / float(simulations))
print ('Price: %.4f' % price)
The results under Jupyter are shown as follows:
The result price of 14.4452 is close to the published value 14.5069.
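How close two Monte Carlo runs should be expected to land can be estimated from the simulation itself, through the standard error of the average payoff. The following is a sketch under the same parameters as the script; the function mc_call and its arguments are illustrative names, and the standard error estimate is the usual sample standard deviation divided by the square root of the number of paths.

```python
import math
import random

def mc_call(S, K, T, r, v, n_paths, seed):
    """Monte Carlo price of a European call plus its standard error."""
    rng = random.Random(seed)  # private generator, so runs are reproducible
    discount = math.exp(-r * T)
    drift = (r - 0.5 * v ** 2) * T
    vol = v * math.sqrt(T)
    # Discounted payoff of each simulated path.
    payoffs = [
        discount * max(0.0, S * math.exp(drift + vol * rng.gauss(0.0, 1.0)) - K)
        for _ in range(n_paths)
    ]
    price = sum(payoffs) / n_paths
    variance = sum((p - price) ** 2 for p in payoffs) / (n_paths - 1)
    std_error = math.sqrt(variance / n_paths)
    return price, std_error

T = 18 / 365.0  # 3 September 2013 to 21 September 2013
for n in (1000, 10000, 100000):
    price, err = mc_call(857.29, 860.0, T, 0.0014, 0.2076, n, seed=103)
    print("%7d paths: price %.4f +/- %.4f" % (n, price, err))
```

The standard error shrinks roughly with the square root of the number of paths, so a hundred times more paths buys about one more decimal digit of agreement.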
Some of the gambling games are really coin flips, with 50/50 chances of success. Along those lines we have coding from http://forumserver.twoplustwo.com/25/probability/flipping-coins-getting-3-row-1233506/ that determines the probability of a series of heads or tails in a coin flip, with a trigger that can be used if you know the coin/game is biased towards one result or the other.
We have the following script:
##############################################
# Biased/unbiased recursion of heads OR tails
##############################################
import numpy as np
import math

N = 14 # number of flips
m = 3 # length of run (must be > 1 and <= N/2)
p = 0.5 # P(heads)

prob = np.repeat(0.0,N)
h = np.repeat(0.0,N)
t = np.repeat(0.0,N)

h[m] = math.pow(p,m)
t[m] = math.pow(1-p,m)
prob[m] = h[m] + t[m]

for n in range(m+1,2*m):
    h[n] = (1-p)*math.pow(p,m)
    t[n] = p*math.pow(1-p,m)
    prob[n] = prob[n-1] + h[n] + t[n]

for n in range(2*m,N):
    h[n] = ((1-p) - t[n-m] - prob[n-m-1]*(1-p))*math.pow(p,m)
    t[n] = (p - h[n-m] - prob[n-m-1]*p)*math.pow(1-p,m)
    prob[n] = prob[n-1] + h[n] + t[n]

prob[N-1]
The preceding code produces the following output in Jupyter:
We end up with the probability of getting a run of three heads (or three tails) in a row with an unbiased game. In this case, there is a 92% chance (within the range of tests we have run: 14 flips).
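For a fair coin and a small number of flips, the recursion can be cross-checked by brute force: enumerate every possible sequence and count those containing a run of the required length. The following is a sketch of that check, not part of the referenced forum code; run_probability is an illustrative name, and it only handles the unbiased case, where all sequences are equally likely.

```python
from itertools import groupby, product

def run_probability(n_flips, run_length):
    """Share of all fair-coin sequences with a run of heads OR tails
    at least run_length long."""
    hits = 0
    for flips in product("HT", repeat=n_flips):
        # groupby splits the sequence into runs of identical outcomes.
        longest = max(len(list(group)) for _, group in groupby(flips))
        if longest >= run_length:
            hits += 1
    return hits / 2 ** n_flips

for n in (3, 5, 13, 14):
    print(n, "flips:", run_probability(n, 3))
```

The enumeration doubles in size with every extra flip, so it is only practical for short sequences; the recursion has no such limit and also covers biased coins.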
We have an example of using R to come up with the pricing for non-life insurance products, specifically mopeds, at http://www.cybaea.net/journal/2012/03/13/R-code-for-Chapter-2-of-Non_Life-Insurance-Pricing-with-GLM/. The code first creates a table of the statistics available for the product line, then compares the pricing to actual statistics in use.
The first part of the code that accumulates the data is as follows:
con <- url("http://www2.math.su.se/~esbj/GLMbook/moppe.sas")
data <- readLines(con, n = 200L, warn = FALSE, encoding = "unknown")
close(con)

## Find the data range
data.start <- grep("^cards;", data) + 1L
data.end <- grep("^;", data[data.start:999L]) + data.start - 2L
table.1.2 <- read.table(text = data[data.start:data.end],
                        header = FALSE,
                        sep = "",
                        quote = "",
                        col.names = c("premiekl", "moptva", "zon", "dur",
                                      "medskad", "antskad", "riskpre",
                                      "helpre", "cell"),
                        na.strings = NULL,
                        colClasses = c(rep("factor", 3), "numeric",
                                       rep("integer", 4), "NULL"),
                        comment.char = "")
rm(con, data, data.start, data.end)

# Remainder of Script adds comments/descriptions
comment(table.1.2) <-
    c("Title: Partial casco moped insurance from Wasa insurance, 1994--1999",
      "Source: http://www2.math.su.se/~esbj/GLMbook/moppe.sas",
      "Copyright: http://www2.math.su.se/~esbj/GLMbook/")

## See the SAS code for this derived field
table.1.2$skadfre = with(table.1.2, antskad / dur)

## English language column names as comments:
comment(table.1.2$premiekl) <-
    c("Name: Class",
      "Code: 1=Weight over 60kg and more than 2 gears",
      "Code: 2=Other")
comment(table.1.2$moptva) <-
    c("Name: Age",
      "Code: 1=At most 1 year",
      "Code: 2=2 years or more")
comment(table.1.2$zon) <-
    c("Name: Zone",
      "Code: 1=Central and semi-central parts of Sweden's three largest cities",
      "Code: 2=suburbs and middle-sized towns",
      "Code: 3=Lesser towns, except those in 5 or 7",
      "Code: 4=Small towns and countryside, except 5--7",
      "Code: 5=Northern towns",
      "Code: 6=Northern countryside",
      "Code: 7=Gotland (Sweden's largest island)")
comment(table.1.2$dur) <-
    c("Name: Duration",
      "Unit: year")
comment(table.1.2$medskad) <-
    c("Name: Claim severity",
      "Unit: SEK")
comment(table.1.2$antskad) <- "Name: No. claims"
comment(table.1.2$riskpre) <-
    c("Name: Pure premium",
      "Unit: SEK")
comment(table.1.2$helpre) <-
    c("Name: Actual premium",
      "Note: The premium for one year according to the tariff in force 1999",
      "Unit: SEK")
comment(table.1.2$skadfre) <-
    c("Name: Claim frequency",
      "Unit: /year")

## Save results for later
save(table.1.2, file = "table.1.2.RData")

## Print the table (not as pretty as the book)
print(table.1.2)
The resultant first 10 rows of the table are as follows:
   premiekl moptva zon    dur medskad antskad riskpre helpre    skadfre
1         1      1   1   62.9   18256      17    4936   2049 0.27027027
2         1      1   2  112.9   13632       7     845   1230 0.06200177
3         1      1   3  133.1   20877       9    1411    762 0.06761833
4         1      1   4  376.6   13045       7     242    396 0.01858736
5         1      1   5    9.4       0       0       0    990 0.00000000
6         1      1   6   70.8   15000       1     212    594 0.01412429
7         1      1   7    4.4    8018       1    1829    396 0.22727273
8         1      2   1  352.1    8232      52    1216   1229 0.14768532
9         1      2   2  840.1    7418      69     609    738 0.08213308
10        1      2   3 1378.3    7318      75     398    457 0.05441486
Then, we go through each product/statistics to determine whether the pricing for a product is in line with others. Note, the repos = clause on the install.packages statement is a fairly new addition to R:
# make sure the packages we want to use are installed
install.packages(c("data.table", "foreach", "ggplot2"),
                 dependencies = TRUE,
                 repos = "http://cran.us.r-project.org")

# load the data table we need
if (!exists("table.1.2"))
    load("table.1.2.RData")

library("foreach")

## We are looking to reproduce table 2.7 which we start building here,
## add columns for our results.
table27 <-
    data.frame(rating.factor =
                   c(rep("Vehicle class", nlevels(table.1.2$premiekl)),
                     rep("Vehicle age", nlevels(table.1.2$moptva)),
                     rep("Zone", nlevels(table.1.2$zon))),
               class =
                   c(levels(table.1.2$premiekl),
                     levels(table.1.2$moptva),
                     levels(table.1.2$zon)),
               stringsAsFactors = FALSE)

## Calculate duration per rating factor level and also set the
## contrasts (using the same idiom as in the code for the previous
## chapter). We use foreach here to execute the loop both for its
## side-effect (setting the contrasts) and to accumulate the sums.
# new.cols are set to claims, sums, levels
new.cols <-
    foreach (rating.factor = c("premiekl", "moptva", "zon"),
             .combine = rbind) %do%
{
    nclaims <- tapply(table.1.2$antskad, table.1.2[[rating.factor]], sum)
    sums <- tapply(table.1.2$dur, table.1.2[[rating.factor]], sum)
    n.levels <- nlevels(table.1.2[[rating.factor]])
    contrasts(table.1.2[[rating.factor]]) <-
        contr.treatment(n.levels)[rank(-sums, ties.method = "first"), ]
    data.frame(duration = sums, n.claims = nclaims)
}
table27 <- cbind(table27, new.cols)
rm(new.cols)

#build frequency distribution
model.frequency <-
    glm(antskad ~ premiekl + moptva + zon + offset(log(dur)),
        data = table.1.2, family = poisson)

rels <- coef( model.frequency )
rels <- exp( rels[1] + rels[-1] ) / exp( rels[1] )
table27$rels.frequency <-
    c(c(1, rels[1])[rank(-table27$duration[1:2], ties.method = "first")],
      c(1, rels[2])[rank(-table27$duration[3:4], ties.method = "first")],
      c(1, rels[3:8])[rank(-table27$duration[5:11], ties.method = "first")])

# note the severities involved
model.severity <-
    glm(medskad ~ premiekl + moptva + zon,
        data = table.1.2[table.1.2$medskad > 0, ],
        family = Gamma("log"), weights = antskad)

rels <- coef( model.severity )
rels <- exp( rels[1] + rels[-1] ) / exp( rels[1] )
## Aside: For the canonical link function use
## rels <- rels[1] / (rels[1] + rels[-1])

table27$rels.severity <-
    c(c(1, rels[1])[rank(-table27$duration[1:2], ties.method = "first")],
      c(1, rels[2])[rank(-table27$duration[3:4], ties.method = "first")],
      c(1, rels[3:8])[rank(-table27$duration[5:11], ties.method = "first")])

table27$rels.pure.premium <- with(table27, rels.frequency * rels.severity)
print(table27, digits = 2)
The resultant display is as follows:
   rating.factor class duration n.claims rels.frequency rels.severity rels.pure.premium
1  Vehicle class     1     9833      391           1.00          1.00              1.00
2  Vehicle class     2     8825      395           0.78          0.55              0.42
11   Vehicle age     1     1918      141           1.55          1.79              2.78
21   Vehicle age     2    16740      645           1.00          1.00              1.00
12          Zone     1     1451      206           7.10          1.21              8.62
22          Zone     2     2486      209           4.17          1.07              4.48
3           Zone     3     2889      132           2.23          1.07              2.38
4           Zone     4    10069      207           1.00          1.00              1.00
5           Zone     5      246        6           1.20          1.21              1.46
6           Zone     6     1369       23           0.79          0.98              0.78
7           Zone     7      148        3           1.00          1.20              1.20
Here, we can see that some vehicle classes (2, 6) are priced very low in comparison to statistics for that vehicle, whereas others are overpriced (12, 22).
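The last column of that display is the product of the two columns before it, as the final assignment in the R script states. A quick sketch in Python confirms this against the printed figures; the numbers are copied from the display, and because they are rounded to two digits the products only agree approximately.

```python
# (frequency relativity, severity relativity, printed pure premium relativity),
# copied from the displayed table.
rows = {
    "Vehicle class 1": (1.00, 1.00, 1.00),
    "Vehicle class 2": (0.78, 0.55, 0.42),
    "Vehicle age 1": (1.55, 1.79, 2.78),
    "Vehicle age 2": (1.00, 1.00, 1.00),
    "Zone 1": (7.10, 1.21, 8.62),
    "Zone 2": (4.17, 1.07, 4.48),
    "Zone 3": (2.23, 1.07, 2.38),
    "Zone 4": (1.00, 1.00, 1.00),
    "Zone 5": (1.20, 1.21, 1.46),
    "Zone 6": (0.79, 0.98, 0.78),
    "Zone 7": (1.00, 1.20, 1.20),
}

# Difference between the recomputed product and the printed value, per row.
differences = {}
for label, (frequency, severity, printed) in rows.items():
    recomputed = frequency * severity
    differences[label] = abs(recomputed - printed)
    print("%-16s recomputed %.2f  printed %.2f" % (label, recomputed, printed))
```

The small gaps come from multiplying already-rounded factors; R multiplies the unrounded values and rounds only when printing.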
We take the example from a presentation I made at www.dantoomeysoftware.com/Using_R_for_Marketing_Research.pptx looking at the effectiveness of different ad campaigns for grapefruit juice.
The code is as follows:
#library(s20x)
library(car)

#read the dataset from an existing .csv file
df <- read.csv("C:/Users/Dan/grapeJuice.csv",header=T)

#list the name of each variable (data column) and the first six rows of the dataset
head(df)

# basic statistics of the variables
summary(df)

#set the 1 by 2 layout plot window
par(mfrow = c(1,2))

# boxplot to check if there are outliers
boxplot(df$sales,horizontal = TRUE, xlab="sales")

# histogram to explore the data distribution shape
hist(df$sales,main="",xlab="sales",prob=T)
lines(density(df$sales),lty="dashed",lwd=2.5,col="red")

#divide the dataset into two sub dataset by ad_type
sales_ad_nature = subset(df,ad_type==0)
sales_ad_family = subset(df,ad_type==1)

#calculate the mean of sales with different ad_type
mean(sales_ad_nature$sales)
mean(sales_ad_family$sales)

#set the 1 by 2 layout plot window
par(mfrow = c(1,2))

# histogram to explore the data distribution shapes
hist(sales_ad_nature$sales,main="",xlab="sales with nature production theme ad",prob=T)
lines(density(sales_ad_nature$sales),lty="dashed",lwd=2.5,col="red")
hist(sales_ad_family$sales,main="",xlab="sales with family health caring theme ad",prob=T)
lines(density(sales_ad_family$sales),lty="dashed",lwd=2.5,col="red")
With output (several sections):
(raw data from file, first six rows):
  | sales | price | ad_type | price_apple | price_cookies |
1 | 222 | 9.83 | 0 | 7.36 | 8.8 |
2 | 201 | 9.72 | 1 | 7.43 | 9.62 |
3 | 247 | 10.15 | 1 | 7.66 | 8.9 |
4 | 169 | 10.04 | 0 | 7.57 | 10.26 |
5 | 317 | 8.38 | 1 | 7.33 | 9.54 |
6 | 227 | 9.74 | 0 | 7.51 | 9.49 |
Statistics on the data are as follows:
     sales           price           ad_type     price_apple
 Min.   :131.0   Min.   : 8.200   Min.   :0.0   Min.   :7.300
 1st Qu.:182.5   1st Qu.: 9.585   1st Qu.:0.0   1st Qu.:7.438
 Median :204.5   Median : 9.855   Median :0.5   Median :7.580
 Mean   :216.7   Mean   : 9.738   Mean   :0.5   Mean   :7.659
 3rd Qu.:244.2   3rd Qu.:10.268   3rd Qu.:1.0   3rd Qu.:7.805
 Max.   :335.0   Max.   :10.490   Max.   :1.0   Max.   :8.290
 price_cookies
 Min.   : 8.790
 1st Qu.: 9.190
 Median : 9.515
 Mean   : 9.622
 3rd Qu.:10.140
 Max.   :10.580
The data shows the effectiveness of each campaign. The family-themed ad produces higher sales:
The difference is more pronounced on the histogram displays:
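The same split-and-average step can be illustrated in Python on the six rows listed earlier. This is only a sketch on that small extract, so the means it prints describe those six rows and not the full dataset; the variable names are illustrative, and the coding of ad_type (0 for the nature theme, 1 for the family theme) follows the R script.

```python
from statistics import mean

# (sales, ad_type) pairs copied from the six rows shown in the raw data listing.
rows = [(222, 0), (201, 1), (247, 1), (169, 0), (317, 1), (227, 0)]

# Split the sales figures by ad campaign.
sales_ad_nature = [sales for sales, ad_type in rows if ad_type == 0]
sales_ad_family = [sales for sales, ad_type in rows if ad_type == 1]

mean_nature = mean(sales_ad_nature)
mean_family = mean(sales_ad_family)

print("nature theme mean sales:", mean_nature)
print("family theme mean sales:", mean_family)
```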
Docker is a mechanism that allows you to have many complete virtual instances of an application in one machine. Docker is used by many software firms to provide a fully scalable implementation of their services, and support as many concurrent users as needed.
Prior mechanisms for dealing with multiple instances shared common resources (such as disk address space). Under Docker, each instance is a complete entity separate from all others.
Implementing Jupyter on a Docker environment allows multiple users to access their own Jupyter instance, without having to worry about interfering with someone else's calculations.
The key feature of Docker is allowing for a variable number of instances of your notebook to be in use at any time. The Docker control system can be set up to create new instances for every user that accesses your notebook. All of this is built-in to Docker without programming; just use the user interface to decide how to create instances.
There are two ways you can use Docker:
There are several services out there. I think they work pretty much the same way: sign up for the service, upload your notebook, monitor usage (the Docker control program tracks usage automatically). For example, if we use https://hub.docker.com/ we are really using a version repository for our notebook. Versioning is used in software development for tracking changes that are made over time. This also allows for multiple user access privileges as well:
Installing Docker is operating system dependent. Go to the https://www.docker.com/ home page for instructions for your machine.
Docker on your local machine would only be a precursor to posting on a public Docker service, unless the machine you are installing Docker on is accessible by others.
There are several ways to share Jupyter Notebooks with others:
In order to email your notebook, the notebook must be converted to a plain text format, sent as an attachment to the recipient, and then the recipient must convert it back to the 'binary' notebook format.
Email attachments are normally converted to a well-defined MIME (Multipurpose Internet Mail Extensions) format. There is a program available, nb2mail, which converts the notebook to a notebook MIME format. The program is available at https://github.com/nfultz/nb2mail.
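To make the idea of a MIME attachment concrete, here is a minimal sketch using Python's standard email package. It is not how nb2mail works internally; it only shows a notebook's text being wrapped as an attachment and read back. The addresses, file name, notebook content, and the application/x-ipynb+json content type label are all assumptions chosen for illustration.

```python
import json
from email.message import EmailMessage

# Stand-in for the text of a notebook file (illustrative, nearly empty).
notebook_text = json.dumps(
    {"nbformat": 4, "nbformat_minor": 0, "metadata": {}, "cells": []}
)

msg = EmailMessage()
msg["Subject"] = "Notebook attached"
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg.set_content("The notebook is attached as a file.")

# Attach the notebook bytes; the email package handles the MIME encoding.
msg.add_attachment(
    notebook_text.encode("utf-8"),
    maintype="application",
    subtype="x-ipynb+json",
    filename="example.ipynb",
)

# On the receiving side, the attachment can be decoded back to the notebook.
attachment = next(msg.iter_attachments())
recovered = json.loads(attachment.get_content().decode("utf-8"))

print(msg.get_content_type())
print(attachment.get_filename(), attachment.get_content_type())
```

Actually sending the message would additionally need a mail server connection (for example through smtplib), which is left out here.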
Usage of nb2mail is as follows:
Install nb2mail using the pip command (see website)
Google Drive can be used to store your notebook profile information. This might be used when combined with the previous emailing of a notebook to another user. The recipient could use a Google Drive profile that would preclude anyone without the profile information from interacting with the notebook.
You install the Python extension (from https://github.com/jupyter/jupyter-drive) using pip and then python -m. From then on, you access the notebooks with the Google Drive profiles, as ipython notebook --profile=<profilename>.
GitHub (and others) allow you to place a notebook on their servers that, once there, can be accessed directly using the nbviewer. The server has installed Python (and other language) coding needed to support your notebook. The nbviewer is a read-only use of your notebook, and is not interactive.
The nbviewer is available at https://github.com/jupyter/nbviewer. The site includes specific parameters which need to be added to the ipython notebook command, such as the command to start the viewer.
A built-in feature of notebooks is to export the notebook into different formats. One of those is HTML. In this manner, you could export the notebook into HTML and copy the file(s) onto your web server as changes are made.
The command is jupyter nbconvert <notebook name>.ipynb --to html.
Again, this would be a non-interactive, read-only version of your notebook.
Jupyter is deployed as a web application. If you have direct access to a web server, you could install Jupyter on the web server and create notebooks on that web server; those notebooks would then be available to others as completely dynamic notebooks.
Because you run the web server, you also control access to it, and so can control who can access your notebook.
This is an advanced interaction that would require working with your webmaster to determine the correct approach.
There are two aspects to security in Jupyter Notebooks:
While many of the uses of Jupyter are solely for educating others, there are instances where the information being accessed is and should remain confidential. Jupyter allows you to put up barriers to entry to your notebook in several manners.
When we identify the user, we are authenticating that user. This is normally done by presenting a login challenge before allowing entry, where the user has to enter a username and password.
If the instance of Jupyter hosting your notebook is installed on a web server, you can use the web server's access control to limit access to your notebook. Further, most of the vendors that support notebook hosting provide a mechanism to limit access to specific users.
The other aspect of security is to make sure the contents of your notebooks are not malicious. You should make sure your notebook is safe, as follows:
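One safeguard, described earlier under the Trusted Notebook menu entry, is a signature check: a signature is computed from the notebook with the user's key and compared with a stored signature. The following is a minimal sketch of that idea using Python's standard hmac module. It is not Jupyter's actual implementation; the key, the notebook content, and the function names sign_notebook and is_trusted are hypothetical.

```python
import hashlib
import hmac
import json

def sign_notebook(notebook, key):
    """Compute a keyed signature over the notebook's content."""
    payload = json.dumps(notebook, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def is_trusted(notebook, key, stored_signature):
    """Trust the notebook only if its signature matches the stored one."""
    return hmac.compare_digest(sign_notebook(notebook, key), stored_signature)

key = b"per-user secret key (illustrative)"
notebook = {"cells": [{"cell_type": "code", "source": "print('hi')"}]}
stored_signature = sign_notebook(notebook, key)

# The same notebook after someone has altered a cell.
tampered = {"cells": [{"cell_type": "code", "source": "import os"}]}

original_trusted = is_trusted(notebook, key, stored_signature)
tampered_trusted = is_trusted(tampered, key, stored_signature)
print(original_trusted, tampered_trusted)
```

Any change to the notebook content changes the signature, so a notebook modified by someone without the key no longer matches and is treated as untrusted.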
In this chapter, we looked into the details of the Jupyter user interface: what objects does it work with, what actions can be taken by Jupyter, what does the display tell us about the data, and what tools are available? Next, we looked at some real-life examples from industry showing R and Python coding from several industries. Then we saw some of the ways to share our notebook with other users and, correspondingly, how to protect our notebook with different security mechanisms.
In the next chapter, we will see how far we can go using Python in a Jupyter Notebook.