Learning Jupyter

By Dan Toomey

About this book

Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create and share documents that contain live code, equations, visualizations, and explanatory text. The Jupyter Notebook system is extensively used in domains such as data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and much more.

This book starts with a detailed overview of the Jupyter Notebook system and its installation in different environments. Next, you will learn to integrate the Jupyter system with different programming languages such as R, Python, JavaScript, and Julia, and explore the various versions and packages that are compatible with the Notebook system. Moving ahead, you will master interactive widgets, namespaces, and working with Jupyter in a multiuser mode.

Towards the end, you will use Jupyter with a big data set and apply all the functionality learned throughout the book.

Publication date: November 2016
Publisher: Packt
Pages: 238
ISBN: 9781785884870

 

Chapter 1. Introduction to Jupyter

Jupyter is a tool that allows data scientists to record their complete analysis process, much in the same way other scientists use a lab notebook to record tests, progress, results, and conclusions.

The Jupyter product was originally developed as part of the IPython project. The IPython project was used to provide interactive online access to Python. Over time it became useful to interact with other data analysis tools, such as R, in the same manner. With this split from Python, the tool grew into its current manifestation of Jupyter. IPython is still an active tool that's available for use. The name Jupyter itself is derived from the combination of Julia, Python, and R.

Jupyter is available as a web application from a number of places. It can also be used locally in a wide variety of installations. In this book, we will explore using Jupyter on a Mac, on a Windows PC, and over the Internet with other providers.

In this chapter, we will cover the following topics:

  • First look at Jupyter

  • Installing Jupyter on Windows

  • Installing Jupyter on Mac

  • Notebook structure

  • Notebook workflow

  • Basic notebook operations

  • Security in Jupyter

  • Configuration options for Jupyter

 

First look at Jupyter


Here is a sample opening page when using Jupyter (this screenshot is on a Windows machine):

You should get yourself acquainted with the environment. The Jupyter user interface has a number of components:

  • Product title, Jupyter, in the top left (as expected). The logo and the title name are clickable and will return you to the Jupyter Notebook home page.

  • There are three tabs displayed: Files, Running, and Clusters:

    • The Files tab shows the list of files in the current directory of the page (described later on in this section).

    • The Running tab presents another screen of the currently running processes and notebooks. The drop-down lists for Terminals and Notebooks are populated with their running members:

    • The Clusters tab presents another screen to display the list of clusters available. This topic is covered in a later chapter:

  • In the top right corner of the screen are three buttons: Upload, New (menu), and a Refresh button.

  • The Upload button is used to add files to the notebook space. You may also just drag and drop as you would when handling files. Similarly, you can drag and drop notebooks into specific folders as well.

  • The menu with New at the top presents a further menu of Text File, Folder, Terminals Unavailable, Notebooks, and Python 2:

    • The Text File option is used to add a text file to the current directory. Jupyter will open a new browser window for you running a text editor. The text entered is automatically saved and will be displayed in your notebook's Files display:

      Note

      The default filename, untitled.txt, is editable.

    • The Folder option creates a new folder with the name Untitled Folder. Remember, all of the file/folder names are editable:

    • The Terminals Unavailable option is disabled for Windows. On a Mac, the option allows you to start a terminal (shell) session in the browser.

    • The Notebooks option will be activated when additional notebooks are available in your environment.

    • The Python 2 option is used to begin a Python 2 session interactively in your notebook. The interface looks like the following screenshot. You have full file editing capabilities for your script, including saving as a new file. You also have a complete working IDE for your Python script:

      Note

      Like the Text File and Folder options, this creates a new item in your notebook space: a Python notebook file, and its kernel is already running!

  • The refresh button is used to update the display. It's not really necessary as the display is reactive to any changes in the underlying file structure.

  • At the top of the Files tab's item list is a checkbox, a drop-down menu, and a home button:

    • The checkbox is used to toggle all the checkboxes in the Items list

    • The drop-down menu presents a list of the choices available, Folders, All Notebooks, Running, and Files, as shown in the following screenshot:

    • The Folders selection will select all the folders in the display and present a count of the folders in the small box

    • The All Notebooks selection will change the count to the number of notebooks and provide you with three options:

      • Duplicate the current notebook

      • Shut down the current notebook

      • Trash the current notebook

    • You can see them in the following screenshot:

    • The Running selection will select any running scripts and update the count to the number selected

    • The Files selection will select all of the files in the notebook display and update the count accordingly

    • The home button brings you back to the home screen of the notebook.

On the left-hand side of every item is a checkbox, an icon, and the item's name:

  • The checkbox is used to build a set of files to operate upon.

  • The icon is indicative of the type of item. In this case, all of the items are folders.

  • The name of the item corresponds to the name of the object. In this case, the filenames are as used on the disk.

 

Installing Jupyter on Windows


Jupyter requires Python to be installed (it is based on the Python language). There are a couple of tools that will automate the installation of Jupyter (and optionally Python) from a GUI. Here, we show how to install using Anaconda, a Python distribution used for packaging and distributing software. You first have to install Anaconda, which is available for Windows and Mac environments. Download the executable from https://www.continuum.io/ (the company that produces Anaconda) and run it to install Anaconda. The software provides a regular installation setup process, as shown in the following screenshot:

The installation process goes through the regular steps of making you agree to the distribution rights license:

The standard Windows installation allows you to decide whether all users on the machine can run the new software or not. If you are sharing a machine with different levels of users, then you can decide the appropriate action:

After clicking on Next, it will ask for a destination for the software to reside (I almost always keep the default paths):

Most importantly, make sure that the Python installed with Anaconda becomes your default Python going forward (by being placed in the execution path). Remember, Anaconda is itself built on Python, so this is important.

Note

This process takes some time to download and install.

Once Anaconda is installed, you need to run a command-line instruction to install Jupyter. The command is as follows:

conda install jupyter

This will invoke a process to download all the necessary components for Jupyter onto your PC. Your output should look something like this:

C:\Users\Dan>conda install jupyter
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ....
Solving package specifications: .........
# packages in environment at C:\Users\Dan\Anaconda2:
#
jupyter                   1.0.0                    py27_2

Note

Additional lines will be present during an actual install; I have abbreviated the output. You now have Jupyter installed on your machine. You can start the notebook server using the following command:

C:\Users\Dan>jupyter notebook

This command starts a Jupyter Notebook server on your machine. Once the server is started, a browser instance will open at the starting point of the notebook. You should see logging statements similar to the following as the server starts:

[I 16:21:59.144 NotebookApp] Writing notebook server cookie secret to C:\Users\Dan\AppData\Roaming\jupyter\runtime\notebook_cookie_secret
[I 16:21:59.846 NotebookApp] Serving notebooks from local directory: C:\Users\Dan
[I 16:21:59.846 NotebookApp] 0 active kernels
[I 16:21:59.846 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 16:21:59.862 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Once Jupyter is running, you will notice a running icon for Jupyter (two inverted crescents) at the bottom of your screen:

Note, the last line of the log is the instruction you must use to stop the server (press Ctrl + C in the command-line window where the server is running).

If you press Ctrl + C in that window, the Jupyter server will shut down gracefully:

[W 17:26:36.688 NotebookApp] 404 GET /favicon.ico (::1) 62.00ms referer=None
[W 17:26:36.750 NotebookApp] 404 GET /favicon.ico (::1) 0.00ms referer=None
[I 17:28:24.891 NotebookApp] Interrupted...
[I 17:28:24.891 NotebookApp] Shutting down kernels

You will notice that Anaconda has been added to your application menu for further use:

 

Installing Jupyter on Mac


On Mac, you can use the same Anaconda GUI (for Mac) as described in the previous section. You may also use the command-line tools available for Linux on your Mac.

You must first install Anaconda. Download the latest version and execute the embedded shell script to install.

Installing Jupyter on Mac is done through the command line using the conda install command:

bmac:~ dtoomey$ conda install jupyter 
Fetching package metadata: .... 
Solving package specifications: .................................... 
Package plan for installation in environment /Users/dtoomey/anaconda: 

The following packages will be downloaded:

    package                    |            build 
    ---------------------------|----------------- 
    mistune-0.7.2              |           py27_1         178 KB 
    setuptools-20.3            |           py27_0         453 KB 
    conda-4.0.5                |           py27_0         185 KB 
    pexpect-4.0.1              |           py27_0          63 KB 
    traitlets-4.2.1            |           py27_0         108 KB 
    ipython-4.1.2              |           py27_2         931 KB 
    jupyter_core-4.1.0         |           py27_0          51 KB 
    jupyter_client-4.2.2       |           py27_0          96 KB 
    jupyter_console-4.1.1      |           py27_0          24 KB 
    notebook-4.1.0             |           py27_2         4.4 MB 
    qtconsole-4.2.1            |           py27_0         160 KB 
    jupyter-1.0.0              |           py27_2           2 KB 
    ------------------------------------------------------------ 
                                           Total:         6.6 MB 

The following packages will be updated:

    conda:           3.19.3-py27_0 --> 4.0.5-py27_0 
    ipython:         4.1.2-py27_0  --> 4.1.2-py27_2 
    jupyter:         1.0.0-py27_1  --> 1.0.0-py27_2 
    jupyter_client:  4.1.1-py27_0  --> 4.2.2-py27_0 
    jupyter_console: 4.1.0-py27_0  --> 4.1.1-py27_0 
    jupyter_core:    4.0.6-py27_0  --> 4.1.0-py27_0 
    mistune:         0.7.1-py27_0  --> 0.7.2-py27_1 
    notebook:        4.1.0-py27_0  --> 4.1.0-py27_2 
    pexpect:         3.3-py27_0    --> 4.0.1-py27_0 
    qtconsole:       4.1.1-py27_0  --> 4.2.1-py27_0 
    setuptools:      20.1.1-py27_0 --> 20.3-py27_0 
    traitlets:       4.1.0-py27_0  --> 4.2.1-py27_0 
Proceed ([y]/n)? y 
Fetching packages ... 
mistune-0.7.2- 100% |#################| Time: 0:00:00   1.87 MB/s 
setuptools-20. 100% |#################| Time: 0:00:00   3.53 MB/s 
conda-4.0.5-py 100% |#################| Time: 0:00:00   2.47 MB/s 
pexpect-4.0.1- 100% |#################| Time: 0:00:00   1.26 MB/s 
traitlets-4.2. 100% |#################| Time: 0:00:00   1.71 MB/s 
ipython-4.1.2- 100% |#################| Time: 0:00:00   1.77 MB/s 
jupyter_core-4 100% |#################| Time: 0:00:00   2.34 MB/s 
jupyter_client 100% |#################| Time: 0:00:00   1.58 MB/s 
jupyter_consol 100% |#################| Time: 0:00:00   7.82 MB/s 
notebook-4.1.0 100% |#################| Time: 0:00:00   4.75 MB/s 
qtconsole-4.2. 100% |#################| Time: 0:00:00   1.37 MB/s 
jupyter-1.0.0- 100% |#################| Time: 0:00:00   2.71 MB/s 
Extracting packages ... 
[      COMPLETE ]|#############################################| 100% 
Unlinking packages ... 
[      COMPLETE ]|#############################################| 100% 
Linking packages ...
[      COMPLETE ]|#############################################| 100% 

Note

You have installed Jupyter.

 

Notebook structure


A Jupyter Notebook is fundamentally a JSON file with a number of annotations. The main parts of the Notebook are as follows:

  • Metadata: A data dictionary of definitions used to set up and display the notebook

  • Notebook format: Version numbers of the notebook format used to create the notebook (the version number is used for backward compatibility)

  • List of cells: There are different types of cell for markdown (display), code (to execute), and raw content; code cells also hold the output they produce
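To make the structure concrete, here is a minimal sketch in JavaScript (run with Node.js) of a notebook as a JSON object. It assumes the version 4 notebook format, where the top-level keys are metadata, nbformat, nbformat_minor, and cells; the kernel description, the cell contents, and the countCellTypes helper are illustrative only.

```javascript
// Minimal sketch of a notebook's JSON layout, assuming the nbformat 4 schema.
const notebook = {
  metadata: {
    // Illustrative kernel description; real notebooks record the kernel used.
    kernelspec: { name: "python2", display_name: "Python 2", language: "python" }
  },
  nbformat: 4,
  nbformat_minor: 0,
  cells: [
    { cell_type: "markdown", metadata: {}, source: "# Sales analysis" },
    {
      cell_type: "code",
      execution_count: 1,
      metadata: {},
      source: "print(1 + 1)",
      // Output is stored inside the code cell that produced it.
      outputs: [{ output_type: "stream", name: "stdout", text: "2\n" }]
    }
  ]
};

// Hypothetical helper: count cells by type, as a tool reading an .ipynb file might.
function countCellTypes(nb) {
  const counts = {};
  for (const cell of nb.cells) {
    counts[cell.cell_type] = (counts[cell.cell_type] || 0) + 1;
  }
  return counts;
}

// Round-trip through text, as if writing the file to disk and reading it back.
const text = JSON.stringify(notebook, null, 1);
const reloaded = JSON.parse(text);
console.log(countCellTypes(reloaded)); // { markdown: 1, code: 1 }
```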

 

Notebook workflow


The typical workflow is as follows:

  • Create a new notebook for a project or data analysis.

  • Add your analysis steps, coding, and output.

  • Surround your analysis with organizational and presentation markdown to communicate an entire story.

  • Others can then use interactive notebooks (those that include widgets and display modules), modifying parameters and data to see the effects of their changes. Your markdown would present the cases that a user may want to investigate and the probable results.

 

Basic notebook operations


In this section, we describe the different operations that you can perform on your Jupyter Notebook. Most of the operations are menu functions that will change your display accordingly.

File operations

Let's walk through the basic file operations.

From the Files tab, we see a list of files and folders in the current notebook/disk folder. If we select (check) one of the files, we see the top-left menu change:

We now have choices of Duplicate, Rename, and Delete (the trashcan icon). Note that the number of files selected, 1, is displayed in the box as well.

Duplicate

If we hit the Duplicate button, we get a confirmation prompt with the name of the file selected for duplication:

Cancel will close the dialog. Duplicate will create another copy of the file with an appended copy number, as in the following screenshot. The original filename has been used with the addition of -Copyn in the filename, where n is the copy number. Note the original file extension, .properties, has been maintained in the new file:
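The -Copyn convention can be sketched as follows. The duplicateName helper and the example.properties filename are hypothetical; the sketch mimics the naming behavior described above rather than reproducing Jupyter's own code.

```javascript
// Hypothetical helper mimicking the "-Copyn" naming convention; not Jupyter's code.
function duplicateName(fileName, existingNames) {
  const dot = fileName.lastIndexOf(".");
  // Split "example.properties" into "example" and ".properties".
  let stem = dot > 0 ? fileName.slice(0, dot) : fileName;
  const ext = dot > 0 ? fileName.slice(dot) : "";
  // Copying a copy should not yield "name-Copy1-Copy1", so strip an existing suffix.
  stem = stem.replace(/-Copy\d*$/, "");
  const taken = new Set(existingNames);
  // Use the first copy number that is not already in the folder.
  for (let n = 1; ; n++) {
    const candidate = `${stem}-Copy${n}${ext}`;
    if (!taken.has(candidate)) return candidate;
  }
}

const files = ["example.properties"];
console.log(duplicateName("example.properties", files)); // example-Copy1.properties
files.push("example-Copy1.properties");
console.log(duplicateName("example.properties", files)); // example-Copy2.properties
```

Keeping the extension intact is what preserves the file type of the copy, as the screenshot shows for .properties.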

Rename

Similarly, if we hit the Rename button, another dialog box appears, prompting for the new filename. Only the main part of the filename is highlighted, on the assumption that you want to keep the file extension (the file type has not changed):

Delete

We can also delete the file by clicking on the trashcan icon. This brings up a confirmation dialog box:

At the top right of the screen we have options for Upload and New (Text File, Folder, or Python 2).

Upload

The Upload button is more meaningful when the notebook is stored on a web server. When running on your desktop, it lets you easily copy files from another part of your machine into your notebook space. If you click the button, you are presented with a file selector dialog box. The following screenshot is specific to a Windows environment, but a similar display is presented on a Mac. Once you select a file, it will be added to your notebook space:

New text file

If we opt to create a New Text File, we are presented with a new browser panel in the Jupyter text editor (Note that I have shrunk down the size of the screen so the display fits the boundaries of this book):

There are several points of interest on this screen:

  • We are in a new browser panel (the notebook display is still present in the other tab).

  • The name of the new file is untitled1.txt. Using the same convention as duplication, the new filename starts with untitled.txt and is incremented as needed.

  • Curiously, it mentions when the file was created.

  • In the top-right corner, we see Plain Text. So, we might expect to see some other description here for other file types.

  • We have a new menu bar: File, Edit, View, and Language.

  • The File menu has the following options:

    • New: Start another new text window

    • Save: Save/update the current text file into the notebook area

    • Rename: Change the name of the file (unlikely you would want to keep the untitledn name provided)

    • Download: Again, an option that makes more sense if your notebook is running on the Web. As explained for Upload, Download on a desktop installation allows you to copy a file to another part of your machine.

  • The Edit menu has the following options:

    • Find: Search for a string.

    • Find & Replace: Search for and replace a string.

    • Separator: The options for adjusting the text editor in use are below this line.

    • Key Map: Set your own function mapping for your keyboard.

    • Default: Checked as it is the default choice. This means to use the default text editor.

    • Sublime Text: If you would prefer to use the Sublime Text editor.

    • Vim: If you would prefer to use Vim.

    • Emacs: If you would prefer to use Emacs.

  • The View menu only has an option to Toggle Line Numbers. I imagine future revisions of the package will have additional features. Similarly, for other file types, the menu may change.

  • The Language menu allows you to specify whether this text file is a specific type of programming file. This allows syntax highlighting, which is a major feature of source editors. The list is extensive:

New folder

The New Folder option creates a new folder with the naming convention Untitled Folder n.
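The numbering used for new untitled items (untitled.txt, untitled1.txt, and Untitled Folder n) can be sketched with one hypothetical helper, nextUntitledName. The separator between the base name and the number (none for files, a space for folders) is an assumption based on the names Jupyter displays.

```javascript
// Hypothetical helper illustrating the "untitled" numbering convention.
// separator is "" for files (untitled1.txt) and " " for folders (Untitled Folder 1);
// this split is an assumption based on the names shown in the Files tab.
function nextUntitledName(baseName, ext, separator, existingNames) {
  const taken = new Set(existingNames);
  for (let n = 0; ; n++) {
    // n = 0 means no number at all, for example untitled.txt.
    const suffix = n === 0 ? "" : `${separator}${n}`;
    const candidate = `${baseName}${suffix}${ext}`;
    if (!taken.has(candidate)) return candidate;
  }
}

console.log(nextUntitledName("untitled", ".txt", "", ["untitled.txt"])); // untitled1.txt
console.log(nextUntitledName("Untitled Folder", "", " ", [])); // Untitled Folder
console.log(nextUntitledName("Untitled Folder", "", " ", ["Untitled Folder"])); // Untitled Folder 1
```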

New Python 2

The New Python 2 option creates a new Python 2 session. You are presented with a new browser panel with a similar naming convention, as seen in the following screenshot.

This is a very different presentation, where Python code is expected to be entered in the cells on the page with results displayed below each cell.

There is an extensive menu with File, Edit, View, Insert, Cell, Kernel, and Help options. We have a fairly complete Integrated Development Environment (IDE) for writing Python code:

The File menu has the following options:

  • New Notebook: Start a new notebook (another browser panel like this one)

  • Open...: Select a file to open from the notebook Files view

  • Make a Copy...: Copy the current notebook completely into another browser panel

  • Rename...: Rename the current notebook

  • Save and Checkpoint: Save the current notebook and record a checkpoint

Note

A checkpoint is a point in time where all information about a notebook is preserved. You can have many checkpoints and return the state of your notebook to the previous checkpoint state at any time. This is an excellent way to give yourself the room to try out a new angle on your analysis without risking losing what you have done so far.

  • Revert to Checkpoint: Revert your notebook to a previous checkpoint

  • Print Preview: Present a preview of the printed form of your notebook

  • Download as: Download the notebook in a variety of formats:

    • IPython notebook (its current form)

    • IPython

    • HTML representation

    • Markdown: a specialized display format

    • reST: reStructuredText, an easy-to-read plain-text markup

    • PDF

    • Presentation

  • Close and Halt: Close the current notebook and stop any running scripts
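The checkpoint idea from the note in the File menu list (preserve the complete state, experiment, then return to it) can be sketched as a store of deep copies. The CheckpointStore class is hypothetical; it illustrates the concept and is not Jupyter's implementation or API.

```javascript
// Hypothetical checkpoint store illustrating save-and-revert; not Jupyter's API.
class CheckpointStore {
  constructor() {
    this.checkpoints = [];
  }

  // Record a deep copy so later edits cannot change the saved state.
  save(notebook) {
    this.checkpoints.push(JSON.parse(JSON.stringify(notebook)));
    return this.checkpoints.length - 1; // checkpoint id
  }

  // Return a fresh copy of the chosen checkpoint (the latest by default).
  revert(id = this.checkpoints.length - 1) {
    if (id < 0 || id >= this.checkpoints.length) {
      throw new Error(`No checkpoint with id ${id}`);
    }
    return JSON.parse(JSON.stringify(this.checkpoints[id]));
  }
}

const store = new CheckpointStore();
let nb = { cells: [{ cell_type: "code", source: "x = 1" }] };
store.save(nb);
nb.cells.push({ cell_type: "code", source: "experiment()" }); // try a new angle
nb = store.revert(); // back to the saved state
console.log(nb.cells.length); // 1
```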

The Edit menu has the following options:

  • Cut Cells: Cut the currently selected cells to the clipboard

Note

Each of the rectangular work areas in your notebook is a cell. The innermost text area is where you enter code. Below that (but within the surrounding rectangle), the results of running the code are displayed.

  • Copy Cells: Copy the currently selected cells to the clipboard

  • Paste Cells Above: Paste cells from the clipboard above the current cell

  • Paste Cells Below: Paste cells from the clipboard below the current cell

  • Paste Cells & Replace: Paste the cells from the clipboard on top of the current cell

  • Delete Cells: Delete the current cells

  • Undo Delete Cells: Revert the last Delete Cells invocation

  • Split Cell: Split the current cell in two at the cursor position

  • Merge Cell Above: Merge the current cell with the one above

  • Merge Cell Below: Merge the current cell with the one below

  • Edit Notebook Metadata: Every notebook has underlying metadata that describes the characteristics of the notebook. Advanced users can manipulate this data directly in order to adjust features more readily. For example, the current notebook metadata looks like the following screenshot:

  • Find and Replace: Allow us to find and replace among the selected cells. There is a standardized dialog box for this, as shown in the following screenshot:

As seen in the preceding screenshot, the parameters and their functions are as follows:

  • The Aa icon toggle determines whether a case-insensitive search is made

  • The * icon toggle determines whether a regex search is made

  • The stacked lines icon toggle determines whether a replace will be made

  • The Find text block presents the search criteria

  • The Replace text block is used for the replacement text
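How the case and regex toggles interact can be sketched as a function over cell sources. The findAndReplace name and its option names are hypothetical; when the regex toggle is off, the search text is escaped so it matches literally, and the replacement is inserted verbatim.

```javascript
// Hypothetical find-and-replace over cell sources with case and regex toggles.
function findAndReplace(sources, find, replacement, { caseSensitive = false, regex = false } = {}) {
  // With the regex toggle off, escape metacharacters so "a.b" matches only "a.b".
  const pattern = regex ? find : find.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const flags = caseSensitive ? "g" : "gi";
  const re = new RegExp(pattern, flags);
  // In literal mode, a function replacement stops "$&" or "$1" being interpreted.
  return sources.map((src) => src.replace(re, regex ? replacement : () => replacement));
}

const cells = ["total = Sales * 2", "print(sales)"];
console.log(findAndReplace(cells, "sales", "revenue").join(" | "));
// total = revenue * 2 | print(revenue)
console.log(findAndReplace(cells, "sales", "revenue", { caseSensitive: true }).join(" | "));
// total = Sales * 2 | print(revenue)
console.log(findAndReplace(["x1 = 1", "x22 = 2"], "x(\\d+)", "y$1", { regex: true }).join(" | "));
// y1 = 1 | y22 = 2
```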

The View menu has the following options:

  • Toggle Header: Toggles the display of the Jupyter logo and filename

  • Toggle Toolbar: Toggles the display of the toolbar

  • Cell Toolbar: Toggles the display of the cell action icons

The Insert menu has the following options:

  • Insert Cell Above: Add a new cell above the current one

  • Insert Cell Below: Add a new cell below the current one

The Cell menu has the following options:

  • Run Cells: Run the selected (or all) cells.

  • Run Cells and Select Below: Run the current cells and then select the cell below.

  • Run Cells and Insert Below: Run the current cells and insert a new cell below.

  • Run All: Run all cells.

  • Run All Above: Run all cells prior to the current cell.

  • Run All Below: Run all cells below the current cell.

  • Cell Type: Change the type of cell selected to Code, Markdown, or Raw NBConvert. There is an automatic message that is displayed noting that all cells are by default Code type.

  • Current Outputs and All Output have options to toggle their display.

The Kernel menu has the following options:

  • Interrupt: Send a keyboard interrupt, Ctrl + C, to the kernel. This is useful if your code is in an endless loop.

  • Restart: Restart the kernel.

  • Restart & Clear Output: Restart the kernel and clear all output.

  • Restart & Run All: Restart the kernel and run all cells.

  • Reconnect: Connect back to a remote notebook.

  • Change Kernel: Not useful as only Python 2 is available at this point.

The Help menu has the following options:

  • User Interface Tour: Walk the user through a UI tour

  • Keyboard Shortcuts: Presents a list of built-in keyboard shortcuts

  • Notebook Help: Help topics on the notebook

  • Markdown: Description of the markdown available within a notebook

  • Python, IPython, NumPy, SciPy, Matplotlib, SymPy, Pandas: Help topics on the various languages and packages that can be used in notebooks

  • About: A standard about box

There is an icon panel below the menu that has shortcut icons for the following functions:

  • Floppy disk icon: Save and Checkpoint

  • Plus sign: Insert Cell Below

  • Scissors: Cut Cell

  • Duplicate pages: Copy Cell

  • Up arrow: Move Cell Up

  • Down arrow: Move Cell Down

  • An icon that looks like a speaker: Run the current cell

  • Black square: Interrupt Kernel

  • Circular arrow: Restart the Kernel

  • There's a drop-down menu for display characteristics:

    • Code

    • Markdown

    • Raw NBConvert

    • Heading

  • Keyboard: Open the command palette

  • Change the current toolbar in use. Clicking on the Cell Toolbar button auto-displays the Cell Toolbar choice from the View menu:

 

Security in Jupyter


Jupyter notebooks are created in order to be shared with other users, in many cases over the Internet. However, Jupyter notebooks can execute arbitrary code and generate arbitrary code. This can be a problem if malicious content has been placed in a notebook. The default security mechanisms for Jupyter notebooks include the following:

  • Raw HTML is always sanitized (checked for malicious coding). Further information can be found at https://developers.google.com/caja.

  • You cannot run external JavaScript.

  • Cell contents (especially HTML and JavaScript) are not trusted (requires user validation to continue).

  • The output from any cell is not trusted.

  • All other HTML or JavaScript is never trusted. Clearing the output will cause the notebook to become trusted when saved.

Security digest

Notebooks can also use a security digest to ensure that only the correct user modifies the contents. A digest takes into account the entire contents of the notebook and a secret (known only to the notebook creator). Because any change to the contents changes the digest, this combination ensures that malicious code cannot be added to a notebook without being detected.
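The digest idea can be sketched with an HMAC, a keyed hash that only someone holding the secret can reproduce. This sketch assumes HMAC-SHA256 over the JSON text of the notebook, using Node's built-in crypto module; Jupyter's own signing differs in details, and the signNotebook and verifyNotebook names are hypothetical.

```javascript
// Sketch of a notebook digest, assuming HMAC-SHA256 over the notebook's JSON text.
// Illustrative only: Jupyter's real signing code differs in details.
const crypto = require("crypto");

function signNotebook(notebook, secret) {
  const payload = JSON.stringify(notebook);
  return crypto.createHmac("sha256", secret).update(payload).digest("hex");
}

function verifyNotebook(notebook, secret, signature) {
  const expected = Buffer.from(signNotebook(notebook, secret), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual requires equal lengths, so check that first.
  return expected.length === given.length && crypto.timingSafeEqual(expected, given);
}

const secret = "only-the-creator-knows-this"; // hypothetical secret
const nb = { cells: [{ cell_type: "code", source: "print('hi')" }] };
const signature = signNotebook(nb, secret);

console.log(verifyNotebook(nb, secret, signature)); // true
nb.cells.push({ cell_type: "code", source: "print('injected')" });
console.log(verifyNotebook(nb, secret, signature)); // false: contents changed
```

Without the secret, someone who alters the notebook cannot produce a matching digest, which is why the altered notebook is no longer trusted.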

The secret used to compute the digest is stored in a file at the following location:

~/.jupyter/profile_default/security/notebook_secret

To use your own secret, replace the contents of the notebook_secret file with it.

Trust options

You can specifically apply your trust to a notebook using a command-line option:

jupyter trust /path/to/notebook.ipynb

Or you can do it once the notebook is opened by the File | Trusted Notebook menu option.

 

Configuration options for Jupyter


You can configure some of the display parameters used when presenting notebooks. These are configurable because notebooks use CodeMirror to present and edit their contents. CodeMirror is a JavaScript-based editor for use within web pages (notebooks).

The list of configurable options is still in development. Some of the options are as follows:

  • lineSeparator: The character used to separate text lines

  • theme: The overall theme of presentation used in the notebook

  • indentUnit: How many spaces to indent blocks of coding

To change one of the options, open your notebook, open your browser's JavaScript console, and enter the code to modify the option; then reload your notebook. The modifications you made will be applied to the notebook presentation. Further documentation is available at https://codemirror.net/doc/manual.html#option_indentUnit.

For example, to change the indentation (indentUnit) for your notebook, you would use the following JavaScript:

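// Get the currently selected cell and its configuration object
var mycell = Jupyter.notebook.get_selected_cell();
var cell_config = mycell.config;
// Patch the CodeMirror settings for code cells: indent blocks by 2 spaces
var code_patch = {
    CodeCell: {
        cm_config: {indentUnit: 2}
    }
};
// Apply the patch to the configuration
cell_config.update(code_patch);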
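Because the patch is a plain object, several CodeMirror options can be set at once. The buildCodeCellPatch helper below is hypothetical; only the { CodeCell: { cm_config: ... } } shape comes from the snippet above, and the sketch runs in Node.js without a notebook.

```javascript
// Hypothetical helper that builds a CodeCell patch for any CodeMirror options.
function buildCodeCellPatch(cmOptions) {
  // Copy the options so later changes to cmOptions do not leak into the patch.
  return { CodeCell: { cm_config: { ...cmOptions } } };
}

const patch = buildCodeCellPatch({ indentUnit: 4, theme: "default" });
console.log(JSON.stringify(patch));
// {"CodeCell":{"cm_config":{"indentUnit":4,"theme":"default"}}}

// In the browser console of an open notebook, the patch would then be applied with:
//   Jupyter.notebook.get_selected_cell().config.update(patch);
```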

You have now seen all of the standard operations available to you in a Jupyter Notebook.

 

Summary


In this chapter, we investigated the various user interface elements available in a notebook. We learned how to install the software on a Mac or a PC. We were exposed to the notebook structure. We saw the typical workflow used when developing a notebook. We walked through the user interface operations available in a notebook. And lastly, we saw some of the configuration options available to advanced users for their notebook.

In the next chapter, we will learn all about Python scripting in a Jupyter Notebook.

About the Author

  • Dan Toomey

    Dan Toomey has been developing application software for over 20 years. He has worked in a variety of industries and companies, in roles from sole contributor to VP/CTO-level. For the last few years he has been contracting for companies in the eastern Massachusetts area. Dan has been contracting under Dan Toomey Software Corp. Dan has also written R for Data Science, Jupyter for Data Sciences, and the Jupyter Cookbook, all with Packt.

