Learning Jupyter 5 - Second Edition

By Dan Toomey
About this book

The Jupyter Notebook allows you to create and share documents that contain live code, equations, visualizations, and explanatory text. The Jupyter Notebook system is extensively used in domains such as data cleaning and transformation, numerical simulation, statistical modeling, and machine learning. Learning Jupyter 5 will help you get to grips with interactive computing using real-world examples.

The book starts with a detailed overview of the Jupyter Notebook system and its installation in different environments. Next, you will learn to integrate the Jupyter system with different programming languages such as R, Python, Java, JavaScript, and Julia, and explore various versions and packages that are compatible with the Notebook system. Moving ahead, you will master interactive widgets and namespaces and work with Jupyter in a multi-user mode.

By the end of this book, you will have used Jupyter with a big dataset and be able to apply all the functionalities you’ve explored throughout the book. You will also have learned all about the Jupyter Notebook and be able to start performing data transformation, numerical simulation, and data visualization.

Publication date: August 2018
Publisher: Packt
Pages: 282
ISBN: 9781789137408

 

Chapter 1. Introduction to Jupyter

Jupyter is a tool that allows data scientists to record their complete analysis process, much in the same way other scientists use a Lab Notebook to record tests, progress, results, and conclusions.

The Jupyter product was originally developed as part of the IPython project, which provided interactive online access to Python. Over time, it became useful to interact with other data analysis tools, such as R, in the same manner. As the tool was split off from its Python-only roots, it grew into its current form, Jupyter. IPython is still an active tool that's available for use. The name Jupyter itself is derived from the combination of Julia, Python, and R.

Jupyter is available as a web application from a number of places. It can also be used locally over a wide variety of installations. In this book, we will explore using Jupyter on macOS and on a Windows PC, as well as over the internet with other providers.

With Jupyter 5.0, there were significant enhancements for the following:

  • Cell tagging
  • Customizing keyboard shortcuts
  • Copying and pasting cells between Notebooks
  • A more attractive default style for tables

In this chapter, we will cover the following topics:

  • First look at Jupyter
  • Installing Jupyter
  • Notebook structure
  • Notebook workflow
  • Basic Notebook operations
  • Security in Jupyter
  • Configuration options for Jupyter
 

First look at Jupyter


Here is a sample opening page when using Jupyter (this screenshot is on a Windows machine):

You should get yourself acquainted with the environment. The Jupyter user interface has a number of components:

  • The product title, Jupyter, in the top left (as expected). The logo and the title name are clickable and will return you to the Jupyter Notebook home page.
  • There are three tabs which are displayed: Files, Running, and Clusters:
  • The Files tab shows the list of files in the current directory of the page (described later on in this section).
  • The Running tab presents another screen, which shows the currently running processes and Notebooks. The drop-down lists for Terminals and Notebooks are populated with their running members:
  • The Clusters tab presents another screen which displays a list of available clusters. This topic is covered in a later chapter:
  • In the top-right corner of the screen are three buttons: Upload, New (menu), and a Refresh notebook list button.
  • The Upload button is used to add files to the Notebook space. You may also just drag and drop as you would when handling files. Similarly, you can drag and drop Notebooks into specific folders as well.
  • The New menu presents a further menu with a Notebook type for each Notebook engine that has been installed. On my machine these are JavaScript (Node.js), Julia 0.6.1, Python 2 (which will not be covered in this book), and Python 3; I had installed these earlier, so they are not default values. The additional Other menu items are Text File, Folder, and Terminal:
  • The Text File option is used to add a text file to the current directory. Jupyter will open a new browser window for you, running a text editor. The text entered is automatically saved and will be displayed in your Notebook files and directory display:

Note

The default filename, untitled1.txt, is editable. Note that the filename corresponds with the title given to the Notebook.

  • The Folder option creates a new folder with the name Untitled Folder. Remember that all of the file and folder names are editable:
  • The Terminal option is used to open a new Terminal (command) window. The resulting display on a Windows machine looks as follows:
  • The Python 3 option is used to start a new Python 3 Notebook. The interface looks like it does in the following screenshot. You have full file editing capabilities for your script, including saving as a new file. You also have a complete working IDE for your Python script:

Note

As with the Text File and Folder options, you have created a new file (here, a Python 3 Notebook) in your Notebook space, and it is running! You can see this in the home page display of Jupyter:

  • The Refresh notebook list button is used to update the display. It's not really necessary, as the display is reactive to any changes in the underlying file structure.
  • At the top of the Files tab is a checkbox, a drop-down menu, and a Home button.
  • The checkbox is used to toggle all the checkboxes in the items list.
  • The drop-down menu presents a list of the choices available, that is, Folders, All Notebooks, Running, and Files, as shown in the following screenshot:
  • The Folders selection will select all the folders in the display and present a count of the folders in the small box.
  • The All Notebooks selection will change the count to the number of Notebooks and provide you with five options:
    • Duplicate (the selected Notebooks)
    • Shutdown (the selected Notebooks)
    • View (the selected Notebooks)
    • Edit (the selected Notebooks)
    • Delete (the trash can icon; the selected Notebooks)
  • You can see them in the following screenshot:
  • The Running selection will select any running scripts in the display and update the count to the number selected:
  • The Files selection will select all of the files in the Notebook display and update the count accordingly.
  • The Home button brings you back to the home screen of the Notebook.
  • On the left-hand side of every item is a checkbox, an icon, and the item's name:
  • The checkbox is used to build a set of files to operate upon.
  • The icon is indicative of the type of item. In this case, all of the items are folders.
  • The name of the item corresponds to the name of the object. In this case, the filenames are as they are when used on the disk.
 

Installing Jupyter


Jupyter requires Python to be installed (it is based on the Python language, after all). There are a couple of tools that will automate the installation of Jupyter (and optionally Python) from a GUI. In this case, we are showing you how to install Jupyter using Anaconda, which is a Python tool for distributing software.

First of all, you have to install Anaconda. It is available on Windows and macOS environments. Download the installer from https://www.continuum.io/ (the company that produces Anaconda) and run it. Be sure to select the version of Anaconda that uses Python 3.x rather than Python 2.x. The software provides a regular installation setup process, as shown in the following screenshot:

The installation process goes through the regular steps of making you agree to the distribution rights license:

The standard Windows installation allows you to decide whether all users on the machine can run the new software or not. If you are sharing a machine with different levels of users, then you can decide upon the appropriate action:

After clicking on Next, it will ask for a destination for the software to reside (I almost always keep the default paths):

 

Anaconda can also add itself to your PATH so that it is accessible from anywhere on your machine; this is set in the next dialog box, as follows:

 

The installation will then begin. This may take a while, depending on your machine configuration and network access:

 

You will eventually get to the Installation Complete screen, as follows:

 

On Windows, Anaconda takes advantage of the semi built-in aspects of the Visual Development Environment to access Windows services natively. It asks for permission to do so with the following dialog:

 

And now we have truly installed Jupyter:

Anaconda will start. Anaconda is a great wrapper program that holds the distribution for a number of tools. The tool of importance to us is Jupyter. The Anaconda display shows the available tools, whether they need to be installed, and a starting place for each.

You can also start Jupyter directly by running the jupyter notebook command from a Terminal window.
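Before launching, you may want to confirm from Python that the jupyter command is on your PATH and see which related packages are installed. The following is a minimal sketch: the package names it checks (notebook, jupyter_core, ipykernel) are assumptions about a typical installation, the function name is hypothetical, and it needs Python 3.8 or later for importlib.metadata.

```python
import shutil
import sys
from importlib import metadata


def jupyter_install_report(packages=("notebook", "jupyter_core", "ipykernel")):
    """Collect basic facts about the local Jupyter installation.

    The default package names are assumptions about a typical install.
    """
    report = {
        "python": sys.version.split()[0],
        # shutil.which searches PATH much as the Terminal does.
        "jupyter_executable": shutil.which("jupyter"),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    for key, value in jupyter_install_report().items():
        print(f"{key}: {value}")
```

If jupyter_executable prints None, the Terminal will not find the jupyter command either, which usually means the PATH option in the installer was not selected.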

 

 

If we select Jupyter from the Anaconda screen, we will start Jupyter in a new browser window:

 

When Jupyter is running, we can get some details on the installation from the Help | About menu, which displays a dialog box like the following:

If you start Jupyter from the command line directly, Jupyter will open in a new browser window, and your Terminal window will show log entries that record what the server is doing:

 

Note that the last line of the log tells you how to stop the server: press Ctrl + C in the command-line window where the server is running.

If you press Ctrl + C in that window, the Jupyter server will shut down gracefully:

[W 17:26:36.688 NotebookApp] 404 GET /favicon.ico (::1) 62.00ms referer=None
[W 17:26:36.750 NotebookApp] 404 GET /favicon.ico (::1) 0.00ms referer=None
[I 17:28:24.891 NotebookApp] Interrupted...
[I 17:28:24.891 NotebookApp] Shutting down kernels
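Each of these log entries begins with a bracketed prefix. Judging from the sample lines, the prefix holds a one-letter level, a timestamp, and the application name, followed by the message. If you want to filter a saved log (for example, to pull out only the warnings), a small parser such as the following sketch may help. The regular expression and the level-letter mapping are assumptions inferred from the sample, not a documented format.

```python
import re

# Assumed layout, inferred from the sample log above:
# "[<level letter> <HH:MM:SS.mmm> <application>] <message>"
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]) (?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<app>\w+)\] (?P<message>.*)$"
)

# Assumed meaning of the level letters (I and W appear in the sample).
LEVELS = {"D": "DEBUG", "I": "INFO", "W": "WARNING", "E": "ERROR", "C": "CRITICAL"}


def parse_log_line(line):
    """Split one log line into its parts, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["level"] = LEVELS.get(entry["level"], entry["level"])
    return entry


sample = "[W 17:26:36.688 NotebookApp] 404 GET /favicon.ico (::1) 62.00ms referer=None"
print(parse_log_line(sample))
```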

You will notice that the Anaconda package has been installed on your application menu for further use:

 

 

 

Notebook structure


A Jupyter Notebook is fundamentally a JSON file with a number of annotations. The main parts of the Notebook are as follows:

  • Metadata: A data dictionary of definitions used to set up and display the Notebook
  • Notebook format: Version numbers of the Notebook file format used to create the Notebook (the version number is used for backward compatibility)
  • List of cells: There are different types of cells for markdown (display), code (to execute), and raw content; the output of each code cell is stored within that cell
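To make the structure concrete, here is a minimal Notebook built by hand as a Python dictionary and round-tripped through JSON. This is a sketch: the keys follow the nbformat 4 layout (metadata, nbformat, nbformat_minor, cells), but a Notebook saved by Jupyter carries more metadata (such as the kernel specification) than is shown here, and the cell contents are made up for the example.

```python
import json

# A minimal Notebook in nbformat 4 layout, built by hand for illustration.
notebook = {
    "metadata": {},        # dictionary used to set up and display the Notebook
    "nbformat": 4,         # major format version
    "nbformat_minor": 2,   # minor format version
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My analysis\n", "Some explanatory text."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["print(1 + 1)"],
            # Outputs live inside the code cell that produced them.
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}

# Serialize to JSON text (the form stored in an .ipynb file) and read it back.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)
for cell in loaded["cells"]:
    print(cell["cell_type"], "".join(cell["source"]))
```

Because the file is plain JSON, any JSON tool can inspect it; for real work, the nbformat package adds validation against the official schema.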
 

Notebook workflow


The typical workflow is as follows:

  • Create a new Notebook for a project or data analysis.
  • Add your analysis steps, coding, and output.
  • Surround your analysis with organizational and presentational markdown to communicate an entire story.
  • Interactive Notebooks (that include widgets and display modules) would then be used by others by modifying parameters and data to note the effects of their changes. Your markdown would present the cases that a user may want to investigate, and probable results.
 

Basic Notebook operations


In this section, we will describe the different operations that you can perform on your Jupyter Notebook. Most of the operations are menu functions that will change your display accordingly.

File operations

Let's walk through the basic file operations.

From the Files tab, we can see a list of files and folders in the current Notebook/disk folder. If we select (check) one of the files, we will see the top-left menu change:

We now have choices of Duplicate, Rename, and Delete (the trash icon). Note that the number of files selected, 1, is also displayed in the box.

Duplicate

If we hit the Duplicate button, we get a confirmation prompt with the name of the file that's been selected for duplication:

Cancel will close the dialog. Duplicate will create another copy of the file with an appended copy number, as shown in the following screenshot. The original filename has been used with the addition of -Copyn in the filename, where n is the copy number. Note the original file extension, .py, has been maintained in the new file:

 

Rename

Similarly, if we hit the Rename button, another dialog box will appear, prompting for the new filename. Only the main part of the filename is highlighted, on the assumption that you want to keep the file extension because the file type has not changed:

Delete

We can also delete the file by clicking on the trashcan icon. This brings up a confirmation dialog box as follows. I like that they changed the background of Delete to red to make sure that you don't just happily click it:

At the top right of the screen, we have options for Upload and New.

 

Upload

The Upload button is more meaningful when the Notebook is stored on a web server. When you run Jupyter on your desktop, it lets you copy files from elsewhere on your machine into your Notebook space. If you click this button, you are presented with a file selector dialog box. The following screenshot is specific to a Windows environment, but a similar display is presented on macOS. Once you select a file, it will be added to your Notebook space:

 

New text file

If we opt to create a Text File, we are presented with a new browser panel in the Jupyter text editor (I have shrunk down the size of the screen so that the display fits the boundaries of this book):

There are several points of interest on this screen:

  • We are in a new browser panel (the Notebook display is still present in the other tab).
  • The name of the new file is untitled1.txt. Using the same convention as duplication, the new filename starts with untitled.txt and is incremented as needed.
  • Curiously, it mentions when the file was created.
  • In the top-right corner, we see Plain Text. So, we might expect to see some other description here for other file types.
  • We have a new menu, which includes File, Edit, View, and Language.
  • The File menu has the following options:
    • New: Starts another new text window.
    • Save: Saves or updates the current text file in the Notebook area.
    • Rename: Changes the name of the file (which you will usually want to do, rather than keep the untitledn name that's provided).
    • Download: Again, an option that makes more sense if your Notebook is running on the web. As explained for upload, download on a desktop installation allows you to copy a file to another part of your machine.

 

  • The Edit menu has the following options:
    • Find: Searches for a string.
    • Find & Replace: Searches and replaces a string.
    • Separator: The options below this line adjust the text editor in use.
    • Key Map: Selects the key bindings used by the editor, from the following choices.
    • Default: Checked as it is the default choice. This means using the default text editor key bindings.
    • Sublime Text: If you would prefer Sublime Text key bindings.
    • Vim: If you would prefer Vim key bindings.
    • emacs: If you would prefer emacs key bindings.
  • The View menu only has an option to Toggle Line Numbers. I imagine future revisions of the package will have additional features. Similarly, for other file types, the menu may change.
  • The Language menu allows you to specify whether this text file is a specific type of programming file. This allows for syntax highlighting, which is a major feature of source editors. The list is extensive:

New folder

The Folder option creates a new folder with the naming convention Untitled Folder.
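The naming rules described in this section (a new text file starts as untitled.txt and is then incremented to untitled1.txt, while a duplicate of a file gets -Copyn before its extension) can be modeled with a small helper. This is a sketch of the observed convention, not code taken from Jupyter; the function name next_free_name and its insert parameter are hypothetical.

```python
def next_free_name(filename, existing, insert=""):
    """Return the first candidate name that is not already in `existing`.

    The bare name is tried first, then `insert` plus 1, 2, ... is placed
    before the extension (hypothetical helper modeling the convention).
    """
    stem, dot, ext = filename.rpartition(".")
    if not dot:                      # no extension (for example, a folder)
        stem, ext = filename, ""
    suffix = dot + ext
    counter = 0
    while True:
        # counter == 0 means "no number", so the bare name is tried first
        tag = f"{insert}{counter}" if counter else ""
        candidate = f"{stem}{tag}{suffix}"
        if candidate not in existing:
            return candidate
        counter += 1


files = {"script.py", "untitled.txt"}
print(next_free_name("script.py", files, insert="-Copy"))  # script-Copy1.py
print(next_free_name("untitled.txt", files))               # untitled1.txt
```

Splitting on the last dot is what keeps the original extension (.py, .txt) intact in the new name.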

New Python 3

The new Python 3 option creates a new Python 3 Notebook. You are presented with a new browser panel with a similar naming convention, as shown in the following screenshot.

This is a very different presentation, where Python code is expected to be entered in the cells on the page with results displayed in each cell.

There is an extensive menu with File, Edit, View, Insert, Cell, Kernel, and Help options. We have a fairly complete Integrated Development Environment (IDE) for writing Python code:

The File menu has the following options:

  • New Notebook: Starts a new Notebook (another browser panel like this one)
  • Open...: Selects a file to open from the Notebook files view
  • Make a Copy...: Copies the current Notebook completely into another browser panel
  • Rename...: Renames the current Notebook
  • Save and Checkpoint: Saves the current Notebook and records a checkpoint

Note

A checkpoint is a saved snapshot of a Notebook at a point in time. You can return your Notebook to its checkpoint state at any time by using Revert to Checkpoint. With the default file-based storage, Jupyter keeps one checkpoint per Notebook (in a hidden .ipynb_checkpoints folder), and each Save and Checkpoint replaces it. This is an excellent way to give yourself the room to try out a new angle on your analysis without risking losing what you have done so far.

  • Revert to Checkpoint: Reverts your Notebook to a previous checkpoint
  • Print Preview: Presents a preview of the printed form of your Notebook
  • Download as: Downloads the Notebook in a variety of formats:
    • IPython Notebook (its current form)
    • IPython
    • HTML representation
    • Markdown, a lightweight markup format for display
    • reST – reStructuredText, an easy-to-read, plain-text markup
    • PDF
    • Presentation
  • Close and Halt: Closes the current Notebook and stops any running scripts
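Save and Checkpoint and Revert to Checkpoint amount to keeping a copy of the Notebook file and restoring it later. The following is a minimal sketch of that idea, assuming a single checkpoint per file stored in a hidden .ipynb_checkpoints folder next to the Notebook; the function names are hypothetical and this is not Jupyter's own implementation.

```python
import shutil
import tempfile
from pathlib import Path


def checkpoint_path(notebook):
    """Location of the single checkpoint kept for `notebook` (assumed layout:
    a hidden .ipynb_checkpoints folder next to the file)."""
    notebook = Path(notebook)
    name = f"{notebook.stem}-checkpoint{notebook.suffix}"
    return notebook.parent / ".ipynb_checkpoints" / name


def save_checkpoint(notebook):
    """Copy the current file into its checkpoint slot, replacing any older copy."""
    target = checkpoint_path(notebook)
    target.parent.mkdir(exist_ok=True)
    shutil.copy2(notebook, target)
    return target


def revert_to_checkpoint(notebook):
    """Overwrite the file with its checkpoint copy."""
    shutil.copy2(checkpoint_path(notebook), notebook)


with tempfile.TemporaryDirectory() as tmp:
    nb = Path(tmp) / "analysis.ipynb"
    nb.write_text('{"cells": []}')
    save_checkpoint(nb)                          # Save and Checkpoint
    nb.write_text('{"cells": ["risky idea"]}')   # experiment further
    revert_to_checkpoint(nb)                     # Revert to Checkpoint
    print(nb.read_text())                        # {"cells": []}
```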

Note

Each of the rectangular work areas in your Notebook is a cell. The innermost text area is where you enter code. Below that (but within the surrounding rectangle), the results of each code step are displayed.

  • The Edit menu has the following options:
    • Copy Cells: Copies the selected cells to the clipboard.
    • Paste Cells Above: Pastes cells from the clipboard above the current cell.
    • Paste Cells Below: Pastes cells from the clipboard below the current cell.
    • Paste Cells & Replace: Pastes the cells from the clipboard on top of the current cell.
    • Delete Cells: Deletes the current cells.
    • Undo Delete Cells: Reverts the last delete cells invocation.
    • Split Cell: Splits up a cell from the current cursor position.
    • Merge Cell Above: Merges the current cell with the one above.
    • Merge Cell Below: Merges the current cell with the one below.
    • Edit Notebook Metadata: Every Notebook has underlying metadata which describes the characteristics of the Notebook. Advanced users can manipulate this data directly in order to adjust features more readily. For example, the current Notebook metadata looks like the following screenshot:
    • Find and Replace: Allows for find and replace across the Notebook's cells. There is a standardized dialog box for this, as shown in the following screenshot:
    • As seen in the preceding screenshot, the parameters and their functions are as follows:
      • The Aa icon toggle determines whether the search is case-sensitive
      • The .* icon toggle determines whether a regex search is made
      • The stacked lines icon toggle determines whether the replacement is limited to the selected cells
      • The Find text block presents the search criteria
      • The Replace text block is used for the replacement text
  • The View menu has the following options:
    • Toggle Header: Toggles the display of the Jupyter logo and filename
    • Toggle Toolbar: Toggles the display of the toolbar
    • Cell Toolbar: Toggles the display of the cell action icons
  • The Insert menu has the following options:
    • Insert Cell Above: Adds a new cell above the current one
    • Insert Cell Below: Adds a new cell below the current one
  • The Cell menu has the following options:
    • Run Cells: Runs the selected (or all) cells.
    • Run Cells and Select Below: Runs the selected cells and moves the selection to the cell below (creating a new cell if needed).
    • Run Cells and Insert Below: Runs the selected cells and inserts a new cell below.
    • Run All: Runs all cells.
    • Run All Above: Runs all cells prior to the current cell.
    • Run All Below: Runs the current cell and all cells below it.
    • Cell Type: Changes the type of the selected cell to Code, Markdown, or Raw NBConvert. There is an automatic message that is displayed, noting that all cells are, by default, Code type.
    • Current Outputs and All Output have options to toggle their display.
  • The Kernel menu has the following options:
    • Interrupt: Sends a keyboard interrupt (Ctrl + C) to the kernel. This is useful if your code is in an endless loop.
    • Restart: Restarts the kernel.
    • Restart & Clear Output: Restarts the kernel and clears all output.
    • Restart & Run All: Restarts the kernel and runs all cells.
    • Reconnect: Reconnects to a remote Notebook.
    • Change kernel: Switches the Notebook to another installed kernel, such as the other engines listed under the New menu.
  • The Help menu has the following options:
    • User Interface Tour: Walks the user through a UI tour
    • Keyboard Shortcuts: Presents a list of built-in keyboard shortcuts
    • Notebook Help: Presents help topics on the Notebook
    • Markdown: Description of the markdown available within a Notebook
    • Python Reference, IPython Reference, NumPy Reference, SciPy Reference, Matplotlib Reference, SymPy Reference, Pandas Reference: Help topics on the various languages and packages that can be used in Notebooks
    • About: A standard about box
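The Find and Replace dialog described in the Edit menu combines three toggles: case sensitivity, regex matching, and limiting the replacement to the selected cells. The following sketch models that behavior over a list of cell sources. It is an illustration only; the function and parameter names are hypothetical, and Python's re syntax differs in places from the JavaScript regex syntax the browser uses.

```python
import re


def find_and_replace(cells, find, replace, match_case=False, use_regex=False,
                     selected=None):
    """Return new cell sources with `find` replaced by `replace`.

    Hypothetical parameters modeled on the dialog's toggles:
    match_case ~ the Aa toggle, use_regex ~ the regex toggle, and
    selected (a set of cell indexes, or None for every cell) ~ the
    toggle that limits replacement to the selected cells.
    """
    pattern = find if use_regex else re.escape(find)
    flags = 0 if match_case else re.IGNORECASE
    regex = re.compile(pattern, flags)
    # In regex mode the replacement may use Python group references such as
    # \1; otherwise a function is used so the text is inserted literally.
    repl = replace if use_regex else (lambda _match: replace)
    result = []
    for index, source in enumerate(cells):
        if selected is None or index in selected:
            source = regex.sub(repl, source)
        result.append(source)
    return result


cells = ["x = Foo()", "print(foo)", "foo_bar = 1"]
print(find_and_replace(cells, "foo", "baz"))
# -> ['x = baz()', 'print(baz)', 'baz_bar = 1']
print(find_and_replace(cells, "foo", "baz", match_case=True))
# -> ['x = Foo()', 'print(baz)', 'baz_bar = 1']
print(find_and_replace(cells, r"\bfoo\b", "baz", use_regex=True, selected={1, 2}))
# -> ['x = Foo()', 'print(baz)', 'foo_bar = 1']
```

Escaping the search text when regex mode is off is what makes characters such as . and * match literally.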

There is an icon panel below the menu that has shortcut icons for the preceding functions:

  • Floppy disk icon: Save and Checkpoint.
  • Plus sign: Insert cell below.
  • Scissors: Cut selected cells.
  • Duplicate pages: Copy selected cells.
  • Up arrow: Move selected cells up.
  • Down arrow: Move selected cells down.
  • An icon that looks like a speaker: Run the current cell.
  • Black square: Interrupt the kernel.
  • Circular arrow: Restart the kernel (with dialog).
  • A drop-down menu for display characteristics:
    • Code
    • Markdown
    • Raw NBConvert
    • Heading
  • Keyboard: Open the command palette.
  • Change the current toolbar in use. Clicking on the Cell Toolbar button auto-displays the Cell Toolbar choice from the View menu.
 

Security in Jupyter


Jupyter Notebooks are created in order to be shared with other users, in many cases over the internet. However, Jupyter Notebooks can execute arbitrary code and generate arbitrary code. This can be a problem if malicious aspects have been placed in a Notebook. The default security mechanisms for Jupyter Notebooks include the following:

  • Raw HTML is always sanitized (checked for malicious coding). Further information can be found at https://developers.google.com/caja.
  • You cannot run external JavaScript.
  • Cell contents (especially HTML and JavaScript) are not trusted (they require user validation to continue).
  • The output from any cell is not trusted.
  • All other HTML or JavaScript is never trusted, and clearing the output will cause the Notebook to become trusted when saved.

Security digest

Notebooks can also use a security digest to ensure that the correct user is modifying the contents. A digest takes into account the entire contents of the Notebook and a secret (known only to the Notebook creator). Because the digest depends on that secret, a Notebook whose contents are changed by someone else will no longer match its digest, so malicious code added that way is not automatically trusted.

The secret used for the digest is kept in the following file:

~/.jupyter/profile_default/security/notebook_secret

Here, the notebook_secret file holds your secret; you replace its contents with your own secret.
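The idea behind the digest can be sketched with Python's standard hmac module: hash the Notebook contents together with the secret, record the result, and later recompute it to decide whether to trust the Notebook. This is a simplified model under stated assumptions (sorted-key JSON serialization, SHA-256, and a made-up secret); it is not byte-for-byte what Jupyter computes, and the function names are hypothetical.

```python
import hashlib
import hmac
import json


def notebook_digest(notebook, secret):
    """HMAC-SHA256 over the Notebook contents, keyed with a private secret.

    Simplification (assumption): the Notebook is serialized as sorted-key
    JSON, so these digests will not match the ones Jupyter itself records.
    """
    payload = json.dumps(notebook, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def is_trusted(notebook, secret, stored_digest):
    """True only if the contents still match the digest recorded earlier."""
    # compare_digest avoids leaking timing information during the comparison.
    return hmac.compare_digest(notebook_digest(notebook, secret), stored_digest)


secret = b"replace-with-your-own-secret"   # hypothetical; keep it private
nb = {"cells": [{"cell_type": "code", "source": "1 + 1", "outputs": []}]}
digest = notebook_digest(nb, secret)        # recorded when you save or trust
print(is_trusted(nb, secret, digest))       # True

# Someone without the secret injects an HTML output...
nb["cells"][0]["outputs"].append(
    {"output_type": "display_data", "data": {"text/html": "<script>alert(1)</script>"}}
)
print(is_trusted(nb, secret, digest))       # False: contents no longer match
```

Without the secret, an attacker cannot compute a matching digest for the modified contents, which is why the secret must stay private.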

Trust options

You can specifically apply your trust to a Notebook by using the following command-line option:

jupyter trust /path/to/notebook.ipynb

Alternatively, once the Notebook is open, you can use the File | Trust Notebook menu option.

 

Configuration options for Jupyter


You can configure some of the display parameters that are used when presenting Notebooks. These are configurable because Jupyter uses CodeMirror to present and edit the Notebook. CodeMirror is a JavaScript-based editor for use within web pages (Notebooks).

The list of configurable options is still in development. Some of the options are as follows:

  • lineSeparator: The character used to separate text lines
  • theme: The overall theme of presentation used in the Notebook
  • indentUnit: How many spaces to indent blocks of code

To change one of these options, open your Notebook, open your browser's JavaScript console, and enter the code that modifies the option. The change is stored in your Notebook configuration and is applied to the Notebook presentation when you reload the Notebook. There is further documentation on this, which is available at https://codemirror.net/doc/manual.html#option_indentUnit.

For example, to change the indentation (indentUnit) for the code cells in your Notebook, you would use the following JavaScript:

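// Get the cell that is currently selected in the open Notebook
var mycell = Jupyter.notebook.get_selected_cell();
// The cell's configuration object (the Notebook's frontend settings)
var cell_config = mycell.config;
// Patch for code cells: CodeMirror options go under cm_config
var code_patch = {
      CodeCell: {
        cm_config: {indentUnit: 2}
      }
    };
// Merge the patch into the stored configuration
cell_config.update(code_patch);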
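The patch passed to cell_config.update is nested JSON, and settings that the patch does not mention are expected to survive the update. That merge can be modeled in Python as a recursive dictionary update. This is a sketch under assumptions: it supposes the settings are kept in a JSON file and that nested keys are merged rather than replaced; the file name notebook.json and both function names are used here only for the demonstration.

```python
import json
import tempfile
from pathlib import Path


def recursive_update(target, patch):
    """Merge `patch` into `target` in place, descending into nested dicts,
    so settings that the patch does not mention are kept."""
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            recursive_update(target[key], value)
        else:
            target[key] = value
    return target


def update_config_file(path, patch):
    """Load a JSON config file (if present), merge the patch, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text()) if path.exists() else {}
    recursive_update(config, patch)
    path.write_text(json.dumps(config, indent=2))
    return config


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "notebook.json"   # demonstration file only
    cfg.write_text(json.dumps({"CodeCell": {"cm_config": {"lineNumbers": True}}}))
    merged = update_config_file(cfg, {"CodeCell": {"cm_config": {"indentUnit": 2}}})
    print(merged)  # {'CodeCell': {'cm_config': {'lineNumbers': True, 'indentUnit': 2}}}
```

A plain dict.update would have replaced the whole CodeCell entry and dropped lineNumbers; the recursive merge keeps it.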

You have now seen all of the standard operations that are available to you in a Jupyter Notebook.

 

Summary


In this chapter, we investigated the various user interface elements that are available in a Notebook. We learned how to install the software on macOS or a Windows PC. We were introduced to the Notebook structure. We saw the typical workflow that's used when developing a Notebook, and we walked through the user interface operations that are available in a Notebook. Lastly, we saw some of the configuration options that are available to advanced users for their Notebook.

In the next chapter, we will learn all about Python scripting in a Jupyter Notebook.

About the Author
  • Dan Toomey

    Dan Toomey has been developing application software for over 20 years. He has worked in a variety of industries and companies, in roles from sole contributor to VP/CTO-level. For the last few years he has been contracting for companies in the eastern Massachusetts area. Dan has been contracting under Dan Toomey Software Corp. Dan has also written R for Data Science, Jupyter for Data Sciences, and the Jupyter Cookbook, all with Packt.
