You're reading from Jupyter for Data Science

Product type: Book

Published in: Oct 2017

Reading Level: Beginner

Publisher: Packt

ISBN-13: 9781785880070

Edition: 1st Edition

Languages

Python

Tools

Jupyter

Concepts

Data Analysis

Author (1)

Dan Toomey

Chapter 10. Optimizing Jupyter Notebooks

Before you release a Jupyter Notebook, you should consider the optimizations that need to be in place before the public starts to access it. Optimizations run the gamut from language-specific issues (for example, using best-practice R coding style) to deploying your notebook in a highly available environment.

Deploying notebooks

A Jupyter Notebook is a website. You could host a website on the computer that you are using to display this document. There may be a machine available in your department that is in use as a web server.

If you were to deploy on a local machine, you would have a single-user website where additional users would be blocked from access or would collide with each other. The first step towards publishing your notebook involves using a hosting service that provides multi-user access.

Deploying to JupyterHub

The predominant Jupyter hosting product currently is JupyterHub. To be clear, JupyterHub is installed on a machine under your control. It provides multi-user access to your notebooks. This means you could install JupyterHub on a machine in your environment so that only internal users (multiple internal users) could access it.

When JupyterHub starts it begins a hub or controlling agent. The hub will start an instance of a listener or proxy for Jupyter requests. When the proxy...

Optimizing your script

There are optimizations that you can make to have your notebook scripts run more efficiently. The optimizations depend on the scripting language. We have covered using Python and R scripts in our notebooks and will cover optimizations that can be made for those two languages.

Jupyter does support additional languages and engines, such as Scala and Spark. These have their own optimization tools and strategies.

Optimizing your Python scripts

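Performance tuning your Python scripts can be done using several tools and techniques:

timeit
Python regular expressions
String handling
Loop optimizations
hotshot profiling (Python 2 only; hotshot was removed in Python 3, where cProfile is the standard library profiler)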
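
As a sketch of the profiling and string-handling items in the list above, the following uses cProfile from the standard library (swapped in here for hotshot, which is not available in Python 3). The two functions, slow_concat and fast_join, are hypothetical workloads invented for this illustration; they are not from the book.

```python
import cProfile
import io
import pstats


def slow_concat(n):
    """Hypothetical workload: build a string by repeated concatenation."""
    text = ""
    for i in range(n):
        text += str(i)
    return text


def fast_join(n):
    """Same result, built with a single join over a generator."""
    return "".join(str(i) for i in range(n))


# Record every function call made between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
slow_concat(20_000)
fast_join(20_000)
profiler.disable()

# Write the report into a string so it can be printed or saved.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
report = buffer.getvalue()
print(report)
```

The report lists call counts and cumulative time per function, which is usually enough to see which part of a cell deserves attention first.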

Determining how long a script takes

The timeit module in Python takes a snippet of code and determines how long it takes to execute. You can also execute the same snippet repeatedly to see if there are start-up issues that need to be addressed.

timeit is used in this manner:

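import timeit

# The setup statement imports the function under test; this assumes a module
# named myfunction that defines a function of the same name.
t = timeit.Timer("myfunction('Hello World')", "from myfunction import myfunction")
t.timeit()   # runs the statement 1,000,000 times by default and returns total seconds
3.32132323232...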
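
A self-contained variant may be easier to try in a notebook cell. The sketch below assumes a hypothetical function, join_numbers, defined in the same cell, and passes a callable to timeit instead of a statement string so that no separate module is needed.

```python
import timeit


def join_numbers(n):
    """Hypothetical function under test: build a comma-separated string."""
    return ",".join(str(i) for i in range(n))


# timeit accepts a callable; number= sets how many times it is executed.
total_seconds = timeit.timeit(lambda: join_numbers(100), number=1_000)

# repeat() performs the whole measurement several times. Comparing the runs
# can reveal a slow first pass, which hints at start-up costs.
runs = timeit.repeat(lambda: join_numbers(100), number=1_000, repeat=3)

print(f"single measurement: {total_seconds:.4f} s for 1,000 calls")
print(f"best of 3: {min(runs):.4f} s")
```

Inside a notebook, the %timeit magic offers the same kind of measurement for a single line without any imports.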

Monitoring Jupyter

As with the earlier discussions in this chapter on optimization, you can also use programming tools to monitor the overall behavior of your notebook. The predominant tool for Linux/Mac environments is memory_profiler. If you start this tool and then your notebook, the profiler will keep track of the memory use of your notebook.

With this record of data points, you may be able to reduce your program's memory allocation if you find that a large amount of memory is in use. For example, the profiler may highlight that you are continuously creating (and dropping) a large memory item inside a loop. When you go back to your code, you may realize that this allocation could be pulled out of the loop and done just once, or that the size of the allocation could easily be reduced.
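
To make the idea of shrinking an allocation concrete, here is a minimal sketch that uses tracemalloc from the standard library rather than memory_profiler (a deliberate substitution, so the example runs without installing anything). The helper peak_bytes and both workload functions are hypothetical names introduced for illustration.

```python
import tracemalloc


def peak_bytes(func, *args):
    """Run func and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def sum_squares_list(n):
    """Materializes every square in a list before summing."""
    squares = [i * i for i in range(n)]
    return sum(squares)


def sum_squares_stream(n):
    """Streams the squares through a generator, one value at a time."""
    return sum(i * i for i in range(n))


list_total, list_peak = peak_bytes(sum_squares_list, 200_000)
stream_total, stream_peak = peak_bytes(sum_squares_stream, 200_000)

print(f"list version:   peak {list_peak:,} bytes")
print(f"stream version: peak {stream_peak:,} bytes")
```

Both functions return the same total; the difference in peak memory is the kind of signal a memory profiler surfaces.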

Caching your notebook

Caching is a common programming practice to speed up performance. If the computer does not have to reload a section of code, a variable, or a file, but can access it directly from a cache, performance will improve.
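
The general principle can be sketched inside a single Python process with functools.lru_cache. This illustrates in-memory caching of a function result only; it is not the notebook-level caching mechanism described in this section, and load_table is a hypothetical stand-in for an expensive step.

```python
from functools import lru_cache


@lru_cache(maxsize=32)
def load_table(name):
    """Hypothetical expensive step: pretend to read and parse a file."""
    return tuple((name, row, row * row) for row in range(1000))


first = load_table("sales")    # computed, then stored in the cache
second = load_table("sales")   # served from the cache
third = load_table("sales")    # served from the cache

info = load_table.cache_info()
print(info.hits, info.misses)  # prints: 2 1
```

The cached function must take hashable arguments, and the cache lives only as long as the kernel does.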

There is a mechanism to cache your notebook if you are deploying into a Docker environment. Docker is a mechanism for running code in many isolated container instances on one machine. It has become common practice to do so in the Java programming world. Luckily, Docker is very flexible, and a method has been worked out to use Jupyter in Docker as well. Once in Docker, it is a minor adjustment to automatically cache your pages. The underlying tool used is memcached, a widely used tool for caching anything, in this case Jupyter Notebooks.

Securing a notebook

Securing a notebook can be accomplished by several methods, such as:

Managing notebook authorization
Securing notebook content

Managing notebook authorization

A notebook can be secured so that users must authenticate before they can access it. Authentication is on by default in your notebook. Jupyter uses a token or password instead of a username/password pair; the token is a generated string that can be supplied in more than one way, for example in the URL or in the login form. See the Jupyter documentation on implementing authentication, as this has changed slightly over time.
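
The idea behind token access can be sketched in a few lines. This is an illustration of the concept only, not Jupyter's implementation; new_token and is_authorized are hypothetical helper names.

```python
import hmac
import secrets


def new_token():
    """Generate a random, hard-to-guess token (48 hex characters)."""
    return secrets.token_hex(24)


def is_authorized(presented, expected):
    """Compare tokens in constant time to avoid leaking timing information."""
    return hmac.compare_digest(presented, expected)


server_token = new_token()

print(is_authorized(server_token, server_token))  # prints: True
print(is_authorized("guess", server_token))       # prints: False
```

Anyone holding the token gets access, so it should be treated like a password and never committed alongside a notebook.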

Securing notebook content

A notebook has possible security issues with several parts of standard content that are secured automatically by Jupyter:

Untrusted HTML is sanitized
Untrusted JavaScript is not executed
HTML and JavaScript in markdown cells are not trusted
Notebook output is not trusted
Other HTML or JavaScript in the notebook is not trusted

Trust comes down to the question: did the user do this, or did the Jupyter script? Untrusted means the content will not be generated (that is, executed or rendered as-is).
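
To see why untrusted markup is neutralized before display, consider the simplest possible approach: escaping. The sketch below uses html.escape from the standard library. This is a cruder technique than the sanitizer Jupyter applies, and is shown only to illustrate the principle that untrusted markup must not reach the browser as live code.

```python
import html

# Markup arriving from an untrusted source, such as saved cell output.
untrusted = '<script>alert("hi")</script><b>bold</b>'

# Escaping turns every tag into inert text that the browser displays
# literally instead of interpreting.
escaped = html.escape(untrusted)

print(escaped)
# prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;&lt;b&gt;bold&lt;/b&gt;
```

Escaping also disables harmless tags such as the bold element here, which is why a sanitizer that removes only dangerous constructs is preferable for rich output.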

Sanitized code is wrapped to force the values to...

Scaling Jupyter Notebooks

Scaling is the process of giving very large numbers of concurrent users access to a notebook without a degradation in performance. At the time of writing, the one vendor doing this is Azure. They have thousands of pages and users working at scale daily.

Most amazingly, this is a free service.

Converting a notebook

You can also share a notebook with others by converting the notebook to a readable form for recipients. Notebooks can be converted to a number of formats using the Download As feature in the notebook File menu.

Notebooks can be converted in this way to the following formats:

<language> format: This option is dependent on the language used to create the notebook. For example, an R notebook would have the choice to Download as R script.
HTML: This representation is the HTML encoding to display the page as it appears in your notebook using HTML constructs.
Markdown: Markdown is a lightweight plain-text markup format that is widely used for README files and documentation.
reST: reStructuredText is another lightweight markup format that has simpler display constructs than HTML.
PDF.
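
To show what the <language> format conversion amounts to, here is a simplified sketch that reads a notebook's JSON and emits a plain script. It relies only on the json module and on the cells, cell_type, and source fields of the notebook file format; it is not the converter Jupyter itself uses, and notebook_to_script is a hypothetical helper name.

```python
import json


def notebook_to_script(notebook_json):
    """Turn notebook JSON into script text: code cells are copied as-is,
    markdown cells become comment lines, other cell types are skipped."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook, so the example needs no file on disk.
sample = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Demo\n", "A tiny notebook."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 2\n", "print(x * 21)"]},
    ],
})

script = notebook_to_script(sample)
print(script)
```

For real notebooks, the jupyter nbconvert command-line tool performs these conversions and handles details this sketch ignores, such as cell magics and output formats.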

Versioning a notebook

A common practice in the programming world is to maintain a history of the changes made to a program. Over time the different versions of the program are maintained in a software repository where the programmer can retrieve prior versions to return to an older, working state of their program.

In the previous section we mentioned placing your notebook on GitHub. Git is a version control system in wide use. GitHub is an internet-based hosting service for Git repositories. Once your software is committed to Git, it is versioned. The next time you update your notebook in GitHub, Git will take the current instance, store it as a version in your history, and place the new instance as the current one, so that anyone accessing your GitHub repository will see the latest version by default.

Summary

In this chapter, we deployed our notebook to a set of different environments. We looked into optimizations that can be made to our notebook scripts. We learned about different ways to share our notebook. Lastly, we looked into converting our notebook for users without access to Jupyter.


Author (1)

Dan Toomey

Dan Toomey has been developing application software for over 20 years. He has worked in a variety of industries and companies, in roles from sole contributor to VP/CTO-level. For the last few years, he has been contracting for companies in the eastern Massachusetts area. Dan has been contracting under Dan Toomey Software Corp. Dan has also written R for Data Science, Jupyter for Data Sciences, and the Jupyter Cookbook, all with Packt.