Machine Learning and AI with Streamlit

A very common situation data scientists find themselves in is at the end of the model creation process, not knowing exactly how to convince non-data scientists that their model is worthwhile. They might have performance metrics from their model or some static visualizations but have no easy way to allow others to interact with their model.

Before Streamlit, there were a couple of other options, the most popular being creating a full-fledged app in Flask or Django or even turning a model into an Application Programming Interface (API) and pointing developers toward it. These are great options but tend to be time-consuming and suboptimal for valuable use cases such as prototyping an app.

The incentives for teams are a little misaligned here. Data scientists want to create the best models for their teams, but if they need to take a day or two (or, if they have experience, a few hours) of work to turn their model into a Flask or Django...

Technical requirements

For this chapter, we will need an OpenAI account. To create one, head over to https://platform.openai.com/ and follow the instructions on the page.

The standard ML workflow

The first step in creating an app that uses ML is building the ML model itself. There are dozens of popular workflows for creating your own ML models; it's likely you have one of your own already! There are two parts of this process to consider:

  • The generation of the ML model
  • The use of the ML model in production

If the plan is to train a model once and then use this model in our Streamlit app, the best method is to create this model outside of Streamlit first (for example, in a Jupyter notebook or in a standard Python file), and then use this model within the app.

If the plan is to use user input to train the model inside our app, then we can no longer create the model outside of Streamlit and will instead need to run the model training within the Streamlit app.

We will start by building our ML models outside of Streamlit and move on to training our models inside Streamlit apps.

Predicting penguin species

The dataset that we will primarily use in this chapter is the same Palmer Penguins dataset that we used earlier in Chapter 1, An Introduction to Streamlit. As is typical, we will create a new folder that will house our new Streamlit app and accompanying code.

The following code creates this new folder within our streamlit_apps folder and copies the data over from our penguin_app folder. If you haven't downloaded the Palmer Penguins dataset yet, please follow the instructions in the section The setup: Palmer Penguins in Chapter 2, Uploading, Downloading, and Manipulating Data:

mkdir penguin_ml
cp penguin_app/penguins.csv penguin_ml
cd penguin_ml
touch penguins_ml.py
touch penguins_streamlit.py

As you may have noticed in the preceding code, there are two Python files here: one to create the ML model (penguins_ml.py) and one to create the Streamlit app (penguins_streamlit.py). We will start with the penguins_ml.py file, and once we have...
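The model-training code itself is cut off in this excerpt. As a rough, hedged sketch of what penguins_ml.py might contain (the exact column names, the pd.factorize/pd.get_dummies encoding, and the scikit-learn random forest are assumptions inferred from the pickle file names used later, not the book's exact code), the file could look something like this:

import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the penguin data and drop rows with missing values
penguin_df = pd.read_csv('penguins.csv')
penguin_df.dropna(inplace=True)

# Encode the target (species) as integers and keep the mapping for later
output, uniques = pd.factorize(penguin_df['species'])

# One-hot encode the categorical features alongside the numeric ones
features = penguin_df[['island', 'bill_length_mm', 'bill_depth_mm',
                       'flipper_length_mm', 'body_mass_g', 'sex']]
features = pd.get_dummies(features)

# Train a random forest and check how well it generalizes
x_train, x_test, y_train, y_test = train_test_split(features, output, test_size=0.2)
rfc = RandomForestClassifier(random_state=15)
rfc.fit(x_train, y_train)
print('Accuracy:', accuracy_score(y_test, rfc.predict(x_test)))

# Save the model and the species mapping for the Streamlit app to load
with open('random_forest_penguin.pickle', 'wb') as rf_pickle:
    pickle.dump(rfc, rf_pickle)
with open('output_penguin.pickle', 'wb') as output_pickle:
    pickle.dump(uniques, output_pickle)

The key point is that the trained model and the species mapping are both pickled to disk, so the Streamlit app never has to retrain anything.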

Utilizing a pre-trained ML model in Streamlit

Now that we have our model, we want to load it (along with our mapping function) into Streamlit. In the penguins_streamlit.py file that we created earlier, we will again use the pickle library to load our files using the following code. We use the same functions as before, but instead of wb, we pass the rb parameter, which stands for read bytes. To make sure these are the same Python objects we saved earlier, we will use the familiar st.write() function to check:

import streamlit as st
import pickle

# Open the pickled model and species mapping saved by penguins_ml.py
rf_pickle = open('random_forest_penguin.pickle', 'rb')
map_pickle = open('output_penguin.pickle', 'rb')
rfc = pickle.load(rf_pickle)
unique_penguin_mapping = pickle.load(map_pickle)
# Close the files now that the objects are loaded
rf_pickle.close()
map_pickle.close()

# Display both objects to confirm they loaded correctly
st.write(rfc)
st.write(unique_penguin_mapping)

As with our previous Streamlit apps, we run the following code in the terminal to run our app:

streamlit run penguins_streamlit.py

Training models inside Streamlit apps

Often, we may want user input to change how our model is trained. We may want to accept data from the user, ask the user which features they would like to use, or even allow the user to pick the type of ML algorithm they would like to use. All of these options are feasible in Streamlit, and in this section, we will cover the basics of using user input to affect the training process. As we discussed above, if a model is going to be trained only once, it is probably best to train it outside of Streamlit and import it into the app. But what if, in our example, the penguin researchers have the data stored locally, or do not know how to retrain the model but already have the data in the correct format? In cases like these, we can add the st.file_uploader() option and give these users a way to input their own data and get a custom model deployed for them without having to write any code. The following...
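The rest of the walkthrough is truncated here. As a minimal, hedged sketch of the pattern (the column names and the simplified retraining logic are assumptions for illustration, not the book's exact code), an app that falls back to the pre-trained pickles unless the user uploads their own CSV might look like this:

import pickle

import pandas as pd
import streamlit as st
from sklearn.ensemble import RandomForestClassifier

st.title('Penguin Classifier')

# Let the user optionally supply their own penguin data
penguin_file = st.file_uploader('Upload your own penguin data (CSV)')

if penguin_file is None:
    # No upload: fall back to the model trained in penguins_ml.py
    with open('random_forest_penguin.pickle', 'rb') as rf_pickle:
        rfc = pickle.load(rf_pickle)
    with open('output_penguin.pickle', 'rb') as map_pickle:
        unique_penguin_mapping = pickle.load(map_pickle)
else:
    # Upload: retrain a model on the user's data inside the app
    penguin_df = pd.read_csv(penguin_file).dropna()
    output, unique_penguin_mapping = pd.factorize(penguin_df['species'])
    features = pd.get_dummies(
        penguin_df[['island', 'bill_length_mm', 'bill_depth_mm',
                    'flipper_length_mm', 'body_mass_g', 'sex']]
    )
    rfc = RandomForestClassifier(random_state=15)
    rfc.fit(features, output)
    st.write('A new model was trained on your uploaded data!')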

Understanding ML results

So far, our app might be useful, but often, just showing a result is not good enough for a data app. We should also show some explanation of the results. To do this, we can add a section to the app we have already made that helps users understand the model better.

To start, random forest models already have a built-in feature importance method, derived from the set of individual decision trees that make up the random forest. We can edit our penguins_ml.py file to graph this importance and then call that image from within our Streamlit app. We could also graph it directly within our Streamlit app, but it is more efficient to make this graph once in penguins_ml.py rather than every time our Streamlit app reloads (which is every time a user changes an input!). The following code edits our penguins_ml.py file and adds the feature importance graph, saving it to our folder. We also call the tight_layout() feature, which...
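The snippet itself is cut off in this excerpt. As a hedged sketch of the idea (continuing the hypothetical penguins_ml.py shown earlier, so rfc and features are assumed to already exist, and seaborn/matplotlib are assumed as the plotting libraries), the addition might look like this:

import matplotlib.pyplot as plt
import seaborn as sns

# Plot which features the random forest relied on most
fig, ax = plt.subplots()
ax = sns.barplot(x=rfc.feature_importances_, y=features.columns)
plt.title('Which features are the most important for species prediction?')
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.tight_layout()
fig.savefig('feature_importance.png')

The saved feature_importance.png can then be displayed in the Streamlit app with st.image('feature_importance.png').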

Integrating external ML libraries – a Hugging Face example

Over the last few years, there has been a massive increase in the number of ML models created by startups and institutions. One organization has, in my opinion, stood out above the rest for prioritizing the open sourcing and sharing of its models and methods: Hugging Face. Hugging Face makes it incredibly easy to use ML models created by some of the best researchers in the field for your own use cases, and in this section, we'll quickly show how to integrate Hugging Face into Streamlit.

As part of the original setup for this book, we have already downloaded the two libraries that we need: PyTorch (the most popular deep learning Python framework) and transformers (Hugging Face's library that makes it easy to use their pre-trained models). So, for our app, let's try one of the most basic tasks in natural language processing: getting the sentiment of a bit of text! Hugging...
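The rest of the example is cut off here. As a minimal sketch (the default sentiment-analysis pipeline and the st.cache_resource caching are assumptions, not necessarily the book's exact choices), a tiny sentiment app might look like this:

import streamlit as st
from transformers import pipeline

st.title('Hugging Face Demo')


@st.cache_resource  # keep the model in memory across reruns
def get_model():
    # Loads a default pre-trained sentiment-analysis model
    return pipeline('sentiment-analysis')


text = st.text_input('Enter text to analyze')
if text:
    model = get_model()
    result = model(text)
    # result is a list of dicts, e.g. [{'label': 'POSITIVE', 'score': 0.99}]
    st.write(result)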

Integrating external AI libraries – an OpenAI example

2023 has surely been the year of generative AI, with ChatGPT taking the world and developer community by storm. The availability of generative models behind services like ChatGPT has also exploded, with each of the largest technology companies coming out with their own versions (https://ai.meta.com/llama/ from Meta and https://bard.google.com/ from Google, for example). The most popular series of these generative models is OpenAI’s GPT (Generative Pre-trained Transformer). This section will show you how to use the OpenAI API to add generative AI to your Streamlit apps!

Authenticating with OpenAI

Our first step is to make an OpenAI account and get an API key. To do this, head over to https://platform.openai.com and create an account. Once you have created an account, go to the API keys section (https://platform.openai.com/account/api-keys) and press the Create new secret key button. Once you create the key...
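The remainder of the setup is cut off here. As a hedged sketch of how the key might then be used from a Streamlit app (this assumes the 1.x openai Python package and an API key stored in Streamlit's st.secrets under a hypothetical OPENAI_API_KEY entry), a minimal chat call could look like this:

import streamlit as st
from openai import OpenAI

st.title('OpenAI Demo')

# The API key is read from .streamlit/secrets.toml (never hard-code it!)
client = OpenAI(api_key=st.secrets['OPENAI_API_KEY'])

prompt = st.text_input('Ask the model anything')
if prompt:
    response = client.chat.completions.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user', 'content': prompt}],
    )
    # The generated text lives in the first choice's message
    st.write(response.choices[0].message.content)

Keeping the key in secrets rather than in the source code matters once the app is shared or deployed.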

Summary

In this chapter, we learned about some ML basics: how to take a pre-built ML model and use it within Streamlit, how to create our own models from within Streamlit, how to use user input to understand and iterate on ML models, and even how to use models from Hugging Face and OpenAI. Hopefully, by the end of this chapter, you feel comfortable with each of these. Next, we will dive into the world of deploying Streamlit apps using Streamlit Community Cloud!

Learn more on Discord

To join the Discord community for this book – where you can share feedback, ask questions to the author, and learn about new releases – follow the QR code below:

https://packt.link/sl
