Machine Learning and AI with Streamlit

A very common situation data scientists find themselves in is at the end of the model creation process, not knowing exactly how to convince non-data scientists that their model is worthwhile. They might have performance metrics from their model or some static visualizations but have no easy way to allow others to interact with their model.

Before Streamlit, there were a couple of other options, the most popular being creating a full-fledged app in Flask or Django or even turning a model into an Application Programming Interface (API) and pointing developers toward it. These are great options but tend to be time-consuming and suboptimal for valuable use cases such as prototyping an app.

The incentives for teams are a little misaligned here. Data scientists want to create the best models for their teams, but if they need to take a day or two (or, if they have experience, a few hours) of work to turn their model into a Flask or Django...

Technical requirements

For this chapter, we will need an OpenAI account. To create one, head over to https://platform.openai.com/ and follow the instructions on the page.

The standard ML workflow

The first step in creating an app that uses ML is building the ML model itself. There are dozens of popular workflows for creating your own ML models; it's likely you have one of your own already! There are two parts of this process to consider:

  • The generation of the ML model
  • The use of the ML model in production

If the plan is to train a model once and then use this model in our Streamlit app, the best method is to create this model outside of Streamlit first (for example, in a Jupyter notebook or in a standard Python file), and then use this model within the app.

If the plan is to use user input to train the model inside our app, then we can no longer create the model outside of Streamlit and will instead need to run the model training within the Streamlit app.

We will start by building our ML models outside of Streamlit and move on to training our models inside Streamlit apps.

Predicting penguin species

The dataset that we will primarily use in this chapter is the same Palmer Penguins dataset that we used earlier in Chapter 1, An Introduction to Streamlit. As is typical, we will create a new folder that will house our new Streamlit app and accompanying code.

The following code creates this new folder within our streamlit_apps folder and copies the data over from our penguin_app folder. If you haven't downloaded the Palmer Penguins dataset yet, please follow the instructions in the section The setup: Palmer Penguins in Chapter 2, Uploading, Downloading, and Manipulating Data:

mkdir penguin_ml
cp penguin_app/penguins.csv penguin_ml
cd penguin_ml
touch penguins_ml.py
touch penguins_streamlit.py

As you may have noticed in the preceding code, there are two Python files here: one to create the ML model (penguins_ml.py) and one to create the Streamlit app (penguins_streamlit.py). We will start with the penguins_ml.py file, and once we have...
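The model-training code itself is cut off in this excerpt. As a rough, hedged sketch of what penguins_ml.py might contain (the exact column names, the pd.factorize/pd.get_dummies encoding, and the scikit-learn random forest are assumptions inferred from the pickle file names used later, not the book's exact code), the file could look something like this:

import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the penguin data and drop rows with missing values
penguin_df = pd.read_csv('penguins.csv')
penguin_df.dropna(inplace=True)

# Encode the target (species) as integers and keep the mapping for later
output, uniques = pd.factorize(penguin_df['species'])

# One-hot encode the categorical features alongside the numeric ones
features = penguin_df[['island', 'bill_length_mm', 'bill_depth_mm',
                       'flipper_length_mm', 'body_mass_g', 'sex']]
features = pd.get_dummies(features)

# Train a random forest and check how well it generalizes
x_train, x_test, y_train, y_test = train_test_split(features, output, test_size=0.2)
rfc = RandomForestClassifier(random_state=15)
rfc.fit(x_train, y_train)
print('Accuracy:', accuracy_score(y_test, rfc.predict(x_test)))

# Save the model and the species mapping for the Streamlit app to load
with open('random_forest_penguin.pickle', 'wb') as rf_pickle:
    pickle.dump(rfc, rf_pickle)
with open('output_penguin.pickle', 'wb') as output_pickle:
    pickle.dump(uniques, output_pickle)

The key point is that the trained model and the species mapping are both pickled to disk, so the Streamlit app never has to retrain anything.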

Utilizing a pre-trained ML model in Streamlit

Now that we have our model, we want to load it (along with our mapping function) into Streamlit. In the penguins_streamlit.py file that we created earlier, we will again use the pickle library to load our files using the following code. We use the same functions as before, but instead of wb, we pass the rb parameter, which stands for read bytes. To make sure these are the same Python objects we saved earlier, we will use the familiar st.write() function to check:

import streamlit as st
import pickle

# Open the pickled model and species mapping saved by penguins_ml.py
rf_pickle = open('random_forest_penguin.pickle', 'rb')
map_pickle = open('output_penguin.pickle', 'rb')
rfc = pickle.load(rf_pickle)
unique_penguin_mapping = pickle.load(map_pickle)
# Close the files now that the objects are loaded
rf_pickle.close()
map_pickle.close()

# Display both objects to confirm they loaded correctly
st.write(rfc)
st.write(unique_penguin_mapping)

As with our previous Streamlit apps, we run the following code in the terminal to run our app:

streamlit run penguins_streamlit.py

Training models inside Streamlit apps

Often, we may want user input to change how our model is trained. We may want to accept data from the user, ask the user which features they would like to use, or even allow the user to pick the type of ML algorithm they would like to use. All of these options are feasible in Streamlit, and in this section, we will cover the basics of using user input to affect the training process. As we discussed above, if a model is going to be trained only once, it is probably best to train it outside of Streamlit and import it into the app. But what if, in our example, the penguin researchers have the data stored locally, or do not know how to retrain the model but already have the data in the correct format? In cases like these, we can add the st.file_uploader() option and give these users a way to input their own data and get a custom model deployed for them without having to write any code. The following...
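The rest of the walkthrough is truncated here. As a minimal, hedged sketch of the pattern (the column names and the simplified retraining logic are assumptions for illustration, not the book's exact code), an app that falls back to the pre-trained pickles unless the user uploads their own CSV might look like this:

import pickle

import pandas as pd
import streamlit as st
from sklearn.ensemble import RandomForestClassifier

st.title('Penguin Classifier')

# Let the user optionally supply their own penguin data
penguin_file = st.file_uploader('Upload your own penguin data (CSV)')

if penguin_file is None:
    # No upload: fall back to the model trained in penguins_ml.py
    with open('random_forest_penguin.pickle', 'rb') as rf_pickle:
        rfc = pickle.load(rf_pickle)
    with open('output_penguin.pickle', 'rb') as map_pickle:
        unique_penguin_mapping = pickle.load(map_pickle)
else:
    # Upload: retrain a model on the user's data inside the app
    penguin_df = pd.read_csv(penguin_file).dropna()
    output, unique_penguin_mapping = pd.factorize(penguin_df['species'])
    features = pd.get_dummies(
        penguin_df[['island', 'bill_length_mm', 'bill_depth_mm',
                    'flipper_length_mm', 'body_mass_g', 'sex']]
    )
    rfc = RandomForestClassifier(random_state=15)
    rfc.fit(features, output)
    st.write('A new model was trained on your uploaded data!')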

Understanding ML results

So far, our app might be useful, but often, just showing a result is not good enough for a data app. We should also show some explanation of the results. To do this, we can add a section to the app we have already made that helps users understand the model better.

To start, random forest models already have a built-in feature importance method, derived from the set of individual decision trees that make up the random forest. We can edit our penguins_ml.py file to graph this importance and then call that image from within our Streamlit app. We could also graph it directly within our Streamlit app, but it is more efficient to make this graph once in penguins_ml.py rather than every time our Streamlit app reloads (which is every time a user changes an input!). The following code edits our penguins_ml.py file and adds the feature importance graph, saving it to our folder. We also call the tight_layout() feature, which...
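The snippet itself is cut off in this excerpt. As a hedged sketch of the idea (continuing the hypothetical penguins_ml.py shown earlier, so rfc and features are assumed to already exist, and seaborn/matplotlib are assumed as the plotting libraries), the addition might look like this:

import matplotlib.pyplot as plt
import seaborn as sns

# Plot which features the random forest relied on most
fig, ax = plt.subplots()
ax = sns.barplot(x=rfc.feature_importances_, y=features.columns)
plt.title('Which features are the most important for species prediction?')
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.tight_layout()
fig.savefig('feature_importance.png')

The saved feature_importance.png can then be displayed in the Streamlit app with st.image('feature_importance.png').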

Integrating external ML libraries – a Hugging Face example

Over the last few years, there has been a massive increase in the number of ML models created by startups and institutions. One organization has, in my opinion, stood out above the rest for prioritizing the open sourcing and sharing of its models and methods: Hugging Face. Hugging Face makes it incredibly easy to use ML models created by some of the best researchers in the field for your own use cases, and in this section, we'll quickly show how to integrate Hugging Face into Streamlit.

As part of the original setup for this book, we have already downloaded the two libraries that we need: PyTorch (the most popular deep learning Python framework) and transformers (Hugging Face's library that makes it easy to use their pre-trained models). So, for our app, let's try one of the most basic tasks in natural language processing: getting the sentiment of a bit of text! Hugging...
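The rest of the example is cut off here. As a minimal sketch (the default sentiment-analysis pipeline and the st.cache_resource caching are assumptions, not necessarily the book's exact choices), a tiny sentiment app might look like this:

import streamlit as st
from transformers import pipeline

st.title('Hugging Face Demo')


@st.cache_resource  # keep the model in memory across reruns
def get_model():
    # Loads a default pre-trained sentiment-analysis model
    return pipeline('sentiment-analysis')


text = st.text_input('Enter text to analyze')
if text:
    model = get_model()
    result = model(text)
    # result is a list of dicts, e.g. [{'label': 'POSITIVE', 'score': 0.99}]
    st.write(result)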

Integrating external AI libraries – an OpenAI example

2023 has surely been the year of generative AI, with ChatGPT taking the world and developer community by storm. The availability of generative models behind services like ChatGPT has also exploded, with each of the largest technology companies coming out with their own versions (https://ai.meta.com/llama/ from Meta and https://bard.google.com/ from Google, for example). The most popular series of these generative models is OpenAI’s GPT (Generative Pre-trained Transformer). This section will show you how to use the OpenAI API to add generative AI to your Streamlit apps!

Authenticating with OpenAI

Our first step is to make an OpenAI account and get an API key. To do this, head over to https://platform.openai.com and create an account. Once you have created an account, go to the API keys section (https://platform.openai.com/account/api-keys) and press the Create new secret key button. Once you create the key...
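The remainder of the setup is cut off here. As a hedged sketch of how the key might then be used from a Streamlit app (this assumes the 1.x openai Python package and an API key stored in Streamlit's st.secrets under a hypothetical OPENAI_API_KEY entry), a minimal chat call could look like this:

import streamlit as st
from openai import OpenAI

st.title('OpenAI Demo')

# The API key is read from .streamlit/secrets.toml (never hard-code it!)
client = OpenAI(api_key=st.secrets['OPENAI_API_KEY'])

prompt = st.text_input('Ask the model anything')
if prompt:
    response = client.chat.completions.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user', 'content': prompt}],
    )
    # The generated text lives in the first choice's message
    st.write(response.choices[0].message.content)

Keeping the key in secrets rather than in the source code matters once the app is shared or deployed.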

Summary

In this chapter, we learned about some ML basics: how to take a pre-built ML model and use it within Streamlit, how to create our own models from within Streamlit, how to use user input to understand and iterate on ML models, and even how to use models from Hugging Face and OpenAI. Hopefully, by the end of this chapter, you feel comfortable with each of these. Next, we will dive into the world of deploying Streamlit apps using Streamlit Community Cloud!

Learn more on Discord

To join the Discord community for this book – where you can share feedback, ask questions to the author, and learn about new releases – follow the QR code below:

https://packt.link/sl
