You're reading from Streamlit for Data Science - Second Edition

Product type: Book
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781803248226
Pages: 300
Edition: 2nd Edition
Languages: Python
Concepts: Data Science
Author: Tyler Richards

The Data Project – Prototyping Projects in Streamlit

In the previous chapter, we discussed how to create Streamlit applications that are specific to job applications. Another fun application of Streamlit is to try out new and interesting data science ideas and create interactive apps for others. Some examples of this include applying a new machine learning model to an existing dataset, carrying out an analysis of some data uploaded by users, or creating an interactive analysis on a private dataset. There are numerous reasons for making a project like this, such as personal education or community contribution.

In terms of personal education, the best way to learn about a new topic is often to observe how it actually works by applying it to the world around you or to a dataset that you know well. For instance, if you want to learn how Principal Component Analysis works, you can always read about it in a textbook or watch someone else apply it to a dataset. However, I have...

Technical requirements

In this chapter, we will use Goodreads.com, a popular website owned by Amazon that tracks everything about a user’s reading habits, from when they started and finished books to what they would like to read next. Before continuing, head over to https://www.goodreads.com/, sign up for an account, and explore a little (perhaps you can even add your own book lists!).

Data science ideation

Often, coming up with a new idea for a data science project is the most daunting part. You might have numerous doubts. What if I start a project that no one likes? What if my data actually doesn’t work out well? What if I can’t think of anything? The good news is that if you create projects that you actually do care about and would use, then the worst-case scenario is that you have an audience of one! And if you send me (tylerjrichards@gmail.com) your project, I promise to read it. So that makes it an audience of two at the very least.

Some examples I have either created or observed in the wild include the following:

Recording ping-pong games for a semester to determine the best player with an Elo model (http://www.tylerjrichards.com/Ping_pong.html or https://www.youtube.com/watch?v=uPg7PEdx7WA)
Using Large Language Models to chat with your organization’s Snowflake data (https://snowchat.streamlit.app/)
Analyzing...

Collecting and cleaning data

There are two ways to get data from Goodreads: through its Application Programming Interface (API), which allows developers to programmatically access data about books, and through its manual export function. Sadly, Goodreads is deprecating its API and, as of December 2020, no longer gives access to new developers.

The original Goodreads app uses the API, but our version will rely on the Goodreads website’s manual export function instead. To get your data, head over to https://www.goodreads.com/review/import and download your own data. If you do not have a Goodreads account, feel free to use my personal data for this, which can be found at https://github.com/tylerjrichards/goodreads_book_demo. I have saved my Goodreads data in a file called goodreads_history.csv in a new folder called streamlit_goodreads_book. To make your own folder with the appropriate setup, run the following in your terminal:

mkdir...
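Once the export is saved as goodreads_history.csv, getting it into an app might look something like the sketch below. This is a minimal illustration rather than the app's actual code: the function names load_goodreads_export and main, the uploader label, and the fallback to the local file are all assumptions made for the example.

```python
import pandas as pd


def load_goodreads_export(source) -> pd.DataFrame:
    """Read a Goodreads CSV export from a file path or a file-like object."""
    books_df = pd.read_csv(source)
    # Strip stray whitespace from the header names so later lookups are predictable.
    books_df.columns = books_df.columns.str.strip()
    return books_df


def main() -> None:
    # Imported here so the loader above can be reused without Streamlit installed.
    import streamlit as st

    st.title("Analyzing Your Goodreads Reading Habits")
    uploaded_file = st.file_uploader("Upload your Goodreads export", type="csv")
    # Assumption: fall back to the local file named in the text if nothing is uploaded.
    source = uploaded_file if uploaded_file is not None else "goodreads_history.csv"
    books_df = load_goodreads_export(source)
    st.write(books_df.head())
```

In an app script, you would call main() at the bottom of the file and start it with streamlit run. Keeping the loader separate from the display code makes it easy to check the data handling on its own.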

Making an MVP

Looking at our data, we can start by asking a basic question: what are the most interesting questions I can answer with this data? After looking at the data and thinking about what information I would want from my Goodreads reading history, here are a few questions that I have thought of:

How many books do I read each year?
How long does it take for me to finish a book that I have started?
How long are the books that I have read?
How old are the books that I have read?
How do I rate books compared to other Goodreads users?

We can take these questions, figure out how to modify our data to visualize them well, and then make the first attempt at creating our product by printing out all of the graphs.
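As a rough sketch of what "modifying our data" could mean for two of these questions, the helper below derives one new column for each. It assumes the export contains columns named Date Added, Date Read, My Rating, and Average Rating (only Date Read is named in the text), and it uses the gap between adding and finishing a book as a stand-in for reading time. The function name add_question_columns and the new column names are hypothetical.

```python
import pandas as pd


def add_question_columns(books_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the data with derived columns for two of the questions."""
    df = books_df.copy()
    # Unparseable or missing dates become NaT instead of raising an error.
    df["Date Read"] = pd.to_datetime(df["Date Read"], errors="coerce")
    df["Date Added"] = pd.to_datetime(df["Date Added"], errors="coerce")
    # Proxy for reading time: days between adding a book and finishing it.
    df["days_to_finish"] = (df["Date Read"] - df["Date Added"]).dt.days
    # Positive values mean my rating is above the Goodreads average.
    df["rating_diff"] = df["My Rating"] - df["Average Rating"]
    return df
```

Working on a copy keeps the original DataFrame untouched, which is convenient when several graphs each need their own view of the same data.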

How many books do I read each year?

For the first question about books read per year, we have the Date Read column, with the data presented in the yyyy/mm/dd format. The code block below does the following:

...
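For readers who want something concrete to experiment with, here is one way the books-per-year count could be computed from the Date Read column. It is a sketch under assumptions, not the code elided above: the function names, the output column names year and books_read, and the choice of st.bar_chart for display are all illustrative.

```python
import pandas as pd


def books_read_per_year(books_df: pd.DataFrame) -> pd.DataFrame:
    """Count finished books per calendar year from the Date Read column."""
    date_read = pd.to_datetime(books_df["Date Read"], errors="coerce")
    # Books without a finish date (for example, unread ones) are dropped.
    years = date_read.dropna().dt.year
    counts = years.value_counts().sort_index()
    return counts.rename_axis("year").reset_index(name="books_read")


def render_books_per_year(books_df: pd.DataFrame) -> None:
    # Imported here so the counting logic can be checked without Streamlit installed.
    import streamlit as st

    per_year = books_read_per_year(books_df)
    st.subheader("Books read per year")
    st.bar_chart(per_year, x="year", y="books_read")
```

Sorting by year before plotting keeps the bars in chronological order rather than in order of frequency, which is what value_counts returns by default.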

Iterative improvement

So far, we have been almost purely in production mode with this app. Iterative improvement is all about editing the work we have already done and organizing it in a way that makes the app more usable and, frankly, nicer to look at. There are a few improvements that we can shoot for here:

Beautification via animation
Organization using columns and width
Narrative building through text and additional statistics

Let’s start by using animations to make our apps a bit prettier!

Beautification via animation

In Chapter 7, Exploring Streamlit Components, we explored the use of various Streamlit Components; one of these was a component called streamlit-lottie, which gives us the ability to add animation to our Streamlit applications. We can improve our current app by adding an animation to the top of our current Streamlit app using the following code. If you want to learn more about Streamlit Components, please head back...
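A minimal sketch of adding an animation with streamlit-lottie might look like the following. It assumes the streamlit-lottie package is installed and that a Lottie animation has been saved locally as a JSON file; the filename reading_animation.json and the function names are hypothetical, and loading from a local file is simply one option chosen for this example.

```python
import json
from pathlib import Path


def load_lottie_file(path):
    """Load a Lottie animation (a JSON document) from a local file."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)


def render_header(animation_path="reading_animation.json"):
    # Imported here so the loader can be checked without Streamlit installed.
    import streamlit as st
    from streamlit_lottie import st_lottie

    animation = load_lottie_file(animation_path)
    # Show the animation above the title; height is in pixels.
    st_lottie(animation, height=200, key="header_animation")
    st.title("Analyzing Your Goodreads Reading Habits")
```

Calling render_header() at the top of the app script places the animation before any of the graphs.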

Hosting and promotion

Our final step is to host this app on Streamlit Community Cloud. To do this, we need to perform the following steps:

Create a GitHub repository for this work.
Add a requirements.txt file.
Use one-click deployment on Streamlit Community Cloud to deploy the app.

We have already covered this extensively in Chapter 5, Deploying Streamlit with Streamlit Community Cloud, so give it a shot now without instructions.

Summary

What a fun chapter! We have learned so much here – from how to come up with data science projects of our own, and how to create initial MVPs, to the iterative improvement of our apps. We did this all through the lens of our Goodreads dataset, and we took this app from just an idea to a fully functioning app hosted on Streamlit Community Cloud. I look forward to seeing all the different types of Streamlit apps that you create. Please create something fun and send it to me on Twitter at @tylerjrichards. In the next chapter, we will focus on interviews with Streamlit power users and creators to learn tips and tricks, why they use Streamlit so extensively, and also where they think the library will go from here. See you there!

Learn more on Discord

To join the Discord community for this book – where you can share feedback, ask the author questions, and learn about new releases – follow the link below:

https://packt.link/sl