
The Pandas Workshop

By Blaine Bateman, Saikat Basak, Thomas V. Joseph, and William So
About this book
The Pandas Workshop will teach you how to be more productive with data and generate real business insights to inform your decision-making. You will be guided through real-world data science problems and shown how to apply key techniques in the context of realistic examples and exercises. Engaging activities will then challenge you to apply your new skills in a way that prepares you for real data science projects. You’ll see how experienced data scientists tackle a wide range of problems using data analysis with pandas. Unlike other Python books, which focus on theory and spend too long on dry, technical explanations, this workshop is designed to quickly get you to write clean code and build your understanding through hands-on practice. As you work through this Python pandas book, you’ll tackle various real-world scenarios, such as using an air quality dataset to understand the pattern of nitrogen dioxide emissions in a city, as well as analyzing transportation data to improve bus transportation services. By the end of this data analytics book, you’ll have the knowledge, skills, and confidence you need to solve your own challenging data science problems with pandas.
Publication date: June 2022
Publisher: Packt
Pages: 744
ISBN: 9781800208933

 

Chapter 1: Introduction to pandas

From creating basic DataFrames to optimizing your code, this chapter will serve as a crash course that will help you realize, through code examples and practical exercises, the prowess of pandas in data wrangling and analytics. By the end of this chapter, you will have gained rudimentary experience in reading and writing data, performing DataFrame aggregation, working with time series, preprocessing before modeling, data visualization, and more. You will also be able to create datasets and perform preprocessing tasks on them. You will implement most of the core features of the library, which we will cover in more depth in the chapters that follow. The final chapter of this book will help you consolidate what you've learned through a series of activities.

In this chapter, we will cover the following topics:

  • Introduction to the world of pandas
  • Exploring the history and evolution of pandas
  • Components and applications of pandas
  • Understanding the basic concepts of pandas
  • Activity 1.01 – comparing sales data for two stores
 

Introduction to the world of pandas

Tess's latest project has turned out to be much more time-consuming than she initially anticipated. Her client, who develops and provides content for schools, wants her to find insights into their students' needs by analyzing data that's been collected through various sources. Things would have been much easier had this data been in a single format, but unfortunately, that's not the case. The client has sent her data in multiple formats, including HTML, JSON, Excel, and CSV. She has to extract the relevant information from all these files. These are not the only data sources she'll be working with, though. She also has to access the records of the top-performing and struggling students from a SQLite database so that she can analyze their performance patterns. All these disparate data elements differ in their data types, velocities, frequencies, and volumes. She must now extract different elements from these data sources by slicing, subsetting, grouping, merging, and reshaping the data to get a comprehensive list of features for further analysis. Since the volumes are large, she must also optimize her methods for efficient processing.

Does this scenario sound familiar to you? Are you overwhelmed by the data wrangling tasks that must be performed before the analytics processes? Well, you do not have to struggle anymore. pandas is a Python library that is capable of carrying out all these tasks and more. Over the years, pandas has become the go-to tool for all the preprocessing tasks involved in the life cycle of data analytics.
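To make Tess's scenario a little more concrete, here is a minimal sketch of how a few of those steps might look in pandas. It is an illustration only, not the book's own exercise: the student records, column names, and the in-memory SQLite table are all made up for this example, and only the CSV, JSON, and SQLite sources are shown because the HTML and Excel readers need extra optional packages.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical data standing in for the client's files (all values are invented).
csv_text = "student_id,name\n1,Ana\n2,Ben\n3,Cy\n"
json_text = (
    '[{"student_id": 1, "subject": "math", "score": 90},'
    ' {"student_id": 1, "subject": "art", "score": 70},'
    ' {"student_id": 2, "subject": "math", "score": 60},'
    ' {"student_id": 3, "subject": "art", "score": 90}]'
)

# Read the CSV and JSON sources from in-memory text buffers.
students = pd.read_csv(io.StringIO(csv_text))
scores = pd.read_json(io.StringIO(json_text))

# Build a throwaway SQLite database and read one table from it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendance (student_id INTEGER, days_present INTEGER)")
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?)", [(1, 18), (2, 12), (3, 20)]
)
conn.commit()
attendance = pd.read_sql_query("SELECT * FROM attendance", conn)
conn.close()

# Merge the three sources on their shared key.
combined = students.merge(scores, on="student_id").merge(attendance, on="student_id")

# Subset: keep only the rows with a score of 80 or more.
top = combined[combined["score"] >= 80]

# Group: average score per subject.
mean_by_subject = combined.groupby("subject")["score"].mean()

print(combined)
print(top[["name", "subject", "score"]])
print(mean_by_subject)
```

Each source ends up as an ordinary DataFrame, which is why the merging, subsetting, and grouping steps do not need to care where the data originally came from.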

In this chapter, you will begin to explore and have fun with pandas, an amazing library that's used extensively by the data science and machine learning community. As you work through the exercises and activities in this chapter and the ones that follow, you will understand why pandas is considered the de facto standard when working with data. But first, let's take a short trip through time to understand the evolution of the library and get a glimpse into all the functionalities you will be learning about in this chapter.

 

Exploring the history and evolution of pandas

pandas, in its basic version, was open sourced in 2009 by Wes McKinney, an MIT graduate with experience in quantitative finance. He was unhappy with the tools available at the time, so he started building a tool that was intuitive and elegant and required minimal code. pandas went on to become one of the most popular tools in the data science community and contributed significantly to Python's own rise in popularity.

One of the primary reasons for the popularity of pandas is its ability to handle different types of data. pandas is well suited for handling the following:

  • Tabular data with columns that are capable of storing different types of data (such as numerical data and text data)
  • Ordered and unordered series data (an arbitrary sequence of numbers in a list, such as [2,4,8,9,10])
  • Multi-dimensional matrix data (three-dimensional, four-dimensional, and so on), typically represented through hierarchical (MultiIndex) row or column labels
  • Any other form of observational/statistical data (such as SQL data and R data)
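The kinds of data listed above can be sketched in a few lines. This is a hedged illustration rather than an example from the book: the city names and readings are invented, and representing higher-dimensional data with a hierarchical MultiIndex is one common approach assumed here, not the only one.

```python
import pandas as pd

# Tabular data: each column can hold its own data type.
table = pd.DataFrame(
    {
        "city": ["Lima", "Oslo"],   # text
        "no2_ppb": [34.0, 21.5],    # floating-point numbers
        "sensors": [5, 3],          # integers
    }
)

# Series data: the example sequence from the list above, with a default integer index.
series = pd.Series([2, 4, 8, 9, 10])

# Higher-dimensional data: a (city, year) MultiIndex adds a dimension
# to what is still a two-dimensional DataFrame.
index = pd.MultiIndex.from_product(
    [["Lima", "Oslo"], [2021, 2022]], names=["city", "year"]
)
cube = pd.DataFrame({"no2_ppb": [33.0, 34.0, 20.0, 21.5]}, index=index)

print(table.dtypes)
print(series.sum())
print(cube.loc[("Lima", 2022), "no2_ppb"])
```

Selecting with a full `(city, year)` tuple returns a single reading, in much the same way that two coordinates would index into a small data cube.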

Besides this, a large repertoire of intuitive and easy-to-use functions/methods makes pandas the go-to tool for data analytics. In the next section, we'll cover the components of pandas and their main applications.

       
About the Authors
  • Blaine Bateman

    Blaine Bateman has more than 35 years of experience working with various industries from government R&D to startups to $1B public companies. His experience focuses on analytics including machine learning and forecasting. His hands-on abilities include Python and R coding, Keras/TensorFlow, and AWS & Azure machine learning services. As a machine learning consultant, he has developed and deployed actual ML models in industry.

  • Saikat Basak

    Saikat Basak is a data scientist and a passionate programmer. Having worked with multiple industry leaders, he has a good understanding of problem areas that can potentially be solved using data. Apart from being a data guy, he is also a science geek and loves to explore new ideas in the frontiers of science and technology.

  • Thomas V. Joseph

    Thomas V. Joseph is a data science practitioner, researcher, trainer, mentor, and writer with more than 19 years of experience. He has extensive experience in solving business problems using machine learning toolsets across multiple industry segments.

  • William So

    William So is a Data Scientist with both a strong academic background and extensive professional experience. He is currently the Head of Data Science at Douugh and also a Lecturer for Master of Data Science and Innovation at the University of Technology Sydney. During his career, he has successfully covered the end-to-end spectrum of data analytics, from ML to Business Intelligence, helping stakeholders derive valuable insights and achieve results that benefit the business. William is a co-author of "The Applied Artificial Intelligence Workshop", published by Packt.
