You're reading from The Pandas Workshop

Product type: Book
Published in: Jun 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800208933
Edition: 1st Edition
Authors (4):
Blaine Bateman

Blaine Bateman has more than 35 years of experience working with various industries, from government R&D to startups to $1B public companies. His experience focuses on analytics, including machine learning and forecasting. His hands-on abilities include Python and R coding, Keras/TensorFlow, and AWS and Azure machine learning services. As a machine learning consultant, he has developed and deployed ML models in industry.

Saikat Basak

Saikat Basak is a data scientist and a passionate programmer. Having worked with multiple industry leaders, he has a good understanding of problem areas that can potentially be solved using data. Apart from being a data guy, he is also a science geek and loves to explore new ideas in the frontiers of science and technology.

Thomas V. Joseph

Thomas V. Joseph is a data science practitioner, researcher, trainer, mentor, and writer with more than 19 years of experience. He has extensive experience in solving business problems using machine learning toolsets across multiple industry segments.

William So

William So is a data scientist with both a strong academic background and extensive professional experience. He is currently the Head of Data Science at Douugh and also a lecturer for the Master of Data Science and Innovation at the University of Technology Sydney. During his career, he has covered the end-to-end spectrum of data analytics, from ML to business intelligence, helping stakeholders derive valuable insights and achieve results that benefit the business. William is a co-author of "The Applied Artificial Intelligence Workshop", published by Packt.

Preface

The Pandas Workshop will teach you how to be more productive with data and generate real business insights to inform your decision-making. You will be guided through real-world data science problems and shown how to apply key techniques in the context of realistic examples and exercises. Engaging activities will then challenge you to apply your new skills in a way that prepares you for real data science projects.

You'll see how experienced data scientists tackle a wide range of problems using data analysis with pandas. Unlike other Python books, which focus on the theory and spend too long on dry, technical explanations, this workshop is designed to quickly get you writing clean code and building your understanding through hands-on practice.

As you work through this Python pandas book, you'll tackle various real-world scenarios, such as using an air quality dataset to understand the pattern of nitrogen dioxide emissions in a city, as well as analyzing transportation data to improve bus transportation services.

By the end of this data analytics book, you'll have the knowledge, skills, and confidence to solve your own challenging data science problems with pandas.

Who this book is for

This data analysis book is for anyone with prior experience working with the Python programming language who wants to learn the fundamentals of data analysis with pandas. Previous knowledge of pandas is not necessary.

What this book covers

Chapter 1, Introduction, shows how pandas is one of the most versatile applications for data processing today and why it is the most sought-after tool for any data scientist. This chapter gives a brief introduction to many of the versatile features of pandas. It also takes a tour through all the topics that will be covered in this book, along with some introductory exercises using pandas.

Chapter 2, Data Structures, covers a key benefit of pandas, which is that it provides intuitive data structures that align to a wide range of data analysis tasks. The focus here is on learning about the important data structures in pandas, especially DataFrames, Series, and pandas index structures.

Chapter 3, Data I/O, explores the built-in functions that pandas provides to read data from a large variety of sources, as well as write data back to them, or to new files. In this chapter, you will learn all the important supported I/O methods.

Chapter 4, Data Types, explains why it is critical to use the correct data type when doing data analysis with pandas; otherwise, unexpected results or errors might appear. In this chapter, you will learn about pandas data types and how to use them.

Chapter 5, Data Selection – DataFrames, does a deep dive into using DataFrames now that you are well versed in the available data structures and methods in pandas.

Chapter 6, Data Selection – Series, highlights some of the important differences when working with pandas Series and is a companion to Chapter 5, Data Selection – DataFrames.

Chapter 7, Data Transformation, talks about how any dataset comes with challenges to its quality. In this chapter, you will learn how to use pandas to solve these challenges and make your data ready for analysis.

Chapter 8, Data Visualization, discusses how pandas offers built-in data visualization methods to accelerate your data analysis. In this chapter, you will learn how to build data visualizations from a DataFrame and how to further customize them with matplotlib.

Chapter 9, Data Modeling – Preprocessing, helps you to understand how to do some preliminary data review and analysis in pandas as a preparatory step to modeling, as well as some transformations important to successful modeling.

Chapter 10, Data Modeling – Modeling Basics, introduces you to some powerful pandas methods for resampling and smoothing data to find patterns and gain insights that can be used in more complex modeling tasks.

Chapter 11, Data Modeling – Regression Modeling, focuses on a workhorse method, regression modeling, as the next step toward using models to understand data and make predictions. By the end of the chapter, you will be tackling complex multivariate datasets with regression models.

Chapter 12, Using Time in pandas, describes another type of data supported by pandas: time series data. It also looks at how pandas provides a wide range of methods to handle data organized by dates and/or times. You will learn how to perform operations on timestamps and see the additional time-related attributes provided by pandas.

Chapter 13, Exploring Time Series, focuses on how to use a time series index to perform operations on time series data to gain insights. By the end of the chapter, you will apply regression modeling to time series data.

Chapter 14, Case Studies/Mini Projects, lets you apply what you have learned about pandas throughout this book to data analytics problems. This chapter covers three case studies in which you will apply the skills you have gained.

To get the most out of this book

This book assumes a good working knowledge of Python and, in particular, using Jupyter Notebook to create code. You need a Python environment set up on your local computer including Jupyter Notebook, and of course, pandas. There are additional dependencies you may need to install depending on how you created your local environment. The full list is available in the requirements.txt file in the GitHub repository for this workshop. The main tools used are given here:

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
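Before starting the exercises, it can help to confirm that the environment described above is in place. The following is a minimal sketch, not part of the book's code: the package list (pandas, notebook, matplotlib) and the helper name check_packages are assumptions chosen for illustration, and the authoritative list of dependencies remains the requirements.txt file in the GitHub repository.

```python
import importlib.util

# Hypothetical package list for illustration only; the authoritative list is
# the requirements.txt file in the book's GitHub repository.
PACKAGES = ["pandas", "notebook", "matplotlib"]


def check_packages(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Report which of the assumed packages are available in this environment.
    for name, found in check_packages(PACKAGES).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

Any package reported as missing can then be installed using whichever tool you used to create your local environment.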

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktWorkshops/The-Pandas-Workshop. If there's an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800208933_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "pandas offers us the .corr() method to use with DataFrames as follows."

A block of code is set as follows:

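import statsmodels.api as sm  # assumed: sm is the usual statsmodels alias

# metal_data and X are assumed to be defined earlier
lin_model = sm.OLS(metal_data['alloy_hardness'], X)
my_model = lin_model.fit()
print(my_model.summary())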

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

import pandas as pd
my_data = pd.read_csv('Datasets/auto-mpg.data.csv')
my_data.head()

x1 and x2:  -0.9335045017430936

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "Select System info from the Administration panel."

Tips or Important Notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you've read The Pandas Workshop, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
