
You're reading from Mastering pandas - Second Edition

Product type: Book
Published in: Oct 2019
Reading level: Intermediate
ISBN-13: 9781789343236
Edition: 2nd Edition
Author: Ashish Kumar

Ashish Kumar is a seasoned data science professional, a published author, and a thought leader in the field of data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience implementing and deploying data science and machine learning solutions for challenging industry problems, in both hands-on and leadership roles. His core areas of expertise include natural language processing, IoT analytics, R Shiny product development, and ensemble ML methods. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the nearest hip beach and enjoys the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.

Preface

pandas is a popular Python library used by data scientists and analysts worldwide to manipulate and analyze their data. This book presents useful data manipulation techniques in pandas for performing complex data analysis in various domains. pandas provides features and capabilities that make data analysis much easier and faster than in many other popular languages, such as Java, C, C++, and Ruby.

Who this book is for

This book is for data scientists, analysts, and Python developers who wish to explore advanced data analysis and scientific computing techniques using pandas. Some fundamental understanding of Python programming and familiarity with basic data analysis concepts is all you need to get started with this book.

What this book covers

Chapter 1, Introduction to pandas and Data Analysis, will introduce pandas and explain where it fits in the data analysis pipeline. We will also look into some of the popular applications of pandas and how Python and pandas can be used for data analysis.

Chapter 2, Installation of pandas and Supporting Software, will deal with the installation of Python (if necessary), the pandas library, and all necessary dependencies on the Windows, macOS, and Linux platforms. We will also look into command-line tricks, as well as the options and settings available in pandas.

Chapter 3, Using NumPy and Data Structures with pandas, will give a quick tour of the power of NumPy and provide a glimpse of how it makes life easier when working with pandas. We will also be implementing a neural network with NumPy and exploring some of the practical applications of multi-dimensional arrays.

Chapter 4, I/O of Different Data Formats with pandas, will teach you how to read and write commonplace formats, such as comma-separated values (CSV), with all the options, as well as more exotic sources and formats, such as URLs, JSON, and XML. We will also create files in those formats from data objects and create niche plots from within pandas.

Chapter 5, Indexing and Selecting in pandas, will show you how to access and select data from pandas data structures. We will look in detail at basic indexing, label indexing, integer indexing, mixed indexing, and operations on indexes.

Chapter 6, Grouping, Merging, and Reshaping Data in pandas, will examine the various functions that enable us to group, merge, and reshape data, and will have you apply these functions to real-world datasets.

Chapter 7, Special Data Operations in pandas, will discuss and elaborate on the methods, syntax, and usage of some of the special data operations in pandas.

Chapter 8, Time Series and Plotting Using Matplotlib, will look at how to handle time series and dates. We will also take a tour of some topics that are necessary for you to know about in order to develop your expertise in using pandas.

Chapter 9, Making Powerful Reports Using pandas in Jupyter, will look into the range of styling and formatting options that pandas offers. We will also learn how to create dashboards and reports in the Jupyter Notebook.

Chapter 10, A Tour of Statistics with pandas and NumPy, will delve into how pandas and NumPy can be used to perform statistical calculations.

Chapter 11, A Brief Tour of Bayesian Statistics and Maximum Likelihood Estimates, will examine an alternative approach to statistics, which is the Bayesian approach. We will also look into the key statistical distributions and see how we can use various statistical packages to generate and plot distributions in matplotlib.

Chapter 12, Data Case Studies Using pandas, will discuss how we can solve real-life data case studies using pandas. We will also look into web scraping with Python and data validation.

Chapter 13, The pandas Library Architecture, will discuss the architecture and code structure of the pandas library. This chapter will also briefly demonstrate how you can improve performance using Python extensions.

Chapter 14, pandas Compared with Other Tools, will focus on comparing pandas with R and with other tools, such as SQL and SAS. We will also look into slicing and selection.

Chapter 15, Brief Tour of Machine Learning, will conclude the book by giving a brief introduction to the scikit-learn library for doing machine learning and show how pandas fits within that framework.

To get the most out of this book

The following software will be used while we execute the code:

  • Windows/macOS/Linux
  • Python 3.6
  • pandas
  • IPython
  • R
  • scikit-learn
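Before running the book's code, you may find it useful to confirm that the Python packages listed above are importable and to see which versions you have. The following is a minimal sketch that is not part of the book's code bundle. It assumes the usual import names: IPython for IPython, pandas for pandas, and sklearn for scikit-learn. R is not a Python package, so it is not checked here. You can check it separately from the command line with R --version.

```python
# Minimal environment check (a sketch, not from the book's code bundle).
# Assumes the usual import names: "pandas", "IPython", and "sklearn" (scikit-learn).
import importlib
import sys

# Python packages from the software list; R must be checked separately.
REQUIRED = ["pandas", "IPython", "sklearn"]


def check_environment(modules=REQUIRED):
    """Map each module name to its version string, or to None if it cannot be imported."""
    report = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Most scientific packages expose __version__; fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    if sys.version_info < (3, 6):
        print("Warning: the book's code targets Python 3.6")
    for name, version in check_environment().items():
        print("{}: {}".format(name, version if version else "NOT INSTALLED"))
```

A missing package is reported as NOT INSTALLED rather than raising an error, so a single run lists every gap at once.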

For hardware, there are no specific requirements. Python and pandas can run on a Mac, Linux, or Windows machine.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packt.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Pandas-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Python has a built-in array module to create arrays."

A block of code is set as follows:

library(reticulate)  # provides source_python()
source_python("titanic.py")
titanic_in_r <- get_data_head("titanic.csv")

Any command-line input or output is written as follows:

 python --version

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Any notebooks in other directories could be transferred to the current working directory of the Jupyter Notebook through the Upload option."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.
