pandas is a popular Python library used by data scientists and analysts worldwide to manipulate and analyze their data. This book presents useful data manipulation techniques in pandas for performing complex data analysis in various domains. It provides features and capabilities that make data analysis much easier and faster than with many other popular languages, such as Java, C, C++, and Ruby.
You're reading from Mastering pandas. - Second Edition
Who this book is for
This book is for data scientists, analysts, and Python developers who wish to explore advanced data analysis and scientific computing techniques using pandas. Some fundamental understanding of Python programming and familiarity with basic data analysis concepts is all you need to get started with this book.
What this book covers
Chapter 1, Introduction to pandas and Data Analysis, will introduce pandas and explain where it fits in the data analysis pipeline. We will also look into some of the popular applications of pandas and how Python and pandas can be used for data analysis.
Chapter 2, Installation of pandas and Supporting Software, will deal with the installation of Python (if necessary), the pandas library, and all necessary dependencies for the Windows, macOS X, and Linux platforms. We will also look into the command-line tricks and options and settings for pandas as well.
Chapter 3, Using NumPy and Data Structures with pandas, will give a quick tour of the power of NumPy and provide a glimpse of how it makes life easier when working with pandas. We will also be implementing a neural network with NumPy and exploring some of the practical applications of multi-dimensional arrays.
Chapter 4, I/O of Different Data Formats with pandas, will teach you how to read and write commonplace formats, such as comma-separated value (CSV), with all the options, as well as more exotic file formats, such as URL, JSON, and XML. We will also create files in those formats from data objects and create niche plots from within pandas.
Chapter 5, Indexing and Selecting in pandas, will show you how to access and select data from pandas data structures. We will look in detail at basic indexing, label indexing, integer indexing, mixed indexing, and the operation of indexes.
Chapter 6, Grouping, Merging, and Reshaping Data in pandas, will examine the various functions that enable us to rearrange data, by having you utilize such functions on real-world datasets. We will also learn about grouping, merging, and reshaping data.
Chapter 7, Special Data Operations in pandas, will discuss and elaborate on the methods, syntax, and usage of some of the special data operations in pandas.
Chapter 8, Time Series and Plotting Using Matplotlib, will look at how to handle time series and dates. We will also take a tour of some topics that are necessary for you to know about in order to develop your expertise in using pandas.
Chapter 9, Making Powerful Reports Using pandas in Jupyter, will look into the application of a range of styling, as well as the formatting options that pandas has. We will also learn how to create dashboards and reports in the Jupyter Notebook.
Chapter 10, A Tour of Statistics with pandas and NumPy, will delve into how pandas can be used to perform statistical calculations using packages and calculations.
Chapter 11, A Brief Tour of Bayesian Statistics and Maximum Likelihood Estimates, will examine an alternative approach to statistics, which is the Bayesian approach. We will also look into the key statistical distributions and see how we can use various statistical packages to generate and plot distributions in matplotlib.
Chapter 12, Data Case Studies Using pandas, will discuss how we can solve real-life data case studies using pandas. We will look into web scraping with Python and data validation as well.
Chapter 13, The pandas Library Architecture, will discuss the architecture and code structure of the pandas library. This chapter will also briefly demonstrate how you can improve performance using Python extensions.
Chapter 14, pandas Compared with Other Tools, will focus on comparing pandas, with R and other tools such as SQL and SAS. We will also look into slicing and selection as well.
Chapter 15, Brief Tour of Machine Learning, will conclude the book by giving a brief introduction to the scikit-learn library for doing machine learning and show how pandas fits within that framework.
To get the most out of this book
The following software will be used while we execute the code:
- Windows/macOS/Linux
- Python 3.6
- pandas
- IPython
- R
- scikit-learn
For hardware, there are no specific requirements. Python and pandas can run on a Mac, Linux, or Windows machine.
Download the example code files
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
- Log in or register at www.packt.com.
- Select the SUPPORT tab.
- Click on Code Downloads & Errata.
- Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
- WinRAR/7-Zip for Windows
- Zipeg/iZip/UnRarX for Mac
- 7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Pandas-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781789343236_ColorImages.pdf.
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Python has an built-in array module to create arrays."
A block of code is set as follows:
source_python("titanic.py")
titanic_in_r <- get_data_head("titanic.csv")
Any command-line input or output is written as follows:
python --version
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Any notebooks in other directories could be transferred to the current working directory of the Jupyter Notebook through the Upload option."
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.