Using indexes to manipulate pandas objects

by Trent Hauck | September 2013 | Open Source

This article by Trent Hauck, author of the book Instant Data Intensive Apps with pandas How-to, explains why indexes matter in pandas and how to use them to manipulate pandas objects.

Indexes are not an advanced topic because they are difficult; rather, using them well is part of becoming an expert with pandas. Hierarchical indexes are discussed in the There's more... section below.

Getting ready

A good understanding of indexes in pandas is crucial for moving data around quickly. From a business intelligence perspective, they create a distinction similar to that of metrics and dimensions in an OLAP cube. To illustrate this point, this recipe walks through loading stock data into pandas, combining it, and then reindexing it so that it is easy to work with.
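The metrics-and-dimensions analogy can be sketched with a tiny table. This is only an illustration under stated assumptions: the column names and all values below are made up and do not come from the recipe's data.

```python
import pandas as pd

# Hypothetical flat table: two "dimension" columns and one "metric" column.
flat = pd.DataFrame({
    "ticker": ["gs", "gs", "ibm", "ibm"],
    "timestamp": pd.to_datetime(
        ["2006-01-03", "2006-01-04", "2006-01-03", "2006-01-04"]
    ),
    "close": [127.0, 128.5, 82.0, 81.5],
})

# Moving the dimensions into the index leaves only metrics as columns.
indexed = flat.set_index(["ticker", "timestamp"])

# Aggregating a metric along one dimension is then a single expression.
mean_close = indexed["close"].groupby(level="ticker").mean()
print(mean_close)
```

Once the dimensions live in the index, every remaining column is something to measure, which is the distinction the paragraph above describes.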

How to do it...

  1. Use the DataReader object to load stock price information into a DataFrame, and explore the basic axes of a Panel.

    > import pandas as pd
    > from pandas.io.data import DataReader
    > tickers = ['gs', 'ibm', 'f', 'ba', 'axp']
    > dfs = {}
    > for ticker in tickers:
          dfs[ticker] = DataReader(ticker, "yahoo", '2006-01-01')

    # a yet undiscussed data structure: in the same way that a
    # DataFrame is a collection of Series, a Panel is a collection of
    # DataFrames
    > pan = pd.Panel(dfs)
    > pan
    <class 'pandas.core.panel.Panel'>
    Dimensions: 5 (items) x 1764 (major_axis) x 6 (minor_axis)
    Items axis: axp to ibm
    Major_axis axis: 2006-01-03 00:00:00 to 2013-01-04 00:00:00
    Minor_axis axis: Open to Adj Close

    > pan.items
    Index([axp, ba, f, gs, ibm], dtype=object)

    > pan.minor_axis
    Index([Open, High, Low, Close, Volume, Adj Close], dtype=object)

    > pan.major_axis
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2006-01-03 00:00:00, ..., 2013-01-04 00:00:00]
    Length: 1764, Freq: None, Timezone: None

  2. Use the axis selectors to easily compute different sets of summary statistics.

    > pan.minor_xs('Open').mean()
    axp     46.227466
    ba      70.746451
    f        9.135794
    gs     151.655091
    ibm    129.570969

    # major axis is sliceable as well
    > day_slice = pan.major_axis[1]
    > pan.major_xs(day_slice)[['gs', 'ba']]
                       ba          gs
    Open            70.08      127.35
    High            71.27      128.91
    Low             69.86      126.38
    Close           71.17      127.09
    Volume     3165000.00  4861600.00
    Adj Close       60.43      118.12

  3. Convert the Panel to a DataFrame.

    > dfs = []
    > for ticker in pan:
          # pair every timestamp with the ticker it belongs to
          idx = pan.major_axis
          idx = pd.MultiIndex.from_tuples(list(zip([ticker]*len(idx), idx)))
          idx.names = ['ticker', 'timestamp']
          dfs.append(pd.DataFrame(pan[ticker].values, index=idx, columns=pan.minor_axis))
    > df = pd.concat(dfs)
    > df
    Data columns:
    Open         8820 non-null values
    High         8820 non-null values
    Low          8820 non-null values
    Close        8820 non-null values
    Volume       8820 non-null values
    Adj Close    8820 non-null values
    dtypes: float64(6)

  4. Perform operations analogous to those in the preceding examples on the newly created DataFrame.

    # selecting from a MultiIndex isn't much different than the Panel
    # (output muted)
    > df.ix['gs':'ibm']
    > df['Open']
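Recent pandas releases no longer ship Panel, pandas.io.data, or the .ix indexer, so the steps above will not run there as written. The following is a minimal sketch of one way to reach the same ticker/timestamp MultiIndex in current pandas, assuming the per-ticker DataFrames are already in hand. Synthetic numbers stand in for the downloaded prices, and pd.concat with a dict is a swapped-in technique, not the recipe's own method.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for downloaded prices; the values are made up.
dates = pd.date_range("2006-01-03", periods=3, freq="D")
columns = ["Open", "Close", "Volume"]
frames = {
    ticker: pd.DataFrame(
        np.arange(9, dtype=float).reshape(3, 3) + offset,
        index=dates,
        columns=columns,
    )
    for offset, ticker in enumerate(["gs", "ibm"])
}

# Passing a dict to concat turns its keys into the outer index level.
df = pd.concat(frames, names=["ticker", "timestamp"])

# Rough analogue of pan.minor_xs('Open').mean(): one column, mean per ticker.
open_means = df["Open"].groupby(level="ticker").mean()

# Rough analogue of pan.major_xs(day): one timestamp across all tickers.
day_slice = df.xs(dates[1], level="timestamp")

# Label-based row selection uses .loc here instead of .ix.
gs_rows = df.loc["gs"]
```

The design choice is to skip the three-dimensional container entirely: the outer index level plays the role of the Panel's items axis.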

How it works...

The previous example was certainly contrived, but when indexing and statistical techniques are incorporated, the power of pandas begins to come through. Statistics will be covered in an upcoming recipe.

A pandas index can be thought of as a descriptor of a certain point in the DataFrame. When ticker and timestamp are the only index levels in a DataFrame, each point is uniquely identified by its ticker, timestamp, and column name. Once every point is identified this way, aggregation and analysis become more convenient.
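To make the idea of a uniquely identified point concrete, here is a small sketch on a hand-built frame. The values are made up for illustration; only the level names ticker and timestamp follow the recipe.

```python
import pandas as pd

# Two tickers by two timestamps, in the same layout as the recipe's index.
idx = pd.MultiIndex.from_product(
    [["gs", "ibm"], pd.to_datetime(["2006-01-03", "2006-01-04"])],
    names=["ticker", "timestamp"],
)
df = pd.DataFrame(
    {"Open": [127.0, 128.0, 82.0, 83.0], "Volume": [10.0, 20.0, 30.0, 40.0]},
    index=idx,
)

# ticker + timestamp select the row; the column name selects the cell.
point = df.loc[("ibm", pd.Timestamp("2006-01-04")), "Open"]

# Supplying only the outer level returns every timestamp for that ticker.
gs_only = df.loc["gs"]
```

Here point is a single scalar, while gs_only is a smaller DataFrame indexed by timestamp alone.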

There's more...

Indexes show up all over the place in pandas so it's worthwhile to see some other use cases as well.

Advanced header indexes

Hierarchical indexing isn't limited to rows. Headers can also be represented by a MultiIndex, as shown in the following commands:

> header_top = ['Price', 'Price', 'Price', 'Price', 'Volume', 'Price']
> df.columns = pd.MultiIndex.from_tuples(list(zip(header_top, df.columns)))
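As a quick illustration of what a two-level header buys you, the sketch below builds one on a three-column frame and selects by the top level. It uses MultiIndex.from_arrays rather than from_tuples purely for brevity, and the reduced column set is an assumption made to keep the example small.

```python
import pandas as pd

# Small frame with a subset of the recipe's columns.
df = pd.DataFrame(
    [[70.08, 71.17, 3165000.0], [127.35, 127.09, 4861600.0]],
    columns=["Open", "Close", "Volume"],
)

# One top-level label per existing column.
header_top = ["Price", "Price", "Volume"]
df.columns = pd.MultiIndex.from_arrays([header_top, df.columns])

# Selecting a top-level label returns all columns grouped under it.
prices = df["Price"]

# A full (top, bottom) tuple addresses exactly one column.
volume = df[("Volume", "Volume")]
```

Selecting df["Price"] drops the top level, so the result has the plain Open and Close columns again.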

Performing aggregate operations with indexes

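As a prelude to the following sections, we'll do a single groupby operation here, since groupby works so well with indexes.

> df.groupby(level=['ticker', 'day'])['Volume'].mean()

This answers the question: for each ticker and for each day (not date), what was the mean volume over the life of the data? Note that the index built earlier has only the levels ticker and timestamp, so this call assumes that a day level derived from timestamp has been added to the index first.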
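The text does not say how the day level is produced. The sketch below assumes that "day" means the day of the week and appends it to the index from the timestamp level; both that interpretation and the synthetic volumes are assumptions, not something the article confirms.

```python
import pandas as pd

# Two tickers over two full weeks, starting on a Monday; volumes are made up.
idx = pd.MultiIndex.from_product(
    [["gs", "ibm"], pd.date_range("2006-01-02", periods=14, freq="D")],
    names=["ticker", "timestamp"],
)
df = pd.DataFrame({"Volume": range(28)}, index=idx)

# Assumption: "day" is the weekday name derived from the timestamp level.
timestamps = df.index.get_level_values("timestamp")
df = df.set_index(timestamps.day_name().rename("day"), append=True)

# Group on index levels by name; no columns need to be touched.
mean_volume = df.groupby(level=["ticker", "day"])["Volume"].mean()
print(mean_volume)
```

If day were meant as day of the month instead, timestamps.day could be substituted for timestamps.day_name() with the rest unchanged.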

Summary

This article talks about the use and importance of indexes in pandas. It also talks about different operations that can be done with indexes.

About the Author :


Trent Hauck

Trent Hauck is a graduate from the University of Kansas. He holds a Bachelor's in Accounting and a Master's in Finance. Early in his career he worked in Finance and Insurance, but has transitioned to Marketing and Analytics. Working with data and finding tools for efficient use has been a theme throughout.
