Calculating Boolean statistics
It can be informative to calculate basic summary statistics on Boolean arrays. Each value of a Boolean array, either True or False, is treated as 1 or 0 respectively in numeric operations, so Series methods that work with numerical values, such as sum and mean, also work with Booleans.
In this recipe, we create a Boolean array by applying a condition to a column of data and then calculate summary statistics from it.
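The 1/0 behavior can be checked directly on a tiny hand-made Series before touching the movie data. In this minimal sketch, the name `flags` is hypothetical. The sum of a Boolean Series counts the True values, and the mean gives the fraction of values that are True:

```python
import pandas as pd

# A small hand-made Boolean Series; True counts as 1 and False as 0 in arithmetic
flags = pd.Series([True, False, True, True])

print(flags.sum())                # number of True values
print(flags.mean())               # fraction of values that are True
print(flags.astype(int).tolist()) # the underlying 1/0 encoding
```

Because the mean of 1s and 0s is the proportion of 1s, `mean` is a convenient one-step way to compute the share of rows meeting a condition.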
How to do it…
- Read in the movie dataset, set the index to the movie title, and inspect the first few rows of the duration column:

    >>> import pandas as pd
    >>> import numpy as np
    >>> movie = pd.read_csv(
    ...     "data/movie.csv", index_col="movie_title"
    ... )
    >>> movie[["duration"]].head()
                                              duration
    movie_title
    Avatar                                       178.0
    Pirates of the Caribbean: At World's End     169.0
    Spectre ...
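As a rough sketch of where this recipe heads, the next move is to apply a condition to the duration column and summarize the resulting Boolean Series. The real `data/movie.csv` file is not reproduced here, so a small Series stands in for `movie["duration"]`. Only the first two durations come from the output above; the remaining titles and values are hypothetical, and the 120-minute threshold and the name `over_2_hours` are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for movie["duration"]; only the first two values
# come from the recipe's output, the rest are made up for illustration
duration = pd.Series(
    [178.0, 169.0, 95.0, 110.0, np.nan],
    index=["Avatar", "Pirates of the Caribbean: At World's End",
           "Movie A", "Movie B", "Movie C"],
    name="duration",
)

# Applying a condition yields a Boolean Series aligned on the same index
over_2_hours = duration > 120

print(over_2_hours.sum())   # count of movies longer than 120 minutes
print(over_2_hours.mean())  # proportion of all rows (NaN compares as False)

# Drop missing durations first if they should not count toward the proportion
print(duration.dropna().gt(120).mean())

# describe() on a Boolean Series reports count, unique, top, and freq
print(over_2_hours.describe())
```

One design point worth noting: a missing duration compares as False, so it lowers the proportion returned by `mean`. Dropping missing values before comparing changes the denominator to only the movies with a known duration.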