Selecting with unique and sorted indexes
Index selection performance drastically improves when the index is unique or sorted. The prior recipe used an unsorted index that contained duplicates, which makes for relatively slow selections.
Getting ready
In this recipe, we use the college dataset to form unique or sorted indexes to increase the performance of index selection. We will continue to compare the performance to boolean indexing as well.
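The three kinds of index this recipe compares can be sketched on a small made-up frame before touching the real data. The rows below are hypothetical stand-ins for the college dataset, and the variable names are illustrative only. The sketch uses `is_monotonic_increasing` because newer pandas releases no longer provide the `is_monotonic` alias.

```python
import pandas as pd

# Hypothetical stand-in for the college dataset: a few made-up rows.
df = pd.DataFrame({
    'INSTNM': ['A', 'B', 'C', 'D', 'E'],
    'STABBR': ['TX', 'CA', 'TX', 'AL', 'CA'],
})

unsorted_idx = df.set_index('STABBR')    # duplicate labels, not sorted
sorted_idx = unsorted_idx.sort_index()   # duplicate labels, sorted
unique_idx = df.set_index('INSTNM')      # every label appears once

# Properties that determine how quickly pandas can look up a label.
print(unsorted_idx.index.is_monotonic_increasing)
print(sorted_idx.index.is_monotonic_increasing)
print(unique_idx.index.is_unique)

# Boolean indexing and label selection return the same rows.
by_mask = df[df['STABBR'] == 'TX']
by_label = sorted_idx.loc['TX']
same_rows = sorted(by_mask['INSTNM']) == sorted(by_label['INSTNM'])
print(same_rows)
```

On a frame this small the timing difference is not measurable; the sketch only shows which properties to check and that both selection styles agree.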
How to do it...
- Read in the college dataset, create a separate DataFrame with STABBR as the index, and check whether the index is sorted:
>>> college = pd.read_csv('data/college.csv')
>>> college2 = college.set_index('STABBR')
>>> college2.index.is_monotonic
False
- Sort the index from college2 and store it as another object:
>>> college3 = college2.sort_index()
>>> college3.index.is_monotonic
True
- Time the selection of the state of Texas (TX) from all three DataFrames:
>>> %timeit college[college['STABBR...