DataFrame
While pd.Series is the building block, pd.DataFrame is the primary and most commonly used object in pandas. When people think of pandas, they typically envision working with a pd.DataFrame.
In most analysis workflows, you will be importing your data from another source, but for now, we will show you how to construct a pd.DataFrame directly (input/output will be covered in Chapter 4, The pandas I/O System).
How to do it
The most basic construction of a pd.DataFrame happens with a two-dimensional sequence, like a list of lists:
pd.DataFrame([
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
])
   0  1  2
0  0  1  2
1  3  4  5
2  6  7  8
With a list of lists, pandas will automatically assign integer labels, starting at 0, to both the rows and the columns. Typically, users of pandas will at least provide labels for columns, as it makes indexing and selecting from a pd.DataFrame much more intuitive (see Chapter 2, Selection and Assignment, for an introduction to indexing and selecting). To label your columns when constructing a pd.DataFrame from a list of lists, you can provide a columns= argument to the constructor:
pd.DataFrame([
    [1, 2],
    [4, 8],
], columns=["col_a", "col_b"])
   col_a  col_b
0      1      2
1      4      8
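To see what the columns= argument changes, the following minimal sketch compares the labels of an unlabeled and a labeled pd.DataFrame and then selects a column by its label. The variable names (unlabeled, labeled, col_b, and so on) are ours, chosen only for illustration.

```python
import pandas as pd

# Without columns=, pandas falls back to default integer labels on both axes.
unlabeled = pd.DataFrame([[1, 2], [4, 8]])
default_columns = list(unlabeled.columns)
print(default_columns)  # [0, 1]

# With columns=, only the column labels change; rows keep the default labels.
labeled = pd.DataFrame([[1, 2], [4, 8]], columns=["col_a", "col_b"])
column_labels = list(labeled.columns)
row_labels = list(labeled.index)
print(column_labels)  # ['col_a', 'col_b']
print(row_labels)     # [0, 1]

# Selecting a single column by its label returns a pd.Series.
col_b = labeled["col_b"]
print(col_b.tolist())  # [2, 8]
```

Selecting with labeled["col_b"] reads more clearly than unlabeled[1], which is the main reason to label columns up front.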
Instead of using a list of lists, you could also provide a dictionary. The keys of the dictionary will be used as column labels, and the values of the dictionary will represent the values placed in that column of the pd.DataFrame:
pd.DataFrame({
    "first_name": ["Jane", "John"],
    "last_name": ["Doe", "Smith"],
})
  first_name last_name
0       Jane       Doe
1       John     Smith
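Two properties of the dictionary form are worth checking for yourself: the column order follows the order of the dictionary keys, and the values must all have the same length. The sketch below illustrates both; the names people and mismatch_error are ours, and the second construction is deliberately invalid.

```python
import pandas as pd

people = pd.DataFrame({
    "first_name": ["Jane", "John"],
    "last_name": ["Doe", "Smith"],
})

# Column labels follow the insertion order of the dictionary keys.
print(list(people.columns))  # ['first_name', 'last_name']
print(people.shape)          # (2, 2)

# Lists of different lengths cannot form a rectangular table,
# so pandas raises a ValueError instead of guessing how to pad them.
mismatch_error = None
try:
    pd.DataFrame({"first_name": ["Jane", "John"], "last_name": ["Doe"]})
except ValueError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)  # ValueError
```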
In the above example, our dictionary values were lists of strings, but the pd.DataFrame does not strictly require lists. Any sequence will work, including a pd.Series:
ser1 = pd.Series(range(3), dtype="int8", name="int8_col")
ser2 = pd.Series(range(3), dtype="int16", name="int16_col")
pd.DataFrame({ser1.name: ser1, ser2.name: ser2})
   int8_col  int16_col
0         0          0
1         1          1
2         2          2
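The printed output alone does not reveal that the two columns hold different integer types. As a small sketch, you can inspect the dtypes attribute to confirm that each column keeps the data type of the pd.Series it was built from; the variable name df is ours.

```python
import pandas as pd

ser1 = pd.Series(range(3), dtype="int8", name="int8_col")
ser2 = pd.Series(range(3), dtype="int16", name="int16_col")
df = pd.DataFrame({ser1.name: ser1, ser2.name: ser2})

# Each column retains the dtype of its source pd.Series.
print(df.dtypes)
```

A pd.DataFrame can therefore hold a different data type in each column, which is one of the main things that distinguishes it from a plain two-dimensional array.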