Chapter 2: Data Cleaning and Advanced Machine
Activity 2: Preparing to Train a Predictive Model for the Employee-Retention Problem
- Scroll to the Activity A section of the lesson-2-workbook.ipynb notebook file.
- Check the head of the table by running the following code:
%%bash
head ../data/hr-analytics/hr_data.csv
Judging by the output, confirm that the file looks like standard CSV: a header row followed by delimiter-separated records. CSV files like this can usually be loaded directly with pd.read_csv.
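Where the %%bash cell magic is unavailable (for example, on a system without a bash shell), the same check can be done in plain Python. The following is a minimal sketch, not part of the original activity: the helper name peek and the small sample file are hypothetical, and the sample is written to a temporary directory so the snippet runs on its own.

```python
import itertools
import os
import tempfile


def peek(path, n=10):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in itertools.islice(f, n)]


# Hypothetical sample file standing in for the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("id,score\n1,0.5\n2,0.7\n3,0.9\n")
    first_lines = peek(sample_path, n=2)

print(first_lines)
```

Pointing peek at ../data/hr-analytics/hr_data.csv should show the same kind of header-plus-records layout that head prints.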
- Load the data with Pandas by running df = pd.read_csv('../data/hr-analytics/hr_data.csv'). Write it out yourself and use tab completion to help type the file path.
- Inspect the columns by printing df.columns, and make sure the data has loaded as expected by printing the DataFrame head and tail with df.head() and df.tail():
Figure 2.46: Output for inspecting head and tail of columns
We can see that it appears to have loaded correctly. Based on the tail index values, there are nearly 15,000 rows...
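The load-and-inspect steps above can be sketched end to end as follows. This is an illustration under stated assumptions rather than the notebook's actual cell: the CSV text, its column names, and its values are invented stand-ins for hr_data.csv, read from memory so the snippet runs without the data file.

```python
import io

import pandas as pd

# Hypothetical stand-in for hr_data.csv; columns and values are invented.
csv_text = """employee_id,tenure_years,num_projects
1,3,2
2,5,4
3,2,3
4,7,5
5,4,2
6,1,6
"""

df = pd.read_csv(io.StringIO(csv_text))

# Inspect the column names, then the first and last rows.
print(df.columns.tolist())
print(df.head(2))
print(df.tail(2))

# With the default integer index, the last index label is the row count
# minus one, which is why the tail index hints at the table's size.
n_rows = len(df)
last_index = df.index[-1]
print(n_rows, last_index)
```

Reading the row count from the tail index only works when the index is the default one assigned by pd.read_csv; len(df) or df.shape gives the count directly.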