In the previous sections, you learned how to use L1 regularization to zero out irrelevant features via logistic regression, and how to use the SBS algorithm for feature selection. Another useful approach for selecting relevant features from a dataset is to use a random forest, an ensemble technique that we introduced in Chapter 3, A Tour of Machine Learning Classifiers Using Scikit-learn. Using a random forest, we can measure feature importance as the averaged impurity decrease computed from all decision trees in the forest, without making any assumptions about whether our data is linearly separable or not. Conveniently, the random forest implementation in scikit-learn already collects the feature importance values for us, so that we can access them via the feature_importances_ attribute after fitting a RandomForestClassifier. By executing the following code, we will now train a forest of 10,000 trees on the Wine dataset and rank the 13 features by their respective importance values.
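A minimal sketch of this step is shown below, assuming the Wine dataset is fetched from the UCI repository and split into training and test sets as in the earlier sections; the column names, the split parameters (test_size=0.3, stratify=y, random_state=0), and the forest's random_state are assumptions, not prescriptions:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load the Wine dataset from the UCI Machine Learning Repository
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/'
                      'machine-learning-databases/wine/wine.data',
                      header=None)
df_wine.columns = ['Class label', 'Alcohol', 'Malic acid', 'Ash',
                   'Alcalinity of ash', 'Magnesium', 'Total phenols',
                   'Flavanoids', 'Nonflavanoid phenols',
                   'Proanthocyanins', 'Color intensity', 'Hue',
                   'OD280/OD315 of diluted wines', 'Proline']

# Split into features/labels and training/test sets
# (assumed split parameters, matching earlier sections)
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

feat_labels = df_wine.columns[1:]

# Fit a forest of 10,000 trees; feature_importances_ then holds the
# impurity-based importance of each feature, averaged over all trees
forest = RandomForestClassifier(n_estimators=10000,
                                random_state=1,  # assumed seed
                                n_jobs=-1)
forest.fit(X_train, y_train)
importances = forest.feature_importances_

# Rank the 13 features from most to least important
indices = np.argsort(importances)[::-1]
for f in range(X_train.shape[1]):
    print(f"{f + 1:2d}) {feat_labels[indices[f]]:<30} "
          f"{importances[indices[f]]:.6f}")
```

Note that the importance values returned by feature_importances_ are normalized so that they sum to 1, which makes them directly comparable across the 13 features.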