Working in the machine learning field is not only about building different classification or clustering models. It’s more about feeding the right set of features into the training models.
Choosing the right set of features for the model mainly takes place after the data collection process.
Once we have enough data, we can’t simply feed all of it into the model and expect great results. We need to preprocess the data first.
In fact, data preprocessing is the most challenging and the most important part of the machine learning process.
Below are the key things we intend to do in the data preprocessing stage.
- Feature transformation
- Feature selection
Feature transformation converts existing features into other forms. For example, applying the logarithmic function to a raw feature produces its log-scaled counterpart.
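As a rough illustration of the logarithmic transformation just described, here is a minimal sketch in base R. The data frame, the `income` column, and its values are hypothetical and invented purely for this example. It uses `log1p`, which computes log(1 + x), on the assumption that the feature is non-negative and may contain zeros.

```r
# Hypothetical toy data: a right-skewed feature (values invented for illustration).
df <- data.frame(income = c(20000, 35000, 50000, 120000, 1000000))

# log1p(x) computes log(1 + x), so it stays defined when x is 0.
df$log_income <- log1p(df$income)

# Compare how spread out the feature is before and after the transformation.
range_ratio_raw <- max(df$income) / min(df$income)
range_ratio_log <- max(df$log_income) / min(df$log_income)

print(df)
print(c(raw = range_ratio_raw, log = range_ratio_log))
```

The transformed column keeps the same ordering as the original but compresses the very large values, which is one common reason to try a log transform on skewed features.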
Feature selection picks the best features from the existing ones. In this article, we are going to learn the basic techniques for picking the best features for modeling.
Before we dive in further, let’s have a look at the table of contents.
Table of contents:
- Why modeling is not the final step
- The role of correlation
- Calculating feature importance with regression methods
- Using the caret package to calculate feature importance
- Random forest for calculating feature importance