Feature Selection Techniques Explained

Feature selection is a crucial step in machine learning that involves selecting a subset of relevant features to enhance model performance and reduce computational costs. There are three main categories of feature selection techniques: Filter Methods, Wrapper Methods, and Embedded Methods, each with its own advantages and limitations. Choosing the appropriate method depends on factors like dataset size, feature interactions, and model type, ultimately leading to improved accuracy and interpretability of machine learning models.

Uploaded by

Adarsh Goyal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
Feature Selection Techniques in Machine Learning

In data science we often encounter datasets with a vast number of features. But not all features contribute equally to prediction, and that is where feature selection comes in: it helps us choose the important features while discarding the rest. In this article we will learn more about feature selection and its techniques.

Feature Selection Foundation

Feature selection is an important step in machine learning. It involves selecting a subset of relevant features from the original feature set, reducing the feature space while improving the model’s performance and lowering computational cost. It is especially critical when dealing with high-dimensional data.

In real-world machine learning tasks, not all features in a dataset contribute equally to model performance. Some features may be redundant, irrelevant or even noisy. Feature selection helps remove these, improving the model’s accuracy and interpretability.

There are various algorithms used for feature selection, grouped into three main categories:

1. Filter Methods

2. Wrapper Methods

3. Embedded Methods

Each one has its own strengths and trade-offs depending on the use case.

1. Filter Methods

Filter methods evaluate each feature independently of any model, using its statistical relationship with the target variable. Features that correlate strongly with the target are selected, since such features carry information useful for making predictions. These methods are applied in the preprocessing phase to remove irrelevant or redundant features based on statistical tests (e.g. correlation) or other criteria.


Advantages:
 Fast and inexpensive: Can quickly evaluate features without
training the model.

 Good for removing redundant or correlated features.

Limitations: These methods don’t consider feature interactions, so they may miss feature combinations that improve model performance.

Some techniques used are:

 Information Gain – The amount of information a feature provides for identifying the target value, measured as the reduction in entropy. The information gain of each attribute is calculated with respect to the target values for feature selection.

 Chi-square test – The chi-square (χ²) test is generally used to test the relationship between categorical variables. It compares the observed frequencies of the dataset’s attribute values with their expected frequencies.

The chi-square statistic is computed as χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ, where Oᵢ is the observed frequency and Eᵢ the expected frequency.

 Fisher’s Score – Scores each feature independently according to the Fisher criterion; because features are scored one at a time, this can lead to a suboptimal feature set. The larger the Fisher’s score, the better the selected feature.

 Correlation Coefficient – Pearson’s correlation coefficient quantifies the strength and direction of the linear association between two continuous variables, with values ranging from -1 to 1.

 Variance Threshold – An approach where all features whose variance doesn’t meet a specified threshold are removed. By default this method removes features with zero variance. The assumption is that higher-variance features are more likely to contain useful information.

 Mean Absolute Difference (MAD) – Similar to the variance threshold method, but without the squaring: it calculates each feature’s mean absolute deviation from its mean value.
 Dispersion Ratio – Dispersion ratio is defined as the ratio of the
Arithmetic mean (AM) to that of Geometric mean (GM) for a given
feature. Its value ranges from +1 to ∞ as AM ≥ GM for a given
feature. Higher dispersion ratio implies a more relevant feature.
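Two of the filter techniques above can be sketched with scikit-learn’s feature_selection utilities; the variance threshold of 0.2 and the choice of k=2 below are illustrative assumptions, not recommendations:

```python
# Minimal sketch of two filter methods on the iris dataset.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Variance Threshold: drop features whose variance falls below 0.2
vt = VarianceThreshold(threshold=0.2)
X_vt = vt.fit_transform(X)
print("Features kept by variance threshold:", X_vt.shape[1])

# Chi-square test: keep the 2 features most associated with the target
# (chi2 requires non-negative feature values, which iris satisfies)
skb = SelectKBest(score_func=chi2, k=2)
X_chi = skb.fit_transform(X, y)
print("Chi-square score per feature:", np.round(skb.scores_, 1))
```

Note that neither selector ever trains a predictive model, which is what makes filter methods fast.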

2. Wrapper Methods

Wrapper methods are often referred to as greedy algorithms: they train a model on different combinations of features, measure how well each subset predicts the target variable, and add or remove features based on the results. The stopping criterion for selecting the best subset is usually pre-defined by the person training the model, for example when the model’s performance starts to decrease or when a specific number of features has been reached.


Advantages:

 Can lead to better model performance since they evaluate feature subsets in the context of the model.

 They can capture feature dependencies and interactions.

Limitations: They are computationally more expensive than filter methods, especially for large datasets.

Some techniques used are:

 Forward selection – An iterative approach that starts with an empty set of features and, at each iteration, adds the feature that best improves the model. The process stops when adding a new variable no longer improves performance.

 Backward elimination – An iterative approach that starts with all features and, at each iteration, removes the least significant feature. The process stops when no further improvement is observed after removing a feature.

 Recursive elimination – A greedy optimization method that selects features by recursively considering smaller and smaller sets of features. An estimator is trained on the initial set of features and their importances are obtained (e.g. from the feature_importances_ attribute). The least important features are then removed from the current set, repeating until the required number of features remains.
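Recursive elimination of this kind is available in scikit-learn as RFE. The sketch below is illustrative: the decision-tree estimator and the target of 5 features are assumptions, not part of any fixed recipe:

```python
# Minimal sketch of recursive feature elimination (RFE).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X, y = data.data, data.target  # 569 samples, 30 features

# Repeatedly fit the estimator and drop the least important feature
# until only 5 features remain.
rfe = RFE(estimator=DecisionTreeClassifier(random_state=0),
          n_features_to_select=5)
rfe.fit(X, y)

# rfe.support_ is a boolean mask over the original feature set
selected = [name for name, keep in zip(data.feature_names, rfe.support_)
            if keep]
print("Selected features:", selected)
```

Because the estimator is retrained at every elimination step, the cost grows with the number of features, which is the computational expense noted above.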

3. Embedded Methods

Embedded methods perform feature selection during the model training process, combining the benefits of both filter and wrapper methods. Feature selection is integrated into training, allowing the model to dynamically select the most relevant features as it learns.


Advantages:

 More efficient than wrapper methods because the feature selection process is embedded within model training.

 Often more scalable than wrapper methods.

Limitations: Tied to a specific learning algorithm, so the selected features might not work well with other models.

Some techniques used are:

 L1 Regularization (Lasso): A regression method that applies an L1 penalty to encourage sparsity in the model. Features with non-zero coefficients are considered important.

 Decision Trees and Random Forests: These algorithms naturally perform feature selection by choosing the most important features for splitting nodes, based on criteria like Gini impurity or information gain.

 Gradient Boosting: Like random forests, gradient boosting models select important features while building trees, prioritizing the features that reduce error the most.
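The first two embedded approaches can be sketched on synthetic data, where the ground truth is known by construction; the alpha=5.0 penalty and the dataset shape below are illustrative assumptions:

```python
# Minimal sketch of embedded feature selection.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

# 10 features, only 3 of which actually drive the target
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

# Lasso's L1 penalty drives the coefficients of unhelpful features
# to exactly zero; the surviving features are the selected ones.
lasso = Lasso(alpha=5.0).fit(X, y)
print("Features with non-zero Lasso coefficients:",
      np.flatnonzero(lasso.coef_))

# Tree ensembles rank features by how much they reduce impurity
# across all splits; higher importance means more relevant.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print("Features ranked by importance:",
      np.argsort(rf.feature_importances_)[::-1])
```

In both cases the selection falls out of fitting the model itself: no separate search over feature subsets is run.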

Choosing the Right Feature Selection Method

Choice of feature selection method depends on several factors:


 Dataset Size: Filter methods are often preferred for very large
datasets due to their speed.

 Feature Interactions: Wrapper and embedded methods are better for capturing complex feature interactions.

 Model Type: Some methods like Lasso and decision trees are more
suitable for certain models like linear models or tree-based models.

For example, filter methods like correlation or variance thresholding are excellent when we have many features and want to remove irrelevant ones quickly. However, if we want to maximize model performance and have the computational resources, we might explore wrapper methods like RFE or embedded methods like Lasso.

Feature selection is a critical step in building efficient and accurate machine learning models. By choosing the right features we can improve our model’s accuracy, reduce overfitting and make the model more interpretable. Each feature selection method has its strengths and weaknesses, and understanding them helps us choose the right approach for our dataset and task.
