
Feature Generation & Selection for Retention

Feature generation and selection are critical steps in data preprocessing for machine learning, enhancing model accuracy and efficiency by focusing on relevant data. User retention is highlighted as a key application, where these techniques help identify behaviors influencing customer loyalty. The document also discusses various methods for feature generation and selection, emphasizing the importance of domain knowledge and creativity in improving machine learning outcomes.

UNIT-3

Feature Generation and Feature Selection (Extracting Meaning from Data)

Introduction

Feature generation and feature selection are two crucial steps in the data preprocessing stage of
machine learning. These steps ensure that the model is built on meaningful and relevant data,
improving its accuracy and efficiency.

● Feature Generation: Involves creating meaningful metrics from raw data using domain
knowledge and creativity.
● Feature Selection: Narrows down the features to only the most relevant ones using
algorithms like RFE, Lasso Regression, and the Chi-Square test.
● Application in User Retention: These techniques help identify the key behaviors that
influence whether a user stays or churns, leading to better predictive models.

Feature engineering (generation and selection) is a blend of art and science. Combining domain
expertise with data-driven techniques can significantly improve machine learning outcomes.

Motivating Application: User (Customer) Retention

What is User Retention?

User retention refers to the ability of a company to keep its customers over time. It is an essential
metric for businesses, especially subscription-based services, e-commerce platforms, and apps. A
high retention rate indicates customer satisfaction and loyalty.

Why Focus on Feature Generation and Selection for Retention?

● Improves Model Accuracy: Retention models need to predict which users are likely to
stay or churn. Accurate features make these predictions more reliable.
● Reduces Overfitting: By selecting only the most relevant features, we prevent the model
from learning noise in the data.
● Saves Computational Resources: Eliminating irrelevant features reduces processing time
and storage requirements.

Feature Generation

Feature generation involves creating new features from raw data. These features should capture
useful patterns that improve model performance.

Steps in Feature Generation

1. Understand the Domain


○ Domain expertise helps identify which data points are most likely to influence the
target variable (e.g., churn).
○ Example: In user retention, domain experts might suggest that "time spent on the
app per session" is a critical metric.
2. Brainstorming
○ Engage in a collaborative process to identify potential features.
○ Example: For an e-commerce platform:
■ Total number of purchases in the last 30 days.
■ Average time between purchases.
■ Percentage of discounted items in the cart.
3. Imagination and Creativity
○ Look beyond the obvious. Think of features that indirectly affect retention.
○ Example: Users who spend a lot of time browsing reviews might be more
engaged.
4. Data Transformation
○ Convert raw data into useful metrics.
○ Techniques include:
■ Aggregation: Sum, mean, count, etc.
■ Encoding: Convert categorical variables to numerical ones (e.g., one-hot
encoding).
■ Scaling: Normalize features to a standard range.
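As a hedged sketch of these transformation techniques (the table and column names here are invented for illustration), aggregation, encoding, and scaling might look like this with pandas and scikit-learn:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical per-event table (column names are invented)
events = pd.DataFrame({
    "user_id": [1, 1, 2],
    "plan":    ["free", "free", "premium"],
    "minutes": [15.0, 45.0, 120.0],
})

# Aggregation: one row per user with sum and count
users = events.groupby("user_id").agg(
    plan=("plan", "first"),
    total_minutes=("minutes", "sum"),
    visits=("minutes", "count"),
).reset_index()

# Encoding: one-hot encode the categorical plan column
users = pd.get_dummies(users, columns=["plan"])

# Scaling: normalize numeric features to the [0, 1] range
users[["total_minutes", "visits"]] = MinMaxScaler().fit_transform(
    users[["total_minutes", "visits"]])
print(users)
```

The same three steps apply to any raw activity table; only the grouping key and the aggregations change.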

Example of Feature Generation

Data: User activity logs

● Raw data: Login timestamps, purchase history, browsing history.


● Generated features:
○ Login frequency: Number of logins per week.
○ Purchase consistency: Ratio of purchases to visits.
○ Engagement score: Weighted score based on time spent and actions taken.
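A minimal pandas sketch of deriving these three features from a raw log (the log schema and the engagement weights are assumptions for illustration, not a standard formula):

```python
import pandas as pd

# Hypothetical activity log: one row per user action
logs = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "event":   ["login", "purchase", "login", "login", "login"],
    "minutes": [10, 5, 20, 3, 4],
})

g = logs.groupby("user_id")
features = pd.DataFrame({
    # Login frequency: number of login events per user
    "login_freq": g["event"].apply(lambda e: (e == "login").sum()),
    # Purchase consistency: ratio of purchases to total visits
    "purchase_ratio": g["event"].apply(lambda e: (e == "purchase").mean()),
    # Engagement score: weighted combination (weights are invented)
    "engagement": 0.7 * g["minutes"].sum() + 0.3 * g["event"].count(),
}).reset_index()
print(features)
```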

Feature Selection

Feature selection is the process of choosing the most relevant features for a machine learning
model while removing redundant or irrelevant ones.

Importance of Feature Selection

● Improves Model Simplicity: Fewer features make models easier to interpret.


● Enhances Performance: Models built on relevant features perform better and generalize
well to new data.
● Speeds Up Computation: Less data to process means faster training times.

Common Feature Selection Algorithms

1. Filter Methods
○ Rank features based on statistical metrics.
○ Techniques:
■ Correlation Coefficient: Measures the relationship between features and
the target variable.
■ Chi-Square Test: For categorical data, checks feature importance.
■ Variance Threshold: Removes features with low variance.
○ Example:
■ Correlation matrix reveals that "number of app opens" correlates highly
with retention, while "screen brightness preference" does not.
2. Wrapper Methods
○ Evaluate subsets of features by training and testing models.
○ Techniques:
■ Recursive Feature Elimination (RFE): Starts with all features and removes
the least important iteratively.
■ Forward Selection: Adds features one by one, keeping those that improve
model performance.
■ Backward Elimination: Starts with all features, removes one at a time
based on significance.
○ Example:
■ Use RFE to reduce 50 features to the top 10 most predictive ones.
3. Embedded Methods
○ Feature selection happens during model training.
○ Techniques:
■ Lasso Regression (L1 Regularization): Shrinks less important feature
coefficients to zero.
■ Tree-based Models: Use feature importance scores.
○ Example:
■ A decision tree ranks "average session time" as the top feature for
predicting churn.
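To make the filter idea concrete, here is a small sketch on synthetic data (the feature names echo the example above, but the values are generated): a variance threshold drops the constant feature, and correlation with the target ranks the rest.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "app_opens": rng.integers(0, 50, 200),       # varies, drives the label
    "screen_brightness": np.full(200, 0.8),      # constant -> zero variance
    "session_time": rng.normal(10, 3, 200),      # pure noise here
})
y = (X["app_opens"] > 25).astype(int)            # synthetic retention label

# Filter 1: drop features whose variance does not exceed the threshold
vt = VarianceThreshold(threshold=0.0)
kept = X.columns[vt.fit(X).get_support()]
print("kept after variance filter:", list(kept))

# Filter 2: rank remaining features by absolute correlation with target
corr = X[kept].apply(lambda col: abs(np.corrcoef(col, y)[0, 1]))
print(corr.sort_values(ascending=False))
```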

Example of Feature Selection

Data: User behavior metrics

● Initial features: 30 metrics, including login frequency, purchase amount, etc.


● Selected features:
○ Login frequency
○ Time since last purchase
○ Discount usage ratio
● Method: Applied correlation matrix to remove highly correlated features and used Lasso
Regression for final selection.
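A hedged sketch of the Lasso step on synthetic metrics (the data and feature names are invented): features whose coefficients Lasso shrinks to zero are dropped.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 300
# Synthetic user metrics; only the first two actually drive the target
login_freq = rng.normal(5, 2, n)
days_since_purchase = rng.normal(30, 10, n)
noise_metric = rng.normal(0, 1, n)

X = np.column_stack([login_freq, days_since_purchase, noise_metric])
y = 2.0 * login_freq - 0.5 * days_since_purchase + rng.normal(0, 1, n)

# Standardize so the L1 penalty treats all features comparably
X_std = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=1.0).fit(X_std, y)

names = ["login_freq", "days_since_purchase", "noise_metric"]
selected = [nm for nm, c in zip(names, lasso.coef_) if abs(c) > 1e-6]
print("coefficients:", dict(zip(names, lasso.coef_)))
print("selected:", selected)
```

The alpha value here is arbitrary; in practice it would be tuned (e.g., with LassoCV).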

Difference Between Feature Selection and Feature Extraction


Machine learning models require input features that are relevant and important to predict the
outcome. However, not all features are equally important for a prediction task, and some features
might even introduce noise in the model. Feature selection and feature extraction are two
methods to handle this problem. In this article, we will explore the differences between feature
selection and feature extraction methods in machine learning.

Feature Selection

Feature selection is a process of selecting a subset of relevant features from the original set of
features. The goal is to reduce the dimensionality of the feature space, simplify the model, and
improve its generalization performance. Feature selection methods can be categorized into three
types:

● Filter Methods
● Wrapper methods
● Embedded methods.

Filter methods rank features based on their statistical properties and select the top-ranked
features. Wrapper methods use the model performance as a criterion to evaluate the feature
subset and search for the optimal feature subset. Embedded methods incorporate feature
selection as a part of the model training process.

Here is an example of feature selection in the Recursive Feature Elimination (RFE) method. RFE
is a wrapper method that selects the most important features by recursively removing the least
important features and retraining the model. The feature ranking is based on the coefficients of
the model.
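A minimal RFE sketch with scikit-learn, using a synthetic dataset and a logistic regression whose coefficient magnitudes drive the ranking:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, of which 4 carry signal
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

# Recursively drop the weakest feature (smallest coefficient
# magnitude) until only 4 remain
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)
rfe.fit(X, y)

print("selected mask:", rfe.support_)
print("ranking (1 = kept):", rfe.ranking_)
```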
Filter Methods

Filter methods are the simplest and most computationally efficient methods for feature selection.
In this approach, features are selected based on their statistical properties, such as their
correlation with the target variable or their variance. These methods are easy to implement and
are suitable for datasets with a large number of features. However, they may not always produce
the best results as they do not take into account the interactions between features.

Wrapper Methods

Wrapper methods are more sophisticated than filter methods and involve training a machine
learning model to evaluate the performance of different subsets of features. In this approach, a
search algorithm is used to select a subset of features that results in the best model performance.
Wrapper methods are more accurate than filter methods as they take into account the interactions
between features. However, they are computationally expensive, especially when dealing with
large datasets or complex models.

Embedded Methods

Embedded methods are a hybrid of filter and wrapper methods. In this approach, feature
selection is integrated into the model training process, and features are selected based on their
importance in the model. Embedded methods are more efficient than wrapper methods as they do
not require a separate feature selection step. They are also more accurate than filter methods as
they take into account the interactions between features. However, they may not be suitable for
all models as not all models have built-in feature selection capabilities.

Univariate Feature Selection


Univariate Feature Selection is a type of filter method used for feature selection. It involves
selecting the features based on their individual performance in relation to the target variable. The
most commonly used metric for this type of selection is the ANOVA F-value or chi-squared
statistic for categorical data.

This is an example of the code implementation of Univariate Feature Selection using the
ANOVA F-value metric in Python with scikit-learn:
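One possible implementation, scoring each feature individually with the ANOVA F-value and keeping the two best, shown here on the Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score each feature with the ANOVA F-value against the target
# and keep the two highest-scoring features
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

print("scores:", selector.scores_)
print("kept feature indices:", selector.get_support(indices=True))
print("reduced shape:", X_new.shape)
```

For categorical inputs, `chi2` can be substituted for `f_classif` as the score function.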

Feature Extraction
Feature extraction is a process of transforming the original features into a new set of features that
are more informative and compact. The goal is to capture the essential information from the
original features and represent it in a lower-dimensional feature space. Feature extraction
methods can be categorized into linear methods and nonlinear methods.
● Linear methods use linear transformations such as Principal Component Analysis
(PCA) and Linear Discriminant Analysis (LDA) to extract features. PCA finds the
principal components that explain the maximum variance in the data, while LDA
finds the projection that maximizes the class separability.
● Nonlinear methods use nonlinear transformations such as Kernel PCA and
Autoencoder to extract features. Kernel PCA uses kernel functions to map the data
into a higher-dimensional space and finds the principal components in that space.
Autoencoder is a neural network architecture that learns to compress the data into a
lower-dimensional representation and reconstruct it back to the original space.
● Here is an example of feature extraction in the Mel-Frequency Cepstral Coefficients
(MFCC) method. MFCC is a nonlinear method that extracts features from audio
signals for speech recognition tasks. It first applies a filter bank to the audio signals to
extract the spectral features, then applies the Discrete Cosine Transform (DCT) to the
log-magnitude spectrum to extract the cepstral features.
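A short PCA sketch of the linear-extraction idea; the data here is randomly generated and nearly rank-2, so two components capture almost all the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 100 samples of 5 features that really live on a 2-D subspace
base = rng.normal(size=(100, 2))
X = base @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(100, 5))

# Extract 2 new features (principal components) that capture
# most of the variance of the original 5
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
print(pca.explained_variance_ratio_.sum())
```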

Why is feature selection/extraction required?

Feature selection/extraction is an important step in many machine learning tasks, including
classification, regression, and clustering. It involves identifying and selecting the most relevant
features (also known as predictors or input variables) from a dataset while discarding the
irrelevant or redundant ones. This process is often used to improve the accuracy, efficiency, and
interpretability of a machine learning model.
Here are some of the main reasons why feature selection/extraction is required in machine
learning:
1. Improved Model Performance: The inclusion of irrelevant or redundant features can
negatively impact the performance of a machine learning model. Feature
selection/extraction can help to identify the most important and informative features,
which can lead to better model performance, higher accuracy, and lower error rates.
2. Reduced Overfitting: Including too many features in a model can cause overfitting,
where the model becomes too complex and starts to fit the noise in the data instead of
the underlying patterns. Feature selection/extraction can help to reduce overfitting by
focusing on the most relevant features and avoiding the inclusion of noise.
3. Faster Model Training and Inference: Feature selection/extraction can help to reduce
the dimensionality of a dataset, which can make model training and inference faster
and more efficient. This is especially important in large-scale or real-time
applications, where speed and performance are critical.
4. Improved Interpretability: Feature selection/extraction can help to simplify the model
and make it more interpretable, by focusing on the most important features and
discarding the less important ones. This can help to explain how the model works and
why it makes certain predictions, which can be useful in many applications, such as
healthcare, finance, and law.

Difference Between Feature Selection and Feature Extraction Methods


Feature selection and feature extraction methods have their advantages and disadvantages,
depending on the nature of the data and the task at hand.
1. Feature Selection: selects a subset of relevant features from the original set of features.
   Feature Extraction: extracts a new set of features that are more informative and compact.
2. Feature Selection: reduces the dimensionality of the feature space and simplifies the model.
   Feature Extraction: captures the essential information from the original features and
   represents it in a lower-dimensional feature space.
3. Feature Selection: can be categorized into filter, wrapper, and embedded methods.
   Feature Extraction: can be categorized into linear and nonlinear methods.
4. Feature Selection: requires domain knowledge and feature engineering.
   Feature Extraction: can be applied to raw data without feature engineering.
5. Feature Selection: can improve the model's interpretability and reduce overfitting.
   Feature Extraction: can improve model performance and handle nonlinear relationships.
6. Feature Selection: may lose some information and introduce bias if the wrong features are
   selected.
   Feature Extraction: may introduce some noise and redundancy if the extracted features are
   not informative.

Brainstorming

Definition of Brainstorming in Data Analytics:


Brainstorming in data analytics is a collaborative or individual process of generating creative
ideas and insights to solve analytical problems, design models, or derive actionable insights from
data. This technique helps in identifying key metrics, patterns, and innovative approaches to
tackle complex problems. Brainstorming in data analytics is crucial for problem-solving and
innovation. By defining objectives, leveraging diverse techniques, and structuring the session,
teams can generate impactful ideas that lead to actionable insights. With tools and frameworks,
brainstorming becomes a systematic approach to unlocking the full potential of data.

Steps in Brainstorming for Data Analytics

1. Define the Objective:


○ Clearly state the goal of the brainstorming session.
○ Example: "How can we predict customer churn in an e-commerce platform?"
2. Gather the Team (if collaborative):
○ Include data scientists, business analysts, domain experts, and stakeholders.
○ Diverse perspectives lead to richer ideas.
3. Understand the Data:
○ Discuss the type of data available (structured, unstructured, etc.).
○ Review the data sources, quality, and any preprocessing requirements.
4. Explore Analytical Questions:
○ Frame open-ended questions to encourage creative solutions.
○ Example: "What factors could influence customer churn?"
5. Use Tools for Idea Generation:
○ Employ whiteboards, sticky notes, or mind-mapping software.
○ For virtual sessions, tools like Miro, Trello, or Google Jamboard can help.
6. Group Similar Ideas:
○ Cluster related ideas to identify common themes or approaches.
○ Example: Group ideas into data-driven solutions (e.g., machine learning models)
and customer-focused strategies (e.g., personalized offers).
7. Prioritize Ideas:
○ Assess feasibility, potential impact, and resource requirements.
○ Use techniques like a priority matrix to rank ideas.
8. Draft an Action Plan:
○ Convert the best ideas into actionable steps.
○ Assign responsibilities and set timelines for implementation.

Techniques for Effective Brainstorming in Data Analytics

1. SWOT Analysis:
○ Identify Strengths, Weaknesses, Opportunities, and Threats of an analytical
project.
2. SCAMPER Technique:
○ Substitute, Combine, Adapt, Modify, Put to another use, Eliminate, and Reverse
aspects of an idea to improve it.
3. 5 Whys Analysis:
○ Keep asking "Why?" to dig deeper into the problem.
4. Affinity Diagramming:
○ Organize ideas into categories based on relationships.
5. Fishbone Diagram:
○ Identify possible causes of a specific problem.

Brainstorming Example in Data Analytics

Scenario:
A retail company wants to increase sales during the festive season.

1. Objective:
"What data-driven strategies can we use to boost festive season sales?"
2. Generated Ideas:
○ Analyze past sales data to identify popular products.
○ Use sentiment analysis on social media to predict trending items.
○ Develop a recommendation engine for personalized offers.
○ Segment customers based on purchasing behavior.
○ Optimize pricing strategies using A/B testing.
3. Clustering:
○ Group ideas into:
■ Predictive Analytics: Recommendation engines, customer segmentation.
■ Sentiment Analysis: Social media trends.
■ Pricing Strategies: A/B testing.
4. Prioritization:
○ Focus on high-impact, low-cost solutions like customer segmentation and
recommendation engines.
5. Action Plan:
○ Preprocess past sales data and train a recommendation model.
○ Launch a pilot program for personalized offers during the upcoming sale.

Best Practices for Brainstorming in Data Analytics

● Encourage Diverse Perspectives: Different viewpoints can highlight unique insights.


● Focus on Data and Objectives: Ensure ideas are data-driven and align with business
goals.
● Iterate Often: Regular brainstorming refines ideas and adapts to new findings.
● Document Everything: Record all ideas for reference and future exploration.

Role of domain expertise and Place for imagination

The Role of Domain Expertise and Imagination in Data Analytics

Data analytics is not just about crunching numbers and creating models; it also requires
contextual understanding (domain expertise) and creativity (imagination) to derive meaningful
insights and innovate solutions. Domain expertise ensures that data analytics is grounded in
real-world relevance and accuracy, while imagination opens the door to innovation and
unconventional solutions. When combined, they enable the discovery of actionable insights, the
design of effective models, and the development of impactful strategies. For maximum success
in data analytics, fostering a balance between these elements is essential.

Let’s explore their roles in detail:

1. Role of Domain Expertise in Data Analytics

What is Domain Expertise?

Domain expertise refers to in-depth knowledge about a specific field or industry, such as
healthcare, finance, retail, or engineering. It helps in understanding the context of the data and
making informed decisions.

Importance of Domain Expertise:

1. Data Understanding and Interpretation:


○ Domain experts help identify relevant variables and metrics.
○ Example: In healthcare, they can clarify the significance of lab test results and
how they relate to patient outcomes.
2. Defining the Problem Statement:
○ Experts frame problems in ways that align with business or research goals.
○ Example: A marketing analyst can define a customer segmentation problem by
identifying critical customer behaviors.
3. Feature Selection:
○ Domain expertise aids in selecting or engineering meaningful features for models.
○ Example: In finance, knowing that "interest rates" and "credit scores" are critical
predictors of loan defaults.
4. Validating Insights:
○ They assess whether analytical insights make sense in the real-world context.
○ Example: Identifying whether a detected trend in sales is due to seasonality or
genuine consumer behavior changes.
5. Guiding Decision-Making:
○ Ensures that analytical results align with industry practices.
○ Example: In manufacturing, predicting machine failures should account for safety
regulations and operational workflows.

Challenges Without Domain Expertise:

● Misinterpretation of data.
● Overlooking critical factors or introducing irrelevant ones.
● Solutions that lack practicality or impact.

2. Role of Imagination in Data Analytics

What is Imagination in Data Analytics?

Imagination refers to the ability to think creatively and envision possibilities that may not be
immediately apparent. It drives innovation and helps explore unconventional solutions.

Importance of Imagination:

1. Hypothesis Generation:
○ Imagining potential relationships or trends in data.
○ Example: "Could weather patterns influence online shopping behavior?"
2. Innovative Approaches to Problems:
○ Thinking beyond traditional methods to solve problems.
○ Example: Using social media sentiment analysis to predict stock market trends.
3. Data Visualization and Storytelling:
○ Designing intuitive visuals to communicate complex data insights effectively.
○ Example: Creating a heatmap to display customer churn hotspots geographically.
4. Scenario Analysis:
○ Exploring "what-if" scenarios for predictive modeling.
○ Example: "What if the interest rates increase by 2% next quarter? How will loan
approvals change?"
5. Uncovering Hidden Patterns:
○ Imagining scenarios where seemingly unrelated data points might correlate.
○ Example: Finding a connection between customer reviews and return rates using
text analytics.
6. Building Intuitive Models:
○ Innovating features or architectures for machine learning models.
○ Example: Developing a custom recommendation system that adapts to seasonal
trends dynamically.

Challenges Without Imagination:

● Rigid thinking leads to missed opportunities.


● Lack of innovation in problem-solving.
● Difficulty in handling ambiguous or incomplete data.

3. Integration of Domain Expertise and Imagination

For successful data analytics, both domain expertise and imagination must work together:

Case Study Example: Predicting Employee Attrition

1. Domain Expertise:
○ HR specialists identify key factors like job satisfaction, salary, and work-life
balance.
○ They interpret metrics such as employee engagement scores and promotion
histories.
2. Imagination:
○ Analysts imagine new ways to assess attrition risk, such as analyzing sentiment in
employee emails or workplace surveys.
○ Creative visualizations (e.g., decision trees) to show at-risk employee profiles.
3. Combined Outcome:
○ The final model combines validated metrics with innovative features to predict
attrition accurately.

4. Best Practices for Leveraging Both

1. Collaborative Teams:
○ Include both domain experts and imaginative thinkers in teams to balance
practicality and creativity.
2. Encourage Cross-Disciplinary Learning:
○ Train domain experts in basic analytics.
○ Expose data analysts to domain-specific knowledge.
3. Iterative Feedback Loops:
○ Use feedback from domain experts to refine imaginative ideas and vice versa.
4. Use Tools for Creative Exploration:
○ Tools like brainstorming sessions, mind maps, and scenario planning encourage
imaginative thinking.
5. Test Hypotheses Rigorously:
○ Validate imaginative ideas with data and domain insights to ensure they are
realistic.

Feature Selection Algorithms


Feature selection is a way of selecting a subset of the most relevant features from the original
feature set by removing redundant, irrelevant, or noisy features.
While developing a machine learning model, only a few variables in the dataset are useful for
building the model; the rest are either redundant or irrelevant. If we feed the model a dataset
with all these redundant and irrelevant features, it can negatively impact the model's overall
performance and accuracy. Hence it is very important to identify and select the most appropriate
features from the data and remove the irrelevant or less important ones, which is done with the
help of feature selection in machine learning.

Feature selection is one of the important concepts of machine learning, which highly impacts the
performance of the model. As machine learning works on the concept of "Garbage In Garbage
Out", we always need to input the most appropriate and relevant dataset to the model in order to
get a better result.

In this topic, we will discuss different feature selection techniques for machine learning. But
before that, let's first understand some basics of feature selection.

○ What is Feature Selection?


○ Need for Feature Selection
○ Feature Selection Methods/Techniques
○ Feature Selection statistics

What is Feature Selection?


A feature is an attribute that has an impact on a problem or is useful for the problem, and
choosing the important features for the model is known as feature selection. Every machine
learning process depends on feature engineering, which mainly consists of two processes:
Feature Selection and Feature Extraction. Although feature selection and feature extraction may
share the same objective, they are completely different from each other. The main difference
between them is that feature selection selects a subset of the original feature set, whereas
feature extraction creates new features. Feature selection reduces the number of input variables
for the model by using only relevant data, in order to reduce overfitting.

So, we can define feature selection as: "It is a process of automatically or manually selecting the
subset of the most appropriate and relevant features to be used in model building." Feature
selection is performed by either including the important features or excluding the irrelevant
features from the dataset without changing them.

Need for Feature Selection


Before implementing any technique, it is important to understand why it is needed, and the same
holds for feature selection. As we know, in machine learning it is necessary to provide a
pre-processed, good-quality input dataset in order to get better outcomes. We collect a huge
amount of data to train our model and help it learn better. Generally, the dataset consists of noisy
data, irrelevant data, and some useful data. Moreover, the huge amount of data also slows down
the training process of the model, and with noise and irrelevant data the model may not predict
and perform well. So it is necessary to remove such noisy and less important data from the
dataset, and this is done using feature selection techniques.

Selecting the best features helps the model perform well. For example, suppose we want to
create a model that automatically decides whether a car should be scrapped for spare parts, and
we have a dataset for this. The dataset contains the car's model, year, owner's name, and
mileage. In this dataset, the owner's name does not contribute to the model's performance, as it
does not help decide whether the car should be scrapped, so we can remove this column and
select the remaining features (columns) for model building.

Below are some benefits of using feature selection in machine learning:

○ It helps in avoiding the curse of dimensionality.


○ It helps in the simplification of the model so that it can be easily interpreted by the
researchers.
○ It reduces the training time.
○ It reduces overfitting and hence enhances generalization.

Feature Selection Techniques


There are mainly two types of Feature Selection techniques, which are:

○ Supervised Feature Selection technique
Supervised feature selection techniques consider the target variable and can be used with
labelled datasets.
○ Unsupervised Feature Selection technique
Unsupervised feature selection techniques ignore the target variable and can be used with
unlabelled datasets.

1. Wrapper Methods
In the wrapper methodology, feature selection is treated as a search problem: different
combinations of features are made, evaluated, and compared with one another. The algorithm is
trained iteratively using subsets of features.
On the basis of the model's output, features are added or removed, and the model is trained
again with the new feature set.

Some techniques of wrapper methods are:

○ Forward selection - Forward selection is an iterative process that begins with an empty set of
features. In each iteration, it adds a feature and evaluates the performance to check whether it
improves. The process continues until adding a new variable/feature no longer improves the
performance of the model.
○ Backward elimination - Backward elimination is also an iterative approach, but it is the
opposite of forward selection. This technique begins by considering all the features and removes
the least significant one. The elimination process continues as long as removing a feature does
not degrade the performance of the model.
○ Exhaustive Feature Selection - Exhaustive feature selection is the most thorough feature
selection method; it evaluates every feature subset by brute force. That means it tries every
possible combination of features and returns the best-performing feature set.
○ Recursive Feature Elimination -
Recursive feature elimination is a recursive greedy optimization approach, where features are
selected by recursively considering smaller and smaller subsets of features. An estimator is
trained on each set of features, and the importance of each feature is determined through its
coef_ attribute or feature_importances_ attribute.
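Forward selection, as described above, can be sketched with scikit-learn's SequentialFeatureSelector on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic regression data: 8 features, 3 of which carry signal
X, y = make_regression(n_samples=150, n_features=8,
                       n_informative=3, random_state=1)

# Start from an empty feature set and greedily add the feature
# that most improves cross-validated performance, until 3 are chosen
sfs = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=3,
                                direction="forward", cv=5)
sfs.fit(X, y)
print("selected feature indices:", sfs.get_support(indices=True))
```

Switching `direction` to `"backward"` gives backward elimination with the same API.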

2. Filter Methods
In the filter method, features are selected on the basis of statistical measures. This method does
not depend on the learning algorithm and chooses the features as a pre-processing step.

The filter method filters out irrelevant features and redundant columns from the model by
ranking them with different metrics.

The advantage of filter methods is that they need little computational time and do not overfit
the data. Some common filter techniques are:

○ Information Gain
○ Chi-square Test
○ Fisher's Score
○ Missing Value Ratio
Information Gain: Information gain determines the reduction in entropy while transforming the
dataset. It can be used as a feature selection technique by calculating the information gain of
each variable with respect to the target variable.

Chi-square Test: Chi-square test is a technique to determine the relationship between the
categorical variables. The chi-square value is calculated between each feature and the target
variable, and the desired number of features with the best chi-square value is selected.

Fisher's Score:

Fisher's score is one of the popular supervised technique of features selection. It returns the rank
of the variable on the fisher's criteria in descending order. Then we can select the variables with a
large fisher's score.

Missing Value Ratio:

The missing value ratio can be used to evaluate each feature against a threshold value. It is
computed as the number of missing values in a column divided by the total number of
observations. Variables whose ratio exceeds the threshold can be dropped.
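A small pandas sketch on made-up data (the column names and the 0.4 threshold are purely illustrative):

```python
# Missing value ratio with pandas: fraction of missing values per column,
# then drop columns whose ratio exceeds a chosen threshold.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [25, 30, np.nan, 40, np.nan, 35],
    "income": [50, np.nan, np.nan, np.nan, 80, 90],
    "city":   ["A", "B", "A", "C", "B", "A"],
})

missing_ratio = df.isna().mean()  # missing count / total rows, per column
print(missing_ratio)

threshold = 0.4
keep = missing_ratio[missing_ratio <= threshold].index
df_reduced = df[keep]
print("kept columns:", list(df_reduced.columns))
```

Here "income" is 50% missing, so it is dropped, while "age" (about 33% missing) survives the 0.4 threshold.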

3. Embedded Methods
Embedded methods combine the advantages of both filter and wrapper methods: they consider the
interaction of features while keeping computational cost low. They are fast like filter methods
but more accurate. These methods are also iterative: each training iteration is evaluated, and
the features that contribute most to that iteration are identified. Some techniques of embedded
methods are:

○ Regularization- Regularization adds a penalty term to the parameters of the machine
learning model to avoid overfitting. The penalty is applied to the coefficients, so it
shrinks some coefficients to zero; features with zero coefficients can then be removed
from the dataset. Regularization techniques include L1 regularization (Lasso) and
Elastic Net (combined L1 and L2 regularization).
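A minimal Lasso sketch using scikit-learn's built-in diabetes data (the dataset and the alpha value are illustrative choices): features whose coefficients are driven exactly to zero are discarded.

```python
# L1 (Lasso) regularization as an embedded selector: the L1 penalty
# drives some coefficients exactly to zero, and we keep only the
# features with non-zero coefficients.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)  # Lasso is sensitive to feature scale

lasso = Lasso(alpha=1.0).fit(X, y)
kept = np.flatnonzero(lasso.coef_ != 0)  # features whose coefficients survived

print("coefficients:", lasso.coef_.round(2))
print("kept feature indices:", kept)
```

Increasing alpha strengthens the penalty and zeroes out more coefficients, so the selection becomes more aggressive.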
○ Random Forest Importance - Tree-based methods provide feature importances that give a
natural way of selecting features. Feature importance specifies which features matter
most in model building, i.e. have the greatest impact on the target variable. Random
Forest is one such tree-based method: a bagging algorithm that aggregates many decision
trees. It automatically ranks nodes by their performance, measured as the decrease in
impurity (Gini impurity) across all the trees. Arranging nodes by impurity in this way
allows the trees to be pruned below a chosen node; the remaining nodes correspond to a
subset of the most important features.
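In scikit-learn, a fitted random forest exposes these importances directly; a sketch on the built-in wine dataset (an illustrative choice):

```python
# Random Forest feature importance: the fitted forest exposes
# feature_importances_ (mean decrease in Gini impurity across trees),
# which can be used to rank and select features.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = forest.feature_importances_

ranking = np.argsort(importances)[::-1]
top5 = ranking[:5]
print("top 5 features by Gini importance:", top5)
print("their importances:", importances[top5].round(3))
```

The importances sum to 1 across all features, so they can be read as each feature's share of the forest's total impurity reduction.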

How to choose a Feature Selection Method?


For machine learning engineers, it is very important to understand which feature selection
method will work properly for their model. The better we know the data types of the variables,
the easier it is to choose an appropriate statistical measure for feature selection.
To do this, we first need to identify the types of the input and output variables. In machine
learning, variables are mainly of two types:

○ Numerical Variables: variables with continuous values such as integers or floats.
○ Categorical Variables: variables with categorical values such as Boolean, ordinal, or
nominal.

1. Numerical Input, Numerical Output:

Numerical input with numerical output is the case of regression predictive modelling. The common
measures for this case are correlation coefficients:

○ Pearson's correlation coefficient (for linear correlation).
○ Spearman's rank coefficient (for non-linear correlation).
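The two coefficients can be compared with SciPy. The sketch below uses synthetic data (a monotonic but non-linear relationship, an illustrative construction): Spearman captures the monotonic trend that Pearson understates.

```python
# Pearson vs Spearman on a numerical input / numerical output pair:
# Pearson measures linear correlation; Spearman rank correlation also
# captures monotonic but non-linear relationships.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=200)
y = np.exp(x) + rng.normal(0, 1, size=200)  # monotonic, non-linear

r_pearson, _ = pearsonr(x, y)
r_spearman, _ = spearmanr(x, y)

print(f"Pearson  r = {r_pearson:.3f}")   # understates the relationship
print(f"Spearman rho = {r_spearman:.3f}")  # near 1 for monotonic data
```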

2. Numerical Input, Categorical Output:

Numerical input with categorical output is the case of classification predictive modelling. Here,
too, correlation-based techniques are used, but adapted to a categorical output:

○ ANOVA F-statistic (linear).
○ Kendall's rank coefficient (non-linear).
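The ANOVA F-test is available in scikit-learn as `f_classif`; a sketch on the Iris data (an illustrative choice) that keeps the two features whose class-wise means differ most:

```python
# ANOVA F-test for numerical inputs with a categorical output:
# f_classif scores each feature by how strongly its per-class means differ.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print("F-statistics:", selector.scores_.round(1))
print("kept feature indices:", selector.get_support(indices=True))
```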
3. Categorical Input, Numerical Output:

This is the case of regression predictive modelling with categorical input. It is a less common
form of regression problem. We can use the same measures as in the previous case, with the roles
of the input and output variables reversed.

4. Categorical Input, Categorical Output:

This is the case of classification predictive modelling with categorical input variables.

The commonly used technique for this case is the Chi-Squared test. Information gain (mutual
information) can also be used.

We can summarise the above cases with appropriate measures in the table below:

Input Variable | Output Variable | Feature Selection Technique
---------------|-----------------|--------------------------------------------------
Numerical      | Numerical       | Pearson's correlation coefficient (linear); Spearman's rank coefficient (non-linear)
Numerical      | Categorical     | ANOVA F-statistic (linear); Kendall's rank coefficient (non-linear)
Categorical    | Numerical       | ANOVA F-statistic (linear); Kendall's rank coefficient (non-linear)
Categorical    | Categorical     | Chi-Squared test (contingency tables); Mutual Information
