Unsupervised Learning and Clustering Techniques

This document discusses unsupervised machine learning techniques including clustering, association, and hierarchical clustering. Clustering algorithms are used to group unlabeled data based on similarities, including k-means clustering and fuzzy c-means clustering. Association rules are used to find relationships in large datasets. Hierarchical clustering creates tree-like structures to organize data.

Uploaded by

Ananya S

Unsupervised Machine Learning

Unsupervised learning is a machine learning technique in which models are not
supervised using a labeled training dataset. Instead, the models themselves find
hidden patterns and insights in the given data.

“Unsupervised learning is a type of machine learning in which models are trained using
an unlabeled dataset and are allowed to act on that data without any supervision.”

The goal of unsupervised learning is to find the underlying structure of a dataset,
group the data according to similarities, and represent the dataset in a compressed
format.

Why use Unsupervised Learning?

o Unsupervised learning is helpful for finding useful insights in the data.
o Unsupervised learning resembles the way a human learns to think from their own
experiences, which makes it closer to real AI.
o In the real world, we do not always have input data with corresponding outputs.
Unsupervised learning is needed to handle such cases.

Types of Unsupervised Learning algorithms:

Clustering:

• Clustering is the process of dividing a dataset into groups consisting of
similar data points.

• It means grouping objects based on information found in the data that
describes the objects or their relationships.

Association:

• Association is used for finding relationships between variables in a large database.

• Association rules make marketing strategy more effective. For example, people who
buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical
example of association rules is Market Basket Analysis.
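
The bread-and-butter rule above can be made concrete with a small sketch. The notes do not say how the strength of a rule is measured, so the two measures used here, support and confidence, are an assumption (they are the usual ones in market basket analysis), and the baskets are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets, made up for this example.
baskets = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "jam"},
    {"milk"},
    {"bread", "butter", "milk"},
]

bread_support = support(baskets, {"bread"})                  # bread is in 4 of 5 baskets
rule_support = support(baskets, {"bread", "butter"})         # both are in 3 of 5 baskets
rule_confidence = confidence(baskets, {"bread"}, {"butter"})  # 3 of the 4 bread baskets
print(bread_support, rule_support, rule_confidence)
```

A rule such as "bread implies butter" is usually kept only when both its support and its confidence exceed thresholds chosen by the analyst.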

Advantages of Unsupervised Learning

• Unsupervised learning is used for more complex tasks than supervised
learning because it does not require labeled input data.

• Unsupervised learning is often preferable because unlabeled data is easier
to obtain than labeled data.

Disadvantages of Unsupervised Learning

• Unsupervised learning is more difficult than supervised learning because
it has no corresponding output to learn from.

• The result of an unsupervised learning algorithm might be less accurate
because the input data is not labeled and the algorithm does not know the
exact output in advance.

Clustering in Machine Learning

• Clustering is the task of dividing the data points into a number of
groups, such that data points in the same group are more similar to each
other than to the data points in other groups.
• It is basically a grouping of objects on the basis of the similarity and
dissimilarity between them.

Why is clustering important?

• Clusters make it easy to sort data and analyze specific groups.
• Clustering enables businesses to approach customer segments
differently based on their attributes and similarities. This helps in
maximizing profits.
• It can help in dimensionality reduction if the dataset comprises
too many variables. Irrelevant clusters can be identified more easily
and removed from the dataset.

Where is it used?
• City Planning: It is used to make groups of houses and to study their
values based on their geographical locations and other factors present.

• Earthquake studies: By learning the earthquake-affected areas we can
determine the dangerous zones.

• Image Processing: Clustering can be used to group similar images
together, classify images based on content, and identify patterns in
image data.

• Manufacturing: Clustering is used to group similar products together,
optimize production processes, and identify defects in manufacturing
processes.

• Medical diagnosis: Clustering is used to group patients with similar
symptoms or diseases, which helps in making accurate diagnoses and
identifying effective treatments.

• Fraud detection: Clustering is used to identify suspicious patterns or
anomalies in financial transactions, which can help in detecting fraud or
other financial crimes.

Types of Clustering:-

❖ Exclusive Clustering
• k-Means Clustering
❖ Overlapping Clustering
• Fuzzy c-Means Clustering
❖ Hierarchical Clustering

Exclusive Clustering:-
• Exclusive clustering, also known as hard clustering, is a type of clustering
in unsupervised machine learning where each data point is assigned to
exactly one cluster.
• In other words, there is a clear and exclusive assignment of each data
point to a single cluster, and no overlapping memberships are allowed.

• The most well-known exclusive clustering algorithm is k-means.

k-Means Clustering:-
• K-means clustering is a popular unsupervised machine learning
algorithm used for partitioning a dataset into a set of groups.

• There is no overlapping of subgroups or clusters.

• The algorithm's objective is to group data points into K clusters, where
each data point belongs to exactly one cluster.

• Here K defines the number of pre-defined clusters that need to be
created in the process: if K=2, there will be two clusters, for K=3
there will be three clusters, and so on.

• It is a centroid-based algorithm, where each cluster is associated with a
centroid. The main aim of this algorithm is to minimize the sum of
squared distances between the data points and the centroids of their
corresponding clusters.

How does the K-Means Algorithm Work?

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as the initial centroids. (They need not be
points from the input dataset.)

Step-3: Calculate the distance between each data point and the centroids. Assign
each data point to its closest centroid, which forms the predefined K
clusters.

Step-4: Recalculate each cluster center by taking the average of the cluster's data
points.

Step-5: Repeat steps 3 and 4 until the recalculated cluster centers are the
same as the previous ones, or no data point is reassigned.
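
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the sample points, and the choice to draw the initial centroids from the data points themselves (with a fixed seed for repeatability) are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the initial centroids (an assumption;
    # the notes allow initial centroids that are not data points).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 3: assign every point to its closest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 5: stop as soon as no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Because the starting centroids are random, K-means can settle on different clusterings for less clearly separated data, which is why it is commonly run several times with different seeds.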

How to decide the number of clusters?

Elbow Method :
• The Elbow method is one of the most popular ways to find the optimal
number of clusters. This method uses the concept of WCSS
value. WCSS stands for Within Cluster Sum of Squares, which defines
the total variations within a cluster.
• The formula to calculate the value of WCSS (for 3 clusters) is given below:

$$\mathrm{WCSS} = \sum_{P_i \in \text{Cluster}_1} \operatorname{distance}(P_i, C_1)^2 + \sum_{P_i \in \text{Cluster}_2} \operatorname{distance}(P_i, C_2)^2 + \sum_{P_i \in \text{Cluster}_3} \operatorname{distance}(P_i, C_3)^2$$

• In the above formula of WCSS,
$\sum_{P_i \in \text{Cluster}_1} \operatorname{distance}(P_i, C_1)^2$ is the sum of the
squared distances between each data point $P_i$ in Cluster 1 and its centroid
$C_1$. The other two terms are defined in the same way for Clusters 2 and 3.

• To measure the distance between the data points and the centroid, we can use
any distance measure, such as Euclidean distance.

• To find the optimal number of clusters, the elbow method follows the
steps below:
(1) It executes K-means clustering on a given dataset for different K
values (ranging from 1 to 10).

(2) For each value of K, it calculates the WCSS value.

(3) It plots a curve of the calculated WCSS values against the number of
clusters K.

(4) The point where the curve bends sharply, like the elbow of an arm, is
considered the best value of K.

• Since the graph shows a sharp bend that looks like an elbow, it
is known as the elbow method.
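
The elbow procedure can be illustrated with a short sketch that computes WCSS for several values of K. Several things here are assumptions made for the example: the sample points, the range K = 1 to 5 (kept small because there are only nine points), and the helper `farthest_first`, which replaces random initial centroids with a deterministic choice so that the numbers are repeatable.

```python
import math


def farthest_first(points, k):
    """Deterministic initial centroids: each new one is the point farthest from those already chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans_wcss(points, k, max_iter=100):
    """Run K-means and return the WCSS of the final clustering."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # WCSS: squared distance of every point to the centroid of its own cluster.
    return sum(
        math.dist(p, centroids[j]) ** 2
        for j, members in enumerate(clusters)
        for p in members
    )


# Hypothetical data: three compact groups of three points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0),
    (5.0, 9.0), (5.0, 10.0), (6.0, 9.0),
]
wcss = {k: kmeans_wcss(points, k) for k in range(1, 6)}
for k, value in wcss.items():
    print(k, round(value, 2))
```

For data like this, WCSS falls steeply up to K=3 and only slightly afterwards, so the bend in the plotted curve would sit at K=3.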

Overlapping Clustering:-
• Overlapping clustering, also known as soft clustering, is a type of
clustering in which a data point can belong to more than one cluster.

• In traditional (non-overlapping) clustering, each data point is assigned to
exactly one cluster. However, in overlapping clustering, a data point may
have membership in more than one cluster, indicating that it exhibits
characteristics of multiple clusters.

Fuzzy C-Means Clustering:-

• Fuzzy C-Means (FCM) clustering is a type of unsupervised machine
learning algorithm used for clustering, and it is an extension of the classic
K-Means algorithm.
• The key difference between K-Means and FCM lies in the assignment of
data points to clusters. In K-Means, each data point is assigned to a
single cluster, while FCM allows a data point to belong to more than one
cluster with different degrees of membership.

• Fuzzy clustering assigns a membership degree between 0 and 1 to each
data point for each cluster.
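
To make the idea of membership degrees concrete, the sketch below computes them for fixed cluster centers. The notes do not give a formula, so this uses the standard FCM membership rule, in which membership falls off with distance according to a fuzzifier `m`; the value m=2, the function name, and the sample centers and points are assumptions for illustration.

```python
import math


def fuzzy_memberships(points, centers, m=2.0):
    """Return one row per point with its membership degree in each cluster."""
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: it belongs fully to that center.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [value / total for value in row]
        else:
            # Closer centers get larger weights; normalising makes each row sum to 1.
            weights = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(weights)
            row = [w / total for w in weights]
        memberships.append(row)
    return memberships


# Hypothetical centers and points on a line, chosen so the arithmetic is easy to follow.
centers = [(0.0, 0.0), (4.0, 0.0)]
points = [(1.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
u = fuzzy_memberships(points, centers)
for p, row in zip(points, u):
    print(p, [round(value, 2) for value in row])
# (1.0, 0.0) -> [0.9, 0.1]   closer to the first center
# (2.0, 0.0) -> [0.5, 0.5]   halfway between the two centers
# (4.0, 0.0) -> [0.0, 1.0]   exactly on the second center
```

A full FCM run alternates this membership step with recomputing each center as a membership-weighted mean of the points, until the memberships stop changing noticeably.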

Advantages of Fuzzy Clustering:-

• Flexibility: Fuzzy clustering allows for overlapping clusters, which can

be useful when the data has a complex structure.

• Interpretability: Fuzzy clustering provides a more detailed

representation of the relationships between data points and clusters.

Disadvantages of Fuzzy Clustering:-

• Complexity: Fuzzy clustering algorithms can be computationally more
expensive than traditional clustering algorithms.

Hierarchical Clustering:-
• Hierarchical clustering is a type of clustering algorithm that organizes
data points into a tree-like structure, known as a dendrogram.

• The basic idea behind hierarchical clustering is to build a hierarchy of

clusters, where clusters at one level of the hierarchy are formed by
merging or splitting clusters at the preceding level.
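
The merging direction of this idea can be sketched as follows. The notes do not name a specific variant, so the bottom-up (agglomerative) approach and the single-linkage rule (distance between two clusters is the distance between their closest members) are assumptions, as are the function name and the sample points.

```python
import math


def agglomerative(points):
    """Bottom-up single-linkage clustering; returns the merges in the order they happen."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest to each other.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(p, q) for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # Replace the two clusters with their union, forming the next level of the hierarchy.
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges


# Hypothetical points: two close pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 2.0), (20.0, 0.0)]
merges = agglomerative(points)
for left, right, distance in merges:
    print(left, "+", right, "at distance", distance)
```

Each recorded merge corresponds to one junction of the dendrogram, and the merge distance gives the height at which that junction is drawn.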

Advantages:-
• Hierarchy Representation: This hierarchical structure can be useful for
understanding the organization of the data.

• No Need to Specify the Number of Clusters: The number of clusters does
not have to be chosen in advance.

Disadvantages:-
• Computational Complexity
• Sensitive to Noise
• Difficulty in Handling Large Datasets

Association in Machine Learning

• Association in unsupervised machine learning generally refers to
the discovery of interesting relationships, patterns, or associations
within a dataset without predefined labels or outcomes.

Applications of Association:
• Market Basket Analysis:
Discovering relationships between products that are frequently
purchased together.
• Healthcare Data Analysis:
Identifying associations between symptoms and diseases.
• Fraud Detection:
Discovering unusual patterns of transactions that may indicate
fraudulent activity.

Pros and Cons:
Pros:
• Discover Hidden Patterns:
Association rules can reveal hidden patterns or relationships within
the data.
• Applicability:
Used in various domains, including retail, healthcare, and finance.
Cons:
• Data Quality:
Sensitive to noise and irrelevant information in the dataset.
• Scalability:
Computationally expensive for large datasets.
