Clustering
• Basic idea: group together similar instances
• Example: 2D point patterns
• What could similar mean?
– One option: small Euclidean distance (squared)
dist(x, y) = ||x − y||₂²
– Clustering results are crucially dependent on the measure of
similarity (or distance) between “points” to be clustered
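As a concrete check of the distance above, here is a minimal sketch of the squared Euclidean distance in numpy (the helper name `sq_euclidean` is ours, not from the slides):

```python
import numpy as np

# Squared Euclidean distance from the slide: dist(x, y) = ||x - y||_2^2
def sq_euclidean(x, y):
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.dot(d, d))

print(sq_euclidean([0.0, 0.0], [3.0, 4.0]))  # 25.0
```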
Clustering algorithms
• Partition algorithms (flat)
– K-means
– Mixture of Gaussians
– Spectral clustering
• Hierarchical algorithms
– Bottom up: agglomerative
– Top down: divisive
Clustering examples
Image segmentation
Goal: Break up the image into meaningful or perceptually similar regions
[Slide from James Hayes]
K-Means
• An iterative clustering algorithm
– Initialize: Pick K random points as cluster centers
– Alternate:
1. Assign data points to the closest cluster center
2. Change each cluster center to the average of its assigned points
– Stop when no point assignments change
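The alternation above can be sketched directly in numpy. This is a minimal illustration, not a production implementation; the `init_idx` parameter (letting us fix the initial centers for a deterministic demo) is our addition, while the slide picks K random points:

```python
import numpy as np

def kmeans(X, K, init_idx=None, seed=0):
    """K-means as on the slide: pick K points as centers, then alternate
    (1) assign points to the nearest center and (2) recompute each center
    as the mean of its points, stopping when assignments no longer change."""
    rng = np.random.default_rng(seed)
    if init_idx is None:
        init_idx = rng.choice(len(X), size=K, replace=False)
    centers = X[np.asarray(init_idx)].astype(float).copy()
    assign = np.full(len(X), -1)
    while True:
        # Step 1: squared distances from every point to every center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):   # stop: no assignment changed
            return centers, assign
        assign = new_assign
        # Step 2: move each center to the mean of its assigned points
        for k in range(K):
            pts = X[assign == k]
            if len(pts):                         # leave empty clusters in place
                centers[k] = pts.mean(axis=0)

# Two well-separated groups; start one center in each group
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
centers, assign = kmeans(X, K=2, init_idx=[0, 4])
```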
K-means clustering: Example
• Pick K random points as cluster centers (means)
Shown here for K = 2
K-means clustering: Example
Iterative Step 1
• Assign data points to the closest cluster center
K-means clustering: Example
Iterative Step 2
• Change the cluster center to the average of the assigned points
K-means clustering: Example
• Repeat until convergence
Properties of K-means algorithm
• Guaranteed to converge in a finite number of iterations
• Running time per iteration:
1. Assign data points to closest cluster center: O(KN) time
2. Change the cluster center to the average of its assigned points: O(N) time
What properties should a distance measure have?
• Symmetric
– D(x, y) = D(y, x)
– Otherwise we could say x looks like y, but y does not look like x
• Positivity, and self-similarity
– D(x, y) ≥ 0, and D(x, y) = 0 iff x = y
– Otherwise there will be different objects that we cannot tell apart
• Triangle inequality
– D(x, y) ≤ D(y, z) + D(x, z)
– Otherwise one can say “x is like y, y is like z, but x is not like z at all”
[Slide from Alan Fern]
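The three axioms above can be spot-checked in code for the Euclidean distance. This is only a check on a handful of points, not a proof:

```python
import itertools
import math

# Spot-check the three metric axioms from the slide for Euclidean
# distance on a few 2-D points.
def D(x, y):
    return math.dist(x, y)  # Euclidean distance (Python 3.8+)

pts = [(0, 0), (1, 0), (0, 2), (3, 4), (-1, 5)]
for x, y, z in itertools.product(pts, repeat=3):
    assert D(x, y) == D(y, x)                            # symmetry
    assert D(x, y) >= 0 and (D(x, y) == 0) == (x == y)   # positivity / self-similarity
    assert D(x, y) <= D(y, z) + D(x, z) + 1e-12          # triangle inequality
```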
K-means: Idea
• K clusters, each summarized by a prototype µ_k.
• Assignment of data x_i to a cluster is represented by responsibilities r_ik ∈ {0, 1} with Σ_{k=1}^{K} r_ik = 1.
• An example with 4 data points and 3 clusters:
         | 1 0 0 |
(r_ik) = | 0 0 1 |
         | 0 1 0 |
         | 0 0 1 |
• Loss function J = Σ_{i=1}^{n} Σ_{k=1}^{K} r_ik ||x_i − µ_k||₂².
Sriram Sankararaman, Clustering
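Evaluating the loss J for the responsibility matrix on the slide can be done in a few lines. The responsibility matrix is the slide's; the data points and prototypes below are made up for illustration:

```python
import numpy as np

# Loss J = sum_i sum_k r_ik ||x_i - mu_k||_2^2 for the slide's
# 4-point / 3-cluster responsibility matrix.
r = np.array([[1, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 0, 1]])
X = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [6.0, 0.0]])  # made-up data
mu = np.array([[0.0, 0.0], [0.0, 3.0], [5.0, 0.0]])             # made-up prototypes

sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # ||x_i - mu_k||^2
J = float((r * sq).sum())
print(J)  # only the terms with r_ik = 1 contribute
```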
K-means: minimizing the loss function
• How do we minimize J w.r.t. (r_ik, µ_k)?
• Chicken-and-egg problem:
• If prototypes are known, we can assign responsibilities.
• If responsibilities are known, we can compute prototypes.
• We use an iterative procedure.
K-means: minimizing the loss function
• E-step: Fix µ_k, minimize J w.r.t. r_ik.
• Assign each data point to its nearest prototype.
• M-step: Fix r_ik, minimize J w.r.t. µ_k.
• Set each prototype to the mean of the points in that cluster, i.e., µ_k = (Σ_i r_ik x_i) / (Σ_i r_ik).
• This procedure is guaranteed to converge.
• It converges to a local minimum.
• Use different initializations and pick the best solution.
• This may still be insufficient for large search spaces.
• Other ways include a split-merge approach.
How do we initialize K-means?
• Some heuristics:
• Randomly pick K data points as prototypes.
• Pick prototype i + 1 to be the point farthest from prototypes {1, . . . , i}.
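The farthest-point heuristic above can be sketched as follows. We interpret "farthest from prototypes {1, . . . , i}" as maximizing the distance to the nearest already-chosen prototype (the usual reading); the function name and the demo data are ours:

```python
import numpy as np

# Farthest-point initialization: pick a first prototype, then repeatedly
# pick the point whose distance to its nearest chosen prototype is largest.
def farthest_point_init(X, K, first=0):
    chosen = [first]
    for _ in range(K - 1):
        # squared distances from every point to every chosen prototype
        d = ((X[:, None, :] - X[chosen][None, :, :]) ** 2).sum(axis=2)
        chosen.append(int(d.min(axis=1).argmax()))
    return X[chosen]

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [10.0, 1.0], [5.0, 5.0]])
protos = farthest_point_init(X, K=3, first=0)
```

Already-chosen points have distance 0 to themselves, so they are never re-selected.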
Hierarchical Clustering
The goal of hierarchical clustering is to create a sequence of nested partitions,
which can be conveniently visualized via a tree or hierarchy of clusters, also called
the cluster dendrogram.
The clusters in the hierarchy range from the fine-grained to the coarse-grained –
the lowest level of the tree (the leaves) consists of each point in its own cluster,
whereas the highest level (the root) consists of all points in one cluster.
Agglomerative hierarchical clustering methods work in a bottom-up manner.
Starting with each of the n points in a separate cluster, they repeatedly merge the
most similar pair of clusters until all points are members of the same cluster.
Zaki & Meira Jr. (RPI and UFMG) Data Mining and Machine Learning Chapter 14: Hierarchical Clustering 2 / 16
Hierarchical Clustering: Nested Partitions
Given a dataset D = {x 1 , . . . , x n }, where x i ∈ Rd , a clustering C = {C1 , . . . , Ck } is
a partition of D.
A clustering A = {A1 , . . . , Ar } is said to be nested in another clustering
B = {B1 , . . . , Bs } if and only if r > s, and for each cluster Ai ∈ A, there exists a
cluster Bj ∈ B, such that Ai ⊆ Bj .
Hierarchical clustering yields a sequence of n nested partitions C1 , . . . , Cn . The
clustering Ct −1 is nested in the clustering Ct . The cluster dendrogram is a rooted
binary tree that captures this nesting structure, with edges between cluster
Ci ∈ Ct −1 and cluster Cj ∈ Ct if Ci is nested in Cj , that is, if Ci ⊂ Cj .
Hierarchical Clustering Dendrogram
The dendrogram (a binary tree with leaves A, B, C, D, E and internal nodes AB, CD, ABCD, ABCDE) represents the following sequence of nested partitions:

Clustering   Clusters
C1           {A}, {B}, {C}, {D}, {E}
C2           {AB}, {C}, {D}, {E}
C3           {AB}, {CD}, {E}
C4           {ABCD}, {E}
C5           {ABCDE}

with C_{t−1} ⊂ C_t for t = 2, . . . , 5. We assume that A and B are merged before C and D.
Agglomerative Hierarchical Clustering
In agglomerative hierarchical clustering, we begin with each of the n points in a
separate cluster. We repeatedly merge the two closest clusters until all points are
members of the same cluster.
Given a set of clusters C = {C1, C2, . . . , Cm}, we find the closest pair of clusters Ci and Cj and merge them into a new cluster Cij = Ci ∪ Cj.
Next, we update the set of clusters by removing Ci and Cj and adding Cij, as follows: C = (C \ {Ci, Cj}) ∪ {Cij}.
We repeat the process until C contains only one cluster. If specified, we can stop the merging process when there are exactly k clusters remaining.
Agglomerative Hierarchical Clustering Algorithm
AgglomerativeClustering(D, k):
1 C ← {Ci = {x i } | x i ∈ D} // Each point in separate cluster
2 ∆ ← {δ(x i , x j ): x i , x j ∈ D} // Compute distance matrix
3 repeat
4 Find the closest pair of clusters Ci , Cj ∈ C
5 Cij ← Ci ∪ Cj // Merge the clusters
6 C ← C \ {Ci , Cj } ∪ {Cij } // Update the clustering
7 Update distance matrix ∆ to reflect new clustering
8 until |C| = k
Zaki & Meira Jr. (RPI and UFMG) Data Mining and Machine Learning Chapter 14: Hierarchical Clustering 7 / 16
Distance between Clusters
Single, Complete and Average
A typical distance between two points is the Euclidean distance or L2-norm:
||x − y||₂ = ( Σ_{i=1}^{d} (x_i − y_i)² )^{1/2}
Single Link: The minimum distance between a point in Ci and a point in Cj:
δ(Ci, Cj) = min{ ||x − y|| : x ∈ Ci, y ∈ Cj }
Complete Link: The maximum distance between points in the two clusters:
δ(Ci, Cj) = max{ ||x − y|| : x ∈ Ci, y ∈ Cj }
Group Average: The average pairwise distance between points in Ci and Cj:
δ(Ci, Cj) = ( Σ_{x∈Ci} Σ_{y∈Cj} ||x − y|| ) / (ni · nj)
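The three linkage measures above can be written as short functions over all cross-pairs of two clusters. The function names and the two small example clusters are our own:

```python
import itertools
import math

# The three inter-cluster distances from the slide, computed over all
# cross-pairs (x, y) with x in Ci and y in Cj.
def pairwise(Ci, Cj):
    return [math.dist(x, y) for x, y in itertools.product(Ci, Cj)]

def single_link(Ci, Cj):
    return min(pairwise(Ci, Cj))       # closest cross-pair

def complete_link(Ci, Cj):
    return max(pairwise(Ci, Cj))       # farthest cross-pair

def group_average(Ci, Cj):
    d = pairwise(Ci, Cj)
    return sum(d) / len(d)             # divides by n_i * n_j

Ci = [(0.0, 0.0), (0.0, 1.0)]
Cj = [(3.0, 0.0), (3.0, 4.0)]
```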
Single Link Agglomerative Clustering
Initial distance matrix:
δ    B  C  D  E
A    1  3  2  4
B       3  2  3
C          1  3
D             5
Merge A and B at δ = 1:
δ    C  D  E
AB   3  2  3
C       1  3
D          5
Merge C and D at δ = 1:
δ    CD  E
AB    2  3
CD        3
Merge AB and CD at δ = 2:
δ     E
ABCD  3
Finally, merge ABCD and E at δ = 3. The resulting dendrogram has leaves A, B, C, D, E and merge heights 1, 1, 2, 3.
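The single-link merge sequence above can be reproduced by a small loop over the example's point-to-point distances (the code below is a sketch of the agglomerative algorithm, with the slide's distance matrix hard-coded):

```python
import itertools

# Point-to-point distances from the slide's example.
d = {frozenset(p): v for p, v in {
    ("A", "B"): 1, ("A", "C"): 3, ("A", "D"): 2, ("A", "E"): 4,
    ("B", "C"): 3, ("B", "D"): 2, ("B", "E"): 3,
    ("C", "D"): 1, ("C", "E"): 3, ("D", "E"): 5}.items()}

def dist(Ci, Cj):
    # single link: minimum over all cross-pairs of points
    return min(d[frozenset((x, y))] for x in Ci for y in Cj)

C = [{"A"}, {"B"}, {"C"}, {"D"}, {"E"}]   # start: singleton clusters
heights = []
while len(C) > 1:
    # find and merge the closest pair of clusters
    Ci, Cj = min(itertools.combinations(C, 2), key=lambda p: dist(*p))
    heights.append(dist(Ci, Cj))
    C = [c for c in C if c is not Ci and c is not Cj] + [Ci | Cj]
```

The recorded merge heights match the dendrogram: 1, 1, 2, 3 (ties broken so that A and B merge before C and D, as the slides assume).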
Covariance
• Variance and Covariance are a measure of the “spread” of a set
of points around their center of mass (mean)
• Variance – measure of the deviation from the mean for points in
one dimension e.g. heights
• Covariance is a measure of how much each of the dimensions varies from the mean with respect to the others.
• Covariance is measured between 2 dimensions to see if there is
a relationship between the 2 dimensions e.g. number of hours
studied & marks obtained.
• The covariance between one dimension and itself is the variance
Covariance
cov(X, Y) = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / (n − 1)
• So, if you had a 3-dimensional data set (x,y,z), then you could
measure the covariance between the x and y dimensions, the y
and z dimensions, and the x and z dimensions. Measuring the
covariance between x and x , or y and y , or z and z would give
you the variance of the x , y and z dimensions respectively.
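The formula above translates directly into code. The hours/marks numbers below are made up; they only illustrate the slide's example of a positive relationship:

```python
# Sample covariance with the (n - 1) denominator from the slide.
def covariance(X, Y):
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    return sum((x - mx) * (y - my) for x, y in zip(X, Y)) / (n - 1)

hours = [1.0, 2.0, 3.0, 4.0]      # made-up hours studied
marks = [20.0, 40.0, 50.0, 70.0]  # made-up marks obtained
print(covariance(hours, marks))   # positive: more hours, higher marks
print(covariance(hours, hours))   # covariance of a dimension with itself = variance
```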
Covariance Matrix
• Representing Covariance between dimensions as a
matrix e.g. for 3 dimensions:
    | cov(x,x) cov(x,y) cov(x,z) |
C = | cov(y,x) cov(y,y) cov(y,z) |
    | cov(z,x) cov(z,y) cov(z,z) |
• The diagonal entries are the variances of x, y and z
• cov(x,y) = cov(y,x) hence matrix is symmetrical about
the diagonal
• N-dimensional data will result in NxN covariance
matrix
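Both properties above (symmetry, variances on the diagonal) can be checked numerically; the 3-dimensional data below is made up:

```python
import numpy as np

# Build the 3x3 covariance matrix for (x, y, z) data and check the two
# properties from the slide: symmetry, and variances on the diagonal.
data = np.array([[1.0, 2.0, 0.5],
                 [2.0, 1.0, 1.5],
                 [3.0, 4.0, 0.0],
                 [4.0, 3.0, 2.0]])   # rows = observations, cols = x, y, z

C = np.cov(data, rowvar=False)       # uses the (n - 1) denominator by default
assert np.allclose(C, C.T)           # cov(x,y) = cov(y,x): symmetric
assert np.allclose(np.diag(C), data.var(axis=0, ddof=1))  # diagonal = variances
```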
Covariance
• What is the interpretation of covariance
calculations?
e.g.: 2 dimensional data set
x: number of hours studied for a subject
y: marks obtained in that subject
covariance value is say: 104.53
what does this value mean?
Covariance
• The exact value is not as important as its sign.
• A positive value of covariance indicates both
dimensions increase or decrease together e.g. as the
number of hours studied increases, the marks in that
subject increase.
• A negative value indicates while one increases the
other decreases, or vice-versa e.g. active social life
at PSU vs performance in CS dept.
• If the covariance is zero, the two dimensions are uncorrelated (though not
necessarily independent), e.g. heights of students vs
the marks obtained in a subject
Covariance
• Why bother with calculating covariance
when we could just plot the 2 values to
see their relationship?
Covariance calculations are used to find
relationships between dimensions in high
dimensional data sets (usually greater
than 3) where visualization is difficult.
PCA
• Principal components analysis (PCA) is a technique
that can be used to simplify a dataset
• It is a linear transformation that chooses a new
coordinate system for the data set such that the
greatest variance under any projection of the data
comes to lie on the first axis (then called the
first principal component),
the second greatest variance on the second axis,
and so on.
• PCA can be used for reducing dimensionality by
eliminating the later principal components.
PCA
• By finding the eigenvalues and eigenvectors of the
covariance matrix, we find that the eigenvectors with
the largest eigenvalues correspond to the dimensions
that have the strongest correlation in the dataset.
• The eigenvector with the largest eigenvalue is the first principal component.
• PCA is a useful statistical technique that has found
application in:
– fields such as face recognition and image compression
– finding patterns in data of high dimension.
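The eigenvector recipe above can be sketched end to end on synthetic, nearly one-dimensional data (all data and thresholds below are our own illustration, not from the slides):

```python
import numpy as np

# PCA as described: eigendecompose the covariance matrix and sort the
# eigenvectors by decreasing eigenvalue; projecting onto the first
# eigenvector keeps the direction of greatest variance.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])  # nearly 1-D data

Xc = X - X.mean(axis=0)                 # center the data
C = np.cov(Xc, rowvar=False)
vals, vecs = np.linalg.eigh(C)          # eigh: C is symmetric
order = np.argsort(vals)[::-1]          # largest eigenvalue first
vals, vecs = vals[order], vecs[:, order]

# Dimensionality reduction: keep only the first principal component
Z = Xc @ vecs[:, :1]
explained = float(vals[0] / vals.sum())  # fraction of variance retained
```

Because the second coordinate is almost a multiple of the first, the first principal component captures nearly all the variance, so dropping the second component loses little information.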