What is it?

In machine learning, agglomerative clustering is an unsupervised learning algorithm for hierarchical clustering. It builds a tree from the bottom up: each leaf is a single data point, and the algorithm repeatedly merges clusters until only one remains, which is the root of the tree.

How does it work?

The algorithm starts by treating every data point as its own cluster, then merges clusters until the desired number of clusters is reached. Each step merges the two clusters that are nearest to each other. Several methods can be used to measure how near two clusters are.
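
The merge loop described above can be sketched in plain Python. This is a minimal illustration, not how scikit-learn implements it: the names `single_link` and `agglomerate` are made up for this example, and it assumes Euclidean distance with the nearness of two clusters taken as the distance between their closest points.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_link(a, b):
    """Nearness of two clusters: distance between their closest points."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, n_clusters):
    """Merge the two nearest clusters until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # Every data point starts as its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters (i < j) that are nearest to each other.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge cluster j into cluster i, then drop cluster j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (9.0, 0.0)]
result = agglomerate(points, 3)
print(result)
```

This naive version recomputes every pairwise cluster distance at each step, which is fine for a handful of points but far too slow for real data sets.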

Metrics of nearness

Some of the metrics available in scikit-learn for measuring nearness are Euclidean distance, Manhattan distance, cosine distance, and a precomputed distance matrix.
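
To make the three named metrics concrete, here is a small plain-Python sketch of each. The function names are chosen for this example and are not scikit-learn's API; the cosine version assumes neither vector is all zeros.

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two points."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a, b = (3.0, 0.0), (0.0, 4.0)
print(euclidean(a, b))        # 5.0
print(manhattan(a, b))        # 7.0
print(cosine_distance(a, b))  # 1.0 (the vectors are orthogonal)
```

A precomputed metric simply means these distances are calculated ahead of time for every pair of points and passed to the algorithm as a matrix.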

Linkage criterion

To determine which clusters should be merged, scikit-learn uses a linkage criterion, which defines how the selected distance metric is applied between two clusters. At each step, the pair of clusters that minimizes the criterion is merged. The criterion can be:

Single

Uses the minimum distance between any two points, one from each cluster, and merges the pair of clusters for which this minimum is smallest.

Complete

Uses the maximum distance between any two points, one from each cluster, and merges the pair of clusters for which this maximum is smallest.

Average

Uses the average distance between the points of two clusters, and merges the pair of clusters for which this average is smallest.

Ward

Minimizes the variance within clusters: at each step, it merges the pair of clusters whose merge causes the smallest increase in total within-cluster variance.
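
The four criteria can be tried side by side with scikit-learn's `AgglomerativeClustering`, which takes the criterion through its `linkage` parameter. The sketch below uses a tiny made-up data set with two well-separated groups and leaves the distance metric at its default (Euclidean); the variable names are chosen for this example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data: two tight groups of three points, far apart from each other.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])

labels_by_linkage = {}
for linkage in ("single", "complete", "average", "ward"):
    model = AgglomerativeClustering(n_clusters=2, linkage=linkage)
    # fit_predict returns one cluster label per row of X.
    labels_by_linkage[linkage] = model.fit_predict(X)
    print(linkage, labels_by_linkage[linkage])
```

On data this cleanly separated, all four criteria should recover the same two groups, although which group gets label 0 is arbitrary. The criteria start to disagree on elongated, noisy, or unevenly sized clusters. Note that in scikit-learn, Ward linkage works only with the Euclidean metric.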

🍁Lucas' Garden
