What is it?

K-Means is a machine learning algorithm used for unsupervised, centroid-based clustering. A ready-made implementation is available in scikit-learn (sklearn.cluster.KMeans).

How does it work?

The K-Means algorithm needs the number of centroids (K) before the learning phase starts. The data scientist has to decide it, either directly or by using another method to estimate a suitable number of centroids.

What are centroids?

A centroid is the coordinate on which a cluster is centered: each data point is compared against the centroids by distance. It is also the mean of the cluster, since K-Means places each output centroid at the average position of all instances in that cluster.
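As a small illustration of a centroid being the mean of its cluster, here is a minimal sketch in plain Python. The function name `centroid` and the sample points are made up for this example and are not part of any library.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    dims = len(points[0])
    # Average each coordinate separately across all points.
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


# Hypothetical cluster of three 2-D points.
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
print(centroid(cluster))  # (3.0, 4.0)
```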

The algorithm

The initial positions of the centroids can either be set in advance or be chosen randomly by the algorithm.
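The two initialisation options could look roughly like this. It is only a sketch: `init_centroids` is a hypothetical helper, and picking K distinct data points at random is one common choice among several, assumed here for simplicity.

```python
import random


def init_centroids(points, k, seed=None):
    """Pick k distinct data points at random as starting centroids."""
    rng = random.Random(seed)  # a seed makes the choice reproducible
    return rng.sample(points, k)


data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]

# Option 1: positions decided in advance.
manual_init = [(0.0, 0.0), (10.0, 10.0)]

# Option 2: positions chosen randomly from the data.
random_init = init_centroids(data, k=2, seed=0)
```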

Each data point is assigned to the cluster whose centroid is nearest, and each centroid is then recalculated as the mean of the points assigned to it. These two steps repeat until no centroid moves, at which point the algorithm has reached convergence and the clusters are formed.
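The assignment step can be sketched as follows, assuming Euclidean distance (the usual choice for K-Means, though the note does not name one). `nearest_centroid` and `assign` are illustrative names, and `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def assign(points, centroids):
    """Label every point with the index of its nearest centroid."""
    return [nearest_centroid(p, centroids) for p in points]


data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels = assign(data, centroids)
print(labels)  # [0, 0, 1, 1]
```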

Pseudo algorithm

Select and initialise K centroids

Repeat:
	For each data point:
		Assign to the cluster with the nearest centroid

	For each centroid:
		Update coordinates to the mean of the points assigned to it

	Have centroids converged?
		Yes: stop iterating
		No: keep going
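
The pseudo algorithm above can be turned into a short, runnable sketch using only the Python standard library. Several details are assumptions made for this example rather than part of the note: random initialisation from the data points, Euclidean distance, a `max_iter` safety cap, and keeping the old position of a centroid that ends up with no points.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=None):
    """Minimal K-Means sketch: returns (centroids, labels)."""
    rng = random.Random(seed)
    # Select and initialise K centroids (assumption: random data points).
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Assign each data point to the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
                  for p in points]
        # Update each centroid to the mean of its assigned points.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
        # Converged when no centroid moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
        (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(data, k=2, seed=0)
print(centroids, labels)
```

On this toy data the two groups are far apart, so the first three points should end up in one cluster and the last three in the other. Which cluster gets index 0 depends on the random initialisation.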
