What is it?

In Machine Learning, entropy measures the level of disorder or uncertainty in a given dataset. Higher entropy corresponds to mixed, messy data with no clear separation between classes; lower entropy indicates a cleaner, more organized separation.

The concept of entropy comes from 19th-century thermodynamics, where it was introduced by Rudolf Clausius and later given a statistical interpretation by Ludwig Boltzmann, who showed statistically why the entropy of an isolated system can never decrease over time.

Entropy was originally a measure of the disorder of particles in a physical system, but nowadays it is widely used in Machine Learning, especially in Decision Trees and Neural Networks.


Entropy in Machine Learning

When used in Machine Learning, entropy is grounded in Information Theory, where it takes on a more precise meaning.

In this context, it quantifies the uncertainty present in a random variable, or equivalently, the amount of information learned on average from one observation of that variable. It is given by the formula:

H(X) = − Σᵢ p(xᵢ) log₂ p(xᵢ)

where X is a discrete random variable with possible outcomes x₁, …, xₙ, each occurring with probability p(xᵢ), n is the number of possible categories, and the sum runs over i = 1, …, n.
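
As a quick illustration, here is a minimal sketch of this formula, assuming a Python environment with NumPy installed; the function name shannon_entropy and the toy label lists are made up for the example, not part of any specific library.

    import numpy as np

    def shannon_entropy(labels):
        # Shannon entropy (in bits) of a discrete sequence of labels
        _, counts = np.unique(labels, return_counts=True)
        probs = counts / counts.sum()
        # -sum p(x) * log2 p(x); probs are strictly positive here
        return -np.sum(probs * np.log2(probs))

    # A perfectly mixed binary column has maximal entropy (1 bit)...
    print(shannon_entropy(["spam", "ham", "spam", "ham"]))    # 1.0
    # ...while a pure column has zero entropy.
    print(shannon_entropy(["spam", "spam", "spam", "spam"]))  # -0.0, i.e. 0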


  • When working on feature engineering, one might use entropy to explore the Mutual Information between features and the target.

  • Or one might use entropy as the splitting criterion of Decision Trees, choosing splits based on the purity of the data as measured by entropy; a sketch of this follows the list.
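
To make the second bullet concrete, here is a minimal sketch, again assuming Python with NumPy, of computing the information gain of a candidate split. The helper names entropy and information_gain, the guard for empty children, and the toy data are illustrative assumptions, not the API of any particular Decision Tree library.

    import numpy as np

    def entropy(labels):
        # Shannon entropy (in bits) of a discrete label array
        _, counts = np.unique(labels, return_counts=True)
        probs = counts / counts.sum()
        return -np.sum(probs * np.log2(probs))

    def information_gain(labels, mask):
        # Entropy of the parent node minus the weighted entropy of the
        # two child nodes produced by the boolean split `mask`.
        left, right = labels[mask], labels[~mask]
        if len(left) == 0 or len(right) == 0:
            return 0.0  # a split that leaves one side empty gains nothing
        weighted_child_entropy = (
            len(left) / len(labels) * entropy(left)
            + len(right) / len(labels) * entropy(right)
        )
        return entropy(labels) - weighted_child_entropy

    labels = np.array([0, 0, 0, 1, 1, 1])
    feature = np.array([1.0, 2.0, 2.5, 7.0, 8.0, 9.0])

    # Splitting at feature <= 3 separates the classes perfectly,
    # so the gain equals the full parent entropy of 1 bit.
    print(information_gain(labels, feature <= 3))   # 1.0

A Decision Tree built on this criterion simply evaluates many candidate splits and keeps the one with the highest information gain, i.e. the largest drop in entropy.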


References