What is it?

Class imbalance occurs when some classes in a dataset are much more prevalent than others. This can happen because of the nature of the problem or because of poor data acquisition. It is a problem in the training data itself, not in the model, which justifies using Mutual Information and Data-Centric AI methods to combat it.

For example, fraud detection datasets are usually highly imbalanced because of the nature of the problem: fraudulent transactions are far less common than normal ones.
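
A quick way to see how imbalanced a dataset is would be to count the labels. The sketch below is only illustrative: the 990/10 split is made-up toy data, and the helper names `class_distribution` and `imbalance_ratio` are hypothetical, not part of any library.

```python
from collections import Counter


def class_distribution(labels):
    """Return {class: fraction of samples} for a sequence of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Return the most common class count divided by the least common one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy labels: 990 normal transactions and 10 fraudulent ones.
labels = ["normal"] * 990 + ["fraud"] * 10

distribution = class_distribution(labels)
ratio = imbalance_ratio(labels)

print(distribution)  # share of each class in the dataset
print(ratio)         # how many majority samples exist per minority sample
```

A ratio close to 1 suggests roughly balanced classes, while a large ratio like the one in this toy data points to a heavily imbalanced target.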

Still, class imbalance is different from Underperforming Subpopulations. The latter focuses on the features of a dataset and how they affect the model, whereas class imbalance focuses on the distribution of the target variable.
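
The distinction can be sketched with two separate checks: one that looks only at the target distribution, and one that groups samples by a feature and compares model accuracy per group. Everything here is an assumption for illustration: the records, the `web`/`mobile` feature values, and the predictions are invented toy data.

```python
from collections import Counter, defaultdict

# Hypothetical records: (feature value, true label, model prediction).
records = [
    ("web", 1, 1), ("web", 0, 0), ("web", 1, 1), ("web", 0, 0),
    ("mobile", 1, 0), ("mobile", 0, 0), ("mobile", 1, 0), ("mobile", 0, 1),
]

# Class imbalance check: look only at the distribution of the target variable.
label_counts = Counter(label for _, label, _ in records)

# Subpopulation check: group by a feature and compare accuracy per group.
correct = defaultdict(int)
total = defaultdict(int)
for group, label, pred in records:
    total[group] += 1
    correct[group] += int(label == pred)

accuracy_by_group = {group: correct[group] / total[group] for group in total}

print(label_counts)       # the two classes are equally frequent here
print(accuracy_by_group)  # yet one feature group is served much worse
```

In this toy data the classes are perfectly balanced, but the model still performs poorly on one group of samples, which shows that the two problems can occur independently.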