Outliers are known to have negative effects on a Machine Learning model. Unless they are fundamentally a part of the model and its context, they should be avoided and possibly removed from the data entirely.
Using a quantile threshold
One possible way to deal with outliers is to establish a quantile threshold. One could use the interquartile range (IQR), a rule known as Tukey's fences, which drops values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, or come up with custom limits.
Notice the difference in scale on the x-axis of the two plots.
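As a rough illustration of the IQR rule, here is a minimal sketch using only Python's standard library. The function names `tukey_fences` and `remove_outliers`, the multiplier `k=1.5`, and the sample data are assumptions made for this example, not part of the original post.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if lower <= v <= upper]


# Hypothetical sample: nine ordinary values and one extreme one.
data = [10, 12, 11, 13, 12, 14, 11, 13, 12, 95]
cleaned = remove_outliers(data)
print(cleaned)
```

A larger `k` keeps more of the tails, while a smaller one trims more aggressively. To use custom quantile limits instead of the IQR, one could replace the fences with fixed cut points, for example the 1st and 99th percentiles.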
Using cleanlab Outlier Removal
Another great way is to use cleanlab's OutOfDistribution. Behind the scenes, simply put, it is a K-Nearest Neighbors algorithm that flags the points whose distance to their neighbors is greater than expected. It is most commonly used with Data-Centric AI methods.
Notice that the distribution has not changed much compared to the quantile method, but some data was lost.
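To make the nearest-neighbor idea concrete, here is a small standard-library sketch. It is not cleanlab's implementation or API. It is a stand-in that scores each point by its average distance to its `k` nearest neighbors and flags scores well above the typical one. The function names, the `factor * median` cutoff, and the sample points are all assumptions made for illustration.

```python
import math
import statistics


def knn_outlier_scores(points, k=3):
    """Average Euclidean distance from each point to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_outliers(points, k=3, factor=3.0):
    """Flag points whose score exceeds `factor` times the median score."""
    scores = knn_outlier_scores(points, k)
    cutoff = factor * statistics.median(scores)  # assumed threshold rule
    return [s > cutoff for s in scores]


# Hypothetical sample: a tight cluster plus one distant point.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8), (8.0, 8.0)]
flags = flag_outliers(points)
kept = [p for p, is_outlier in zip(points, flags) if not is_outlier]
print(kept)
```

Unlike a quantile threshold on a single column, a distance-based score looks at all features together, so it can catch points that are unusual only in combination. This brute-force version compares every pair of points, so it is only suitable for small datasets.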