To SMOTE or not to SMOTE?

Summary of the paper https://arxiv.org/pdf/2201.08528.pdf

This paper examines how balancing techniques (SMOTE, random under- and oversampling) affect classification on imbalanced data.

When the objective metric is proper: A metric is proper when it is optimized by a classifier that predicts the true class probabilities. For example, the Brier score is proper, and although AUC is not proper in general, it is proper under the i.i.d. assumption.
- Empirically, balancing can improve prediction performance for weak classifiers (MLP, SVM, decision tree, AdaBoost, and LGBM) but not for the SOTA classifiers (XGBoost and CatBoost). The strong classifiers without balancing yield better prediction quality than the weak classifiers with balancing.
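
To make the definition of a proper metric concrete, here is a minimal sketch (not taken from the paper) that checks numerically that the expected Brier score is minimized by predicting the true class probability. The value `p_true = 0.1` and the helper name `expected_brier` are assumptions chosen only for illustration.

```python
def expected_brier(q, p):
    """Expected Brier score when predicting probability q
    for a binary event whose true probability is p."""
    # With probability p the label is 1 (squared error (1 - q)^2),
    # with probability 1 - p the label is 0 (squared error q^2).
    return p * (1 - q) ** 2 + (1 - p) * q ** 2


# Assumed minority-class probability, for illustration only.
p_true = 0.1

# Search a grid of candidate predictions for the one with the lowest expected score.
grid = [i / 100 for i in range(101)]
best_q = min(grid, key=lambda q: expected_brier(q, p_true))

print(best_q)
```

The minimizer coincides with `p_true`, because the expected score equals `p * (1 - p) + (q - p) ** 2`, which is smallest at `q = p`.
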
When the objective is a label metric (a metric computed from predicted labels rather than predicted probabilities, such as F1):
- Fixed threshold:
  - balancing considerably improved prediction performance for all classifiers.
- Optimized threshold:
  - strong and medium classifiers: balancing and optimizing the decision threshold give similar prediction quality. However, optimizing the decision threshold is recommended because it is simpler and cheaper to compute.
  - very weak classifiers (MLP and SVM): balancing the data is significantly more beneficial than optimizing the decision threshold. Nevertheless, the resulting prediction quality is significantly worse than that of a strong classifier (without oversampling).
- When balancing (instead of optimizing the decision threshold), SMOTE-like methods were not significantly better than the simple random oversampler.
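
The two options compared above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the paper's experimental code: the validation labels and scores are made-up numbers from a hypothetical classifier, F1 stands in for the label metric, and the helper names (`f1`, `best_threshold`, `random_oversample`) are invented here.

```python
import random


def f1(y_true, y_pred):
    """F1 score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, scores):
    """Pick the score cutoff that maximizes F1 on held-out data."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1(y_true, [int(s >= t) for s in scores]))


def random_oversample(X, y, seed=0):
    """Duplicate random minority rows until both classes have equal counts."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


# Made-up validation labels and predicted scores (8 negatives, 2 positives).
y_val = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.45, 0.40, 0.48]

# Option 1: keep the data as-is and tune the decision threshold.
f1_default = f1(y_val, [int(s >= 0.5) for s in scores])
t_star = best_threshold(y_val, scores)
f1_tuned = f1(y_val, [int(s >= t_star) for s in scores])
print(f1_default, t_star, f1_tuned)

# Option 2: balance the training data by random oversampling before fitting.
X_bal, y_bal = random_oversample([[s] for s in scores], y_val)
print(y_bal.count(0), y_bal.count(1))
```

In this toy example the fixed threshold of 0.5 predicts no positives at all, while the tuned threshold recovers both. In practice the threshold should be chosen on a validation split separate from the test data, and oversampling should be applied to the training split only.
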