- Learn more about the problem. Search for similar Kaggle competitions. Check the task in Papers with Code.
- Do a basic data exploration. Try to understand the problem and get a sense of what might be important (see the exploration sketch below).
- Get a baseline model working (see the baseline sketch below).
- Design an evaluation method as close as possible to the final evaluation. Plot local evaluation metrics against the public leaderboard ones (correlation) to check how well your validation strategy works (see the validation sketch below).
- Try different approaches for preprocessing (encodings, Deep Feature Synthesis, lags, aggregations, imputers, …). If you’re working as a group, split preprocessing and feature generation across files (see the preprocessing sketch below).
- Plot learning curves (with sklearn or external tools) to detect overfitting (see the learning-curve sketch below).
- Plot the real and predicted target distributions to see how well your model understands the underlying distribution. Apply any postprocessing that might fix small discrepancies (see the distribution sketch below).
- Tune hyperparameters once you’ve settled on a specific approach (hyperopt, Optuna; see the tuning sketch below).
- Plot and visualize the predictions (target vs. predicted errors, histograms, random predictions, …) to make sure they behave as expected. Explain the predictions with SHAP (see the SHAP sketch below).
- Think about what postprocessing heuristics could improve or correct the predictions (see the postprocessing sketch below).
- Stack classifiers (see the stacking sketch below).
- Try AutoML models (see the AutoML sketch below).
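
The sketches below illustrate several of the steps above. They are minimal examples, not full solutions; file names, column names, and scores are hypothetical placeholders.

Exploration — a minimal pandas sketch, assuming a hypothetical `train.csv` with a column named `target`:

```python
import pandas as pd

# Hypothetical file and column names; replace with the competition's data.
df = pd.read_csv("train.csv")

# Shape, dtypes and missing values give a first sense of what matters.
print(df.shape)
print(df.dtypes)
print(df.isna().mean().sort_values(ascending=False).head(10))

# Basic statistics and the target distribution.
print(df.describe())
print(df["target"].value_counts(normalize=True))
```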
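Baseline — a dummy model sets the floor any real model must beat; a simple linear model is a reasonable first "real" baseline. Synthetic data is used here only to make the sketch runnable:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data; use the competition's features and target instead.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Trivial baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent")
print("dummy:", cross_val_score(baseline, X, y, cv=5, scoring="accuracy").mean())

# Simple, fast model as the first real baseline.
model = LogisticRegression(max_iter=1000)
print("logreg:", cross_val_score(model, X, y, cv=5, scoring="accuracy").mean())
```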
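Validation — one way to validate the validation strategy is to log local CV scores alongside public leaderboard scores for past submissions and check their correlation. The numbers below are made-up placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical submission log: local CV score vs. public leaderboard score.
local_scores = np.array([0.812, 0.824, 0.831, 0.840, 0.845])
public_scores = np.array([0.805, 0.819, 0.826, 0.838, 0.841])

# High correlation suggests local validation tracks the final evaluation.
print("correlation:", np.corrcoef(local_scores, public_scores)[0, 1])

plt.scatter(local_scores, public_scores)
plt.xlabel("local CV score")
plt.ylabel("public LB score")
plt.show()
```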
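Preprocessing — a sketch of a sklearn `ColumnTransformer` for imputing/encoding, plus simple lag and aggregation features with pandas. Column names are hypothetical:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column split; adjust to the actual dataset.
numeric_cols = ["age", "income"]
categorical_cols = ["city", "device"]

# Impute + scale numeric columns, impute + one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

# Simple lag and aggregation features for a grouped, time-ordered dataset.
df = pd.DataFrame({"user": ["a", "a", "a", "b", "b"],
                   "sales": [10, 12, 9, 20, 25]})
df["sales_lag1"] = df.groupby("user")["sales"].shift(1)
df["sales_user_mean"] = df.groupby("user")["sales"].transform("mean")
```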
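Learning curves — sklearn's `learning_curve` trains on growing subsets of the data; a large, persistent gap between the train and validation curves is a sign of overfitting. Model and data are placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Scores on increasingly large training subsets, cross-validated.
sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5), scoring="accuracy",
)

plt.plot(sizes, train_scores.mean(axis=1), label="train")
plt.plot(sizes, val_scores.mean(axis=1), label="validation")
plt.xlabel("training set size")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```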
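Distribution — overlaying histograms of the real and predicted targets shows whether the model captures the underlying distribution. A ridge regression on synthetic data stands in for the real model:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

preds = Ridge().fit(X_tr, y_tr).predict(X_te)

# Compare the shapes of the real and predicted target distributions.
plt.hist(y_te, bins=50, alpha=0.5, label="real")
plt.hist(preds, bins=50, alpha=0.5, label="predicted")
plt.legend()
plt.show()
```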
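Tuning — a minimal Optuna study; the search space and model are illustrative, not a recommendation:

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

def objective(trial):
    # Hypothetical search space; adapt ranges to the chosen model.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 2, 6),
    }
    model = GradientBoostingClassifier(**params, random_state=0)
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```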
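SHAP — a sketch using `TreeExplainer`, which is efficient for tree ensembles; the summary plot ranks features by their impact on the predictions:

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Placeholder model and data; use the trained competition model instead.
X, y = make_regression(n_samples=500, n_features=10, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)
```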
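Postprocessing — two common heuristics, clipping and rounding, shown on made-up predictions; they only make sense when the target is known to obey those constraints:

```python
import numpy as np

# Hypothetical regression predictions.
preds = np.array([-3.2, 10.5, 250.0, 47.9])

# Clip to the range seen in training if the target can never leave it.
preds = np.clip(preds, 0.0, 200.0)

# Round if the target is known to be an integer count.
preds = np.round(preds)
```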
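Stacking — sklearn's `StackingClassifier` feeds the base learners' out-of-fold predictions into a final meta-learner; estimators here are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Base learners + a logistic regression meta-learner on their predictions.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
print(cross_val_score(stack, X, y, cv=5, scoring="accuracy").mean())
```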
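AutoML — one option among several libraries is FLAML, shown here purely as an illustration and assuming it is installed:

```python
from flaml import AutoML
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

automl = AutoML()
# time_budget is in seconds; FLAML searches over models and hyperparameters.
automl.fit(X, y, task="classification", time_budget=60)
print(automl.best_estimator)
```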