⚠️ Random 80/20 split: spatial autocorrelation inflates metrics
ECE by Eval Year (↓ better)
Brier Score by Eval Year (↓ better)
✅ Expanding-window temporal validation: no future data leakage. Scenario chaining active for h≥2.
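
As a rough illustration of the expanding-window scheme, the sketch below generates (training years, evaluation year) pairs in which every training year strictly precedes the evaluation year. The function name `expanding_window_splits`, the `min_train_years` parameter, and the example years are hypothetical and not taken from this dashboard; scenario chaining for h≥2 is not covered here.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    years: List[int], min_train_years: int = 2
) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_years, eval_year) pairs for expanding-window validation.

    Hypothetical helper: the training window grows by one year per fold,
    and the evaluation year is always later than every training year.
    """
    ordered = sorted(set(years))
    for i in range(min_train_years, len(ordered)):
        yield ordered[:i], ordered[i]


# Example years are assumed purely for illustration.
splits = list(expanding_window_splits([2018, 2019, 2020, 2021, 2022]))
for train_years, eval_year in splits:
    print(train_years, "->", eval_year)
```

Because each fold only trains on years before its evaluation year, no future observations can leak into training, unlike a random 80/20 split.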
Random Split vs Temporal: ROC-AUC by Model
Calibration Comparison (ECE ↓)
Brier Score Comparison (↓)
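
For reference, the two calibration metrics charted above can be computed along the following lines. This is a minimal sketch assuming binary labels, predicted probabilities in [0, 1], and equal-width binning for ECE; the function names and the `n_bins` default are illustrative assumptions, and the dashboard's actual binning may differ.

```python
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> float:
    """ECE with equal-width bins (an assumed binning scheme).

    For each bin, take |mean predicted probability - observed frequency|
    and weight it by the share of samples that fall in the bin.
    """
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))

    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in members) / len(members)
        confidence = sum(p for _, p in members) / len(members)
        ece += (len(members) / n) * abs(confidence - observed)
    return ece
```

Both metrics are 0 for a perfectly calibrated, perfectly confident model, which is why lower is better on these charts.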
Key insight: Random-split AUC is inflated by 13-43 percentage points due to spatial leakage. Temporal validation reveals true generalization.