Sweep Strategies
What am I looking at?
Each row is one (market, model, feature_set, slot, side, outcome, stake_algo, cfg) combination, i.e. a single deployable strategy. Multiple cutoffs (c1/c2/c3/c4 + c8) test it with different training sizes and OOS windows; clicking a row opens the per-cutoff drill-down.
Badges (in order of importance):
- TRUST: the authoritative deploy gate. The sweep gates pass AND live paper-trade data confirms them (live-vs-sweep gap under 5pp on n ≥ 20 settled bets, and pnl% > -2%). Today the TRUST count is typically 0, because we just discovered that 6 of 7 deployed strategies fail this check (the sweep over-promised by 15-79pp). New candidates need ≥20 settled bets agreeing with the sweep before earning this badge. Never deploy real money on a non-TRUST strategy.
- DEPLOY: a model pkl exists for every cutoff and the feature_set is one that portfolio_runtime knows. The strategy is live-deployable in the technical sense (the code can load it), but this is NOT a guarantee of profitability; see TRUST.
- FAIL: has live data, but the live-vs-sweep gap exceeds the 5pp threshold or pnl% < -2%. The sweep claimed positive; live shows otherwise. Do not deploy.
- ALL+: positive PnL on every cutoff that ran. Strong signal.
- MULTI: positive on BOTH the Jan-Feb (c1-c4) AND Mar-Apr (c8) OOS windows, i.e. two independent test periods, not one. Strongest signal; only fires once c8 has populated.
- MONO: ROI improves monotonically with more training data (c1 ≤ c2 ≤ c3 ≤ c4, within 5pp slack). Hardest signal to fake: the strategy is genuinely learning.
- CAL: well-calibrated (mean ECE < 0.05) AND positive closing-line value AND ≥30 CLV samples. This is the calibration-first gate the new methodology selects on: a model can have mediocre accuracy and still be very profitable if it's well-calibrated, while the converse bleeds money. Only fires once the post-2026-05-11 backtest re-runs populate the metrics.
- Robust (column, not badge): robustness score = composite × family_positive_rate × log(family_size). It penalizes lone outliers and rewards strategies whose sibling configs are also profitable. This is the default sort. Singletons (family_size = 1) score 0: log(1) = 0, and a single config offers no robustness signal. Higher = stronger. Added 2026-05-14 (R&D §28).
- Family (column, not badge): a family is every strategy sharing the same (market × model × feature_set × slot × side × outcome). Members differ only in stake_algo + edge_min + confidence_band. The cell shows positive/total with a colored dot: 🟢 ≥70% positive (robust signal that works across many configs), 🟡 ≥40% (mixed), 🔴 below 40% (likely an overfit outlier). Hover for the median and min/max PnL across the family. If your top strategy has Family 1/324, it's almost certainly an overfit lucky outlier; if it's 8/12, the underlying signal is much more likely to be real.
- CON: the engine reported a conservation-of-money violation. DO NOT deploy. This indicates a bug in the backtest, not a real edge.
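The badge and Robust/Family rules above are threshold checks, so they can be restated as small predicates. The sketch below is a hypothetical Python rendering, not the dashboard's actual code: the function names and signatures are invented for illustration, the 5pp gap limit is assumed to apply to the absolute gap, MONO's 5pp slack is read as "each step may dip by at most 5pp", and log is taken as the natural log.

```python
import math
from typing import Optional, Sequence

# Thresholds quoted from the badge descriptions.
GAP_LIMIT_PP = 5.0          # live-vs-sweep winrate gap, percentage points
LIVE_PNL_FLOOR_PCT = -2.0   # live PnL as % of the 10K bankroll
TRUST_MIN_BETS = 20         # settled live bets needed before TRUST
MONO_SLACK_PP = 5.0         # allowed ROI dip between consecutive cutoffs
CAL_MAX_ECE = 0.05
CAL_MIN_CLV_SAMPLES = 30


def live_verdict(sweep_gates_pass: bool, n_live: int,
                 gap_pp: Optional[float],
                 live_pnl_pct: Optional[float]) -> Optional[str]:
    """Return "TRUST", "FAIL", or None when there is no verdict yet.

    Assumption: the 5pp limit applies to the absolute gap.
    """
    if n_live == 0 or gap_pp is None or live_pnl_pct is None:
        return None  # no live data: neither TRUST nor FAIL
    if abs(gap_pp) > GAP_LIMIT_PP or live_pnl_pct < LIVE_PNL_FLOOR_PCT:
        return "FAIL"  # live contradicts the sweep
    if (sweep_gates_pass and n_live >= TRUST_MIN_BETS
            and abs(gap_pp) < GAP_LIMIT_PP
            and live_pnl_pct > LIVE_PNL_FLOOR_PCT):
        return "TRUST"
    return None  # consistent so far, but too few bets (or gates failed)


def all_positive(cutoff_pnls: Sequence[float]) -> bool:
    """ALL+: positive PnL on every cutoff that ran."""
    return len(cutoff_pnls) > 0 and all(p > 0 for p in cutoff_pnls)


def is_monotonic(rois_pp: Sequence[float],
                 slack_pp: float = MONO_SLACK_PP) -> bool:
    """MONO: ROI (pp) for c1..c4 never dips by more than the slack."""
    return all(later >= earlier - slack_pp
               for earlier, later in zip(rois_pp, rois_pp[1:]))


def is_calibrated(mean_ece: Optional[float], mean_clv: Optional[float],
                  n_clv: int) -> bool:
    """CAL: mean ECE < 0.05, positive CLV, and at least 30 CLV samples."""
    if mean_ece is None or mean_clv is None:
        return False  # metrics not populated yet
    return (mean_ece < CAL_MAX_ECE and mean_clv > 0
            and n_clv >= CAL_MIN_CLV_SAMPLES)


def robustness(composite: float, family_positive: int,
               family_size: int) -> float:
    """Robust: composite x family_positive_rate x ln(family_size).

    ln(1) == 0, so singleton families always score 0.
    """
    if family_size < 1:
        return 0.0
    return composite * (family_positive / family_size) * math.log(family_size)


def family_dot(family_positive: int, family_size: int) -> str:
    """Family dot colour: green >= 70% positive, yellow >= 40%, red below."""
    rate = family_positive / family_size if family_size else 0.0
    if rate >= 0.70:
        return "green"
    if rate >= 0.40:
        return "yellow"
    return "red"
```

Under these rules a family of 8/12 (about 67% positive) gets the yellow dot, while 1/324 gets red.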
Columns:
- Composite: ranking metric, bonferroni_t × log(n_bets) × (1 - worst_drawdown), halved if not all-positive. Higher = more credible.
- OOS PnL: total P&L summed across all OOS cutoffs (€10K starting bankroll per cutoff).
- ROI: mean ROI across cutoffs (PnL / total stake).
- n_bets: total bets across all cutoffs. Higher = more statistical power.
- t-stat: Bonferroni-adjusted t-statistic, penalised by the total grid size to control false positives. Rule of thumb: t > 3 is "probably real".
- DD: worst drawdown across cutoffs. Lower is better; the portfolio simulation compounds returns.
- REPL: positive cutoffs / total cutoffs. With c1-c4 + c8, the maximum is 5. Higher = more replications.
- ECE: mean expected calibration error across cutoffs (lower = better calibrated; well-calibrated H2H ≈ 0.01-0.03, OU ≈ 0.02-0.05). The primary selection signal. Green < 0.05, amber 0.05-0.10, red above. Shows a dash until the post-2026-05-11 re-runs land. Expand a row for per-cutoff ECE/Brier.
- CLV: n-weighted mean closing-line value of the placed bets, in probability units (1/close - 1/entry for BACK, flipped for LAY), measured against the T-10 price. > 0 = beat the close, the recognised proxy for a genuine edge. Strategies that bet AT the close show a dash.
- LIVE %: paper-trade PnL as % of the €10K initial bankroll (all-time, refreshed every 30 min). Hover for the 30-day window and raw numbers. Only deployed strategies populate this (most rows show a dash).
- gap: realized winrate - predicted winrate. Negative = the model over-predicts. Red if < -5pp on n ≥ 15: a deploy-killer, meaning the sweep was over-optimistic for this strategy. This is the ground-truth check on whether sweep claims match reality.
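A hypothetical sketch of how the Composite, ECE, CLV and gap columns could be computed from the definitions above. Several details are assumptions not stated in the text: natural log, drawdown as a fraction of bankroll (0-1), 10 equal-width ECE bins over binary outcomes, decimal odds, and "predicted winrate" taken as the mean predicted probability of the settled bets. The Bonferroni adjustment itself is not reproduced, because its exact grid-size penalty is not specified here.

```python
import math
from typing import List, Sequence


def composite_score(bonferroni_t: float, n_bets: int, worst_drawdown: float,
                    all_cutoffs_positive: bool) -> float:
    """bonferroni_t x ln(n_bets) x (1 - worst_drawdown), halved if not ALL+.

    worst_drawdown is assumed to be a fraction of bankroll (0.2 = 20%).
    """
    score = bonferroni_t * math.log(n_bets) * (1.0 - worst_drawdown)
    return score if all_cutoffs_positive else score / 2.0


def expected_calibration_error(probs: Sequence[float], outcomes: Sequence[int],
                               n_bins: int = 10) -> float:
    """ECE with equal-width bins: sum_b (n_b / N) * |hit_rate_b - mean_p_b|."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 joins the top bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            hit_rate = sum(y for _, y in members) / len(members)
            ece += len(members) / len(probs) * abs(hit_rate - mean_p)
    return ece


def bet_clv(entry_odds: float, close_odds: float, side: str) -> float:
    """CLV in probability units vs the T-10 price (decimal odds assumed)."""
    back_clv = 1.0 / close_odds - 1.0 / entry_odds
    if side == "BACK":
        return back_clv
    if side == "LAY":
        return -back_clv  # sign flipped for LAY bets
    raise ValueError(f"unknown side: {side}")


def n_weighted_mean(means: Sequence[float], counts: Sequence[int]) -> float:
    """Combine per-cutoff means, weighting each by its number of bets."""
    return sum(m * n for m, n in zip(means, counts)) / sum(counts)


def winrate_gap_pp(wins: int, n_bets: int, mean_predicted_p: float) -> float:
    """gap = realized winrate - predicted winrate, in percentage points."""
    return 100.0 * (wins / n_bets - mean_predicted_p)


def gap_is_red(gap_pp: float, n_bets: int) -> bool:
    """Red (deploy-killer) when gap < -5pp on at least 15 bets."""
    return n_bets >= 15 and gap_pp < -5.0
```

For example, backing at 2.2 when the T-10 price closes at 2.0 gives a CLV of 1/2.0 - 1/2.2 ≈ +0.045, i.e. the bet beat the close.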
Cross-window stability (not shown as a column, but stored in MongoDB):
c1-c4 all test on Jan-Feb 2026 (with varying train sizes); c8 tests on Mar-Apr 2026 (a different OOS window). A strategy positive on BOTH windows shows a real walk-forward signal, not just one lucky period. Look for "n_windows_positive: 2" in the per-cutoff drill-down.
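A minimal sketch of the n_windows_positive count, using the cutoff-to-window mapping described above. How a window is judged positive is an assumption here (summed PnL over that window's cutoffs > 0); the stored field may use a different rule, such as requiring every cutoff in the window to be positive.

```python
from typing import Dict

# Cutoff -> OOS window (c1-c4: Jan-Feb 2026, c8: Mar-Apr 2026).
WINDOW_OF_CUTOFF = {"c1": "jan_feb", "c2": "jan_feb", "c3": "jan_feb",
                    "c4": "jan_feb", "c8": "mar_apr"}


def n_windows_positive(pnl_by_cutoff: Dict[str, float]) -> int:
    """Count OOS windows whose summed PnL is positive (assumed rule)."""
    window_pnl: Dict[str, float] = {}
    for cutoff, pnl in pnl_by_cutoff.items():
        window = WINDOW_OF_CUTOFF[cutoff]
        window_pnl[window] = window_pnl.get(window, 0.0) + pnl
    return sum(1 for total in window_pnl.values() if total > 0)
```

With PnL of 120, -30, 45, 10 on c1-c4 and 60 on c8, both windows are net positive and the count is 2; before c8 has populated, the same strategy can reach at most 1.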
How to pick a real-money portfolio: use the 'Trustworthy deploy only' filter. It is the only filter that requires live data to confirm sweep claims; without that confirmation, the sweep alone has been shown to over-promise by 15-79pp on real deployments. If the TRUST count is 0, no strategy has earned real-money status yet, so keep paper trading. To find candidates to send to paper trading: filter on Deployable + Multi-window survivors (or All-positive while c8 is still populating), sort by composite, and take the top 10-30. Those go to the shadow tier; once they accumulate ≥20 confirming bets, they become TRUST.
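The selection recipe above amounts to two filters over the table rows. In this sketch the Row type and function names are hypothetical, top_k = 20 is just one value inside the suggested 10-30 range, and dropping FAIL and CON rows from the candidate list is an added assumption based on those badges' "do not deploy" notes.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Row:
    """Hypothetical table row: strategy name, badge flags, composite score."""
    name: str
    flags: Set[str] = field(default_factory=set)
    composite: float = 0.0


def real_money_portfolio(rows: List[Row]) -> List[Row]:
    """Only TRUST rows qualify; an empty result means keep paper trading."""
    return [r for r in rows if "TRUST" in r.flags]


def paper_trading_candidates(rows: List[Row], c8_populated: bool,
                             top_k: int = 20) -> List[Row]:
    """DEPLOY + MULTI (or ALL+ while c8 populates), sorted by composite."""
    signal = "MULTI" if c8_populated else "ALL+"
    picked = [r for r in rows
              if "DEPLOY" in r.flags and signal in r.flags
              and r.flags.isdisjoint({"FAIL", "CON"})]  # assumption: skip do-not-deploy rows
    picked.sort(key=lambda r: r.composite, reverse=True)
    return picked[:top_k]
```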
| Strategy | Flags | Robust | Family | Composite | OOS PnL | ROI | n_bets | t-stat | DD | REPL | ECE | CLV | LIVE % | gap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|