Sweep Strategies

📖 What am I looking at?

Each row is one (market, model, feature_set, slot, side, outcome, stake_algo, cfg) tuple: a single deployable strategy. Multiple cutoffs (c1-c4 plus c8) test it on different OOS windows; click a row to open the per-cutoff drill-down.

Badges (in order of importance):

  • ★ TRUST: The authoritative deploy gate. Sweep gates pass AND live paper-trade data confirms them (gap < 5pp on n≥20, pnl% > -2%). Today the count is typically 0, because we just discovered that 6/7 deployed strategies fail this check (the sweep over-promised by 15-79pp). New candidates need ≥20 settled bets agreeing with the sweep before earning this badge. Never deploy real money on a non-TRUST strategy.
  • DEPLOY: a model pkl exists for every cutoff and the feature_set is one that portfolio_runtime knows. Live-deployable in the technical sense (the code can load it), but ⚠ NOT a guarantee of profitability; see TRUST.
  • ✗ FAIL: has live data, but the live-vs-sweep gap exceeds the 5pp threshold or pnl% < -2%. The sweep claimed positive; live shows otherwise. Do not deploy.
  • ALL+: positive PnL on every cutoff that ran. Strong signal.
  • MULTI: positive on BOTH the Jan-Feb (c1-c4) AND Mar-Apr (c8) OOS windows. Two independent test periods, not one. Strongest signal; only fires once c8 has populated.
  • ↑MONO: ROI improves monotonically with more training data (c1 ≤ c2 ≤ c3 ≤ c4, within 5pp slack). Hardest signal to fake: the strategy is genuinely learning.
  • CAL: well-calibrated (mean ECE < 0.05) AND positive closing-line value AND ≥30 CLV samples. This is the calibration-first gate the new methodology selects on: a model can have mediocre accuracy and still be very profitable if it is well-calibrated; the converse bleeds money. Only fires once the post-2026-05-11 backtest re-runs populate the metrics.
  • Robust (column, not badge): robustness score = composite × family_positive_rate × log(family_size). Penalizes lone outliers and rewards strategies whose sibling configs are also profitable. The default sort. Singletons (family_size=1) score 0, since a single config carries no robustness signal. Higher = stronger. Added 2026-05-14 (R&D §28).
  • Family (column, not badge): a family is every strategy sharing the same (market × model × feature_set × slot × side × outcome). Members differ only in stake_algo + edge_min + confidence_band. The cell shows positive/total with a colored dot: 🟢 ≥70% positive (robust signal, works across many configs), 🟡 ≥40% (mixed), 🔴 below (likely overfit outlier). Hover for median + min/max PnL across the family. If your top strategy has Family 1/324, it is almost certainly an overfit lucky outlier; if it is 8/12, the underlying signal is real.
  • ⚠CON: the engine reported a conservation-of-money violation. DO NOT deploy. Indicates a bug in the backtest, not real edge.
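
The TRUST and FAIL rules above can be sketched as a small classifier. This is a minimal illustration, not the dashboard's actual code: the `LiveStats` container, the `live_badge` function, and the constant names are hypothetical, and it assumes the 5pp limit applies to the absolute gap and that FAIL is evaluated on any amount of live data (the text states no minimum sample for FAIL).

```python
from dataclasses import dataclass
from typing import Optional

GAP_LIMIT_PP = 5.0        # max tolerated live-vs-sweep gap, in percentage points
MIN_SETTLED_BETS = 20     # settled paper bets required before TRUST can fire
MIN_LIVE_PNL_PCT = -2.0   # live pnl% must stay above this value


@dataclass
class LiveStats:
    """Hypothetical container for one strategy's paper-trade results."""
    n_settled: int
    gap_pp: float    # realized winrate minus predicted winrate, in pp
    pnl_pct: float   # paper-trade PnL as % of the initial bankroll


def live_badge(sweep_gates_pass: bool, live: Optional[LiveStats]) -> Optional[str]:
    """Return "TRUST", "FAIL", or None when no verdict is possible yet."""
    if live is None or live.n_settled == 0:
        return None  # never paper-traded: neither badge applies

    # Assumption: the limit is on the absolute gap, in either direction.
    if abs(live.gap_pp) > GAP_LIMIT_PP or live.pnl_pct < MIN_LIVE_PNL_PCT:
        return "FAIL"

    confirmed = (
        live.n_settled >= MIN_SETTLED_BETS
        and abs(live.gap_pp) < GAP_LIMIT_PP
        and live.pnl_pct > MIN_LIVE_PNL_PCT
    )
    if sweep_gates_pass and confirmed:
        return "TRUST"
    return None  # live data agrees so far, but the sample is too small
```

A strategy with 10 settled bets and a small gap returns `None` here: it has not failed, but it has not yet earned TRUST either.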

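As an illustration of the Robust and Family entries, the sketch below computes the robustness score and the dot colour from a list of family PnLs. The function names are hypothetical, and the natural logarithm is an assumption, since the text does not state the base of the log.

```python
import math
from typing import Sequence


def family_positive_rate(family_pnls: Sequence[float]) -> float:
    """Fraction of family members with strictly positive PnL."""
    if not family_pnls:
        raise ValueError("a family needs at least one member")
    return sum(1 for pnl in family_pnls if pnl > 0) / len(family_pnls)


def robustness_score(composite: float, family_pnls: Sequence[float]) -> float:
    """composite x family_positive_rate x log(family_size).

    Assumption: natural log. With family_size = 1 the log term is 0,
    so singletons score 0, matching the description above.
    """
    rate = family_positive_rate(family_pnls)
    return composite * rate * math.log(len(family_pnls))


def family_dot(positive_rate: float) -> str:
    """Map the positive rate to the dot colour shown in the Family cell."""
    if positive_rate >= 0.70:
        return "green"
    if positive_rate >= 0.40:
        return "yellow"
    return "red"
```

For example, `family_dot(1 / 324)` returns `"red"`, the lone-outlier case described in the Family entry.
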
Columns:

  • Composite: ranking metric, bonferroni_t × log(n_bets) × (1 − worst_drawdown). Halved if not all-positive. Higher = more credible.
  • OOS PnL: total P&L summed across all OOS cutoffs (€10K starting bankroll per cutoff).
  • ROI: mean ROI across cutoffs (PnL / total stake).
  • n_bets: total bets across all cutoffs. Higher = more statistical power.
  • t-stat: Bonferroni-adjusted t-statistic, penalised by the total grid size to control false positives. Rule of thumb: t > 3 is "probably real."
  • DD: worst drawdown across cutoffs. Lower is better; portfolio sim will compound.
  • REPL: N positive cutoffs / N total cutoffs. With c1-c4 + c8, the maximum is 5. Higher = more replications.
  • ECE: mean expected calibration error across cutoffs (lower = better calibrated; well-calibrated H2H ≈ 0.01-0.03, OU ≈ 0.02-0.05). The primary selection signal. Green < 0.05, amber 0.05-0.10, red above. Shows a dash placeholder until the post-2026-05-11 re-runs land. Expand a row for per-cutoff ECE/Brier.
  • CLV: n-weighted mean closing-line value of the placed bets, in probability units (1/close − 1/entry for BACK, flipped for LAY), measured against the T-10 price. > 0 = beat the close, the recognised proxy for a genuine edge. Strategies that bet AT the close show a dash placeholder.
  • LIVE %: paper-trade PnL as % of the €10K initial bankroll (all-time, refreshed every 30min). Hover for the 30d window + raw numbers. Only deployed strategies populate this (most rows show a dash placeholder).
  • gap: realized winrate − predicted winrate. Negative = the model over-predicts. Red if < -5pp on n≥15: a deploy-killer, meaning the sweep was over-optimistic for this strategy. The ground-truth check on whether sweep claims match reality.
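
The Composite formula from the list above, written out as a sketch. The function name is hypothetical; the natural logarithm and a drawdown expressed as a fraction between 0 and 1 are assumptions, since the text gives neither the log base nor the drawdown units.

```python
import math


def composite_score(
    bonferroni_t: float,
    n_bets: int,
    worst_drawdown: float,
    all_positive: bool,
) -> float:
    """bonferroni_t x log(n_bets) x (1 - worst_drawdown), halved if not all-positive.

    Assumptions: natural log, and worst_drawdown given as a fraction
    (0.15 means a 15% drawdown).
    """
    if n_bets < 1:
        raise ValueError("n_bets must be at least 1")
    if not 0.0 <= worst_drawdown <= 1.0:
        raise ValueError("worst_drawdown must be a fraction in [0, 1]")

    score = bonferroni_t * math.log(n_bets) * (1.0 - worst_drawdown)
    return score if all_positive else score / 2.0
```

Because of the log term, doubling the number of bets adds a fixed amount to the multiplier rather than doubling the score, so the t-statistic and the drawdown dominate the ranking once n_bets is large.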

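The CLV entry can be illustrated the same way. This sketch assumes decimal odds and reads "n-weighted" as weighting each cutoff's mean CLV by its number of bets; both the function names and that reading are assumptions, not taken from the dashboard code.

```python
from typing import Iterable, Optional, Tuple


def bet_clv(entry_odds: float, close_odds: float, side: str) -> float:
    """Closing-line value of one bet in probability units.

    BACK: 1/close - 1/entry (positive when the entry price was longer
    than the close). LAY: the same quantity with the sign flipped.
    Assumption: both prices are decimal odds.
    """
    back_clv = 1.0 / close_odds - 1.0 / entry_odds
    if side == "BACK":
        return back_clv
    if side == "LAY":
        return -back_clv
    raise ValueError(f"unknown side: {side!r}")


def n_weighted_mean(per_cutoff: Iterable[Tuple[float, int]]) -> Optional[float]:
    """Mean of per-cutoff CLV values, weighted by each cutoff's bet count.

    Returns None when there are no CLV samples, which corresponds to
    the empty cell shown for strategies that bet at the close.
    """
    pairs = [(value, n) for value, n in per_cutoff if n > 0]
    total = sum(n for _, n in pairs)
    if total == 0:
        return None
    return sum(value * n for value, n in pairs) / total
```

Backing at 2.2 when the market closes at 2.0 gives a positive CLV under this definition, because the implied probability at the close (0.5) is higher than at entry (about 0.4545).
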
Cross-window stability (hidden but in MongoDB):

c1-c4 all test on Jan-Feb 2026 (varying train sizes). c8 tests on Mar-Apr 2026 (different OOS). A strategy positive on BOTH windows = real walk-forward signal, not just one lucky period. Look for "n_windows_positive: 2" in the per-cutoff drill-down.
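
A rough sketch of how the cutoff-level checks could be derived from per-cutoff results. The cutoff-to-window mapping comes from the paragraph above; everything else is an assumption made for illustration: the function names, counting a window as positive when its summed PnL is positive, and reading the 5pp slack as "each step may fall by at most 5pp".

```python
from typing import Dict

# Window membership as described above: c1-c4 share one OOS window, c8 has its own.
WINDOWS = {
    "jan_feb": ("c1", "c2", "c3", "c4"),
    "mar_apr": ("c8",),
}


def n_windows_positive(pnl_by_cutoff: Dict[str, float]) -> int:
    """Count OOS windows whose cutoffs sum to a positive PnL (assumed rule)."""
    count = 0
    for cutoffs in WINDOWS.values():
        ran = [pnl_by_cutoff[c] for c in cutoffs if c in pnl_by_cutoff]
        if ran and sum(ran) > 0:
            count += 1
    return count


def is_all_positive(pnl_by_cutoff: Dict[str, float]) -> bool:
    """ALL+ style check: positive PnL on every cutoff that ran."""
    return bool(pnl_by_cutoff) and all(p > 0 for p in pnl_by_cutoff.values())


def is_monotone(roi_pp_by_cutoff: Dict[str, float], slack_pp: float = 5.0) -> bool:
    """MONO style check: ROI from c1 to c4 never drops by more than slack_pp."""
    rois = [roi_pp_by_cutoff[c] for c in WINDOWS["jan_feb"] if c in roi_pp_by_cutoff]
    if len(rois) < 2:
        return False  # nothing to compare
    return all(later >= earlier - slack_pp for earlier, later in zip(rois, rois[1:]))
```

Until c8 has run, `n_windows_positive` can return at most 1, which matches the note that the multi-window signal only fires once c8 has populated.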

How to pick a real-money portfolio: filter ★ Trustworthy deploy only. That is the only filter that requires live data to confirm sweep claims. Without that confirmation, the sweep alone has been shown to over-promise by 15-79pp on real deployments. If the TRUST count is 0, no strategy has earned real-money status yet, so keep paper trading. To find candidates to send to paper trading: filter Deployable + Multi-window survivors (or All-positive while c8 is still populating), sort by composite, and take the top 10-30. Those go to the shadow tier; once they accumulate ≥20 confirming bets, they become TRUST.
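
The selection procedure above, sketched over an in-memory list of rows. The row schema (`name`, `flags`, `composite`), the flag strings, and both function names are hypothetical. Excluding rows that carry the conservation-violation flag is an added assumption, based on the "DO NOT deploy" note for that badge.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # hypothetical schema: {"name": str, "flags": set, "composite": float}


def real_money_portfolio(rows: List[Row]) -> List[Row]:
    """Only TRUST strategies qualify for real money."""
    return [row for row in rows if "TRUST" in row["flags"]]


def paper_trade_candidates(
    rows: List[Row],
    top_n: int = 20,
    c8_populated: bool = True,
) -> List[Row]:
    """Deployable survivors, sorted by composite, truncated to top_n.

    While c8 is still populating, fall back to the all-positive flag
    in place of the multi-window flag.
    """
    survivor_flag = "MULTI" if c8_populated else "ALL+"
    eligible = [
        row for row in rows
        if "DEPLOY" in row["flags"]
        and survivor_flag in row["flags"]
        and "CON" not in row["flags"]  # assumption: never shortlist a conservation violation
    ]
    eligible.sort(key=lambda row: row["composite"], reverse=True)
    return eligible[:top_n]
```

If `real_money_portfolio` returns an empty list, that corresponds to the "TRUST count is 0" case: the output of `paper_trade_candidates` goes to the shadow tier and nothing goes live.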

Strategy | Flags | Robust | Family | Composite | OOS PnL | ROI | n_bets | t-stat | DD | REPL | ECE | CLV | LIVE % | gap