Sweep Strategies

📖 What am I looking at?

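Each row is one (market, model, feature_set, slot, side, outcome, stake_algo, cfg) combination, i.e. a single strategy configuration. Each strategy is tested at several cutoffs: c1-c4 share the Jan-Feb 2026 OOS window with increasing amounts of training data, and c8 tests on the separate Mar-Apr 2026 window. Click a row to open the per-cutoff drill-down.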

Badges (in order of importance):

★ TRUST: the authoritative deploy gate. The sweep gates pass AND live paper-trade data confirms the sweep (live-vs-sweep gap within 5pp on n≥20 settled bets, and pnl% > -2%). Today this count is typically 0, because we recently found that 6 of 7 deployed strategies fail this check (the sweep over-promised by 15-79pp). New candidates need ≥20 settled bets agreeing with the sweep before earning this badge. Never deploy real money on a non-TRUST strategy.
DEPLOY: a model pkl exists for every cutoff and the feature_set is one that portfolio_runtime knows. The strategy is live-deployable in the technical sense (the code can load it), but ⚠ this is NOT a guarantee of profitability; see TRUST.
✗ FAIL: has live data, but the live-vs-sweep gap exceeds the 5pp threshold or pnl% < -2%. The sweep claimed a positive result; live trading shows otherwise. Do not deploy.
ALL+: positive PnL on every cutoff that ran. Strong signal.
MULTI: positive on BOTH the Jan-Feb (c1-c4) AND the Mar-Apr (c8) OOS windows, i.e. two independent test periods rather than one. Strongest signal; only fires once c8 has populated.
↑MONO: ROI improves monotonically with more training data (c1 ≤ c2 ≤ c3 ≤ c4, within 5pp slack). Hardest signal to fake: the strategy is genuinely learning.
CAL: well-calibrated (mean ECE < 0.05) AND positive closing-line value (CLV) AND ≥30 CLV samples. This is the calibration-first gate the new methodology selects on: a model with mediocre accuracy can still be very profitable if it is well-calibrated, whereas an accurate but poorly calibrated model bleeds money. Only fires once the post-2026-05-11 backtest re-runs populate the metrics.
Robust (column, not badge): robustness score = composite × family_positive_rate × log(family_size). It penalizes lone outliers and rewards strategies whose sibling configs are also profitable. This is the default sort. Singletons (family_size = 1) score 0, since log(1) = 0 and a single config carries no robustness signal. Higher = stronger. Added 2026-05-14 (R&D §28).
Family (column, not badge): a family is every strategy sharing the same (market × model × feature_set × slot × side × outcome); members differ only in stake_algo, edge_min and confidence_band. The cell shows positive/total with a colored dot: 🟢 ≥70% positive (robust signal that works across many configs), 🟡 ≥40% (mixed), 🔴 below 40% (likely an overfit outlier). Hover for the median and min/max PnL across the family. If your top strategy shows Family 1/324, it is almost certainly a lucky overfit outlier; if it shows 8/12, the underlying signal is real.
⚠CON: the engine reported a conservation-of-money violation. DO NOT deploy. This indicates a bug in the backtest, not a real edge.
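The Family and Robust columns described above can be sketched as follows. This is a minimal illustration, not the dashboard's code: the `Row` dataclass and its field names are hypothetical, a member is assumed to count as "positive" when its OOS PnL is above zero, and `math.log` (natural log) is assumed for the log term.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical row shape; fields mirror the dashboard labels, not a real schema.
    market: str
    model: str
    feature_set: str
    slot: str
    side: str
    outcome: str
    stake_algo: str
    edge_min: float
    confidence_band: str
    composite: float
    oos_pnl: float


def family_key(r: Row) -> tuple:
    # Siblings share everything except stake_algo, edge_min and confidence_band.
    return (r.market, r.model, r.feature_set, r.slot, r.side, r.outcome)


def family_dot(positive_rate: float) -> str:
    # Thresholds taken from the Family column description.
    if positive_rate >= 0.70:
        return "🟢"
    if positive_rate >= 0.40:
        return "🟡"
    return "🔴"


def family_cells_and_robustness(rows: list[Row]) -> tuple[list[str], list[float]]:
    # Group row indices by family key.
    families = defaultdict(list)
    for i, r in enumerate(rows):
        families[family_key(r)].append(i)

    cells = [""] * len(rows)
    scores = [0.0] * len(rows)
    for members in families.values():
        size = len(members)
        positive = sum(1 for i in members if rows[i].oos_pnl > 0)
        rate = positive / size
        for i in members:
            cells[i] = f"{family_dot(rate)} {positive}/{size}"
            # math.log(1) == 0, so singleton families always score 0.
            scores[i] = rows[i].composite * rate * math.log(size)
    return cells, scores
```

Under these assumptions, sorting rows by `scores` in descending order gives the default Robust ordering, and a lone high-composite config in a family of one can never outrank a profitable family.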

Columns:

Composite: ranking metric = bonferroni_t × log(n_bets) × (1 − worst_drawdown), halved if the strategy is not positive on every cutoff. Higher = more credible.
OOS PnL: total PnL summed across all OOS cutoffs (each cutoff starts from a €10K bankroll).
ROI: mean ROI across cutoffs (PnL / total stake).
n_bets: total bets across all cutoffs. Higher = more statistical power.
t-stat: Bonferroni-adjusted t-statistic, penalised by the total grid size to control false positives. Rule of thumb: t > 3 is "probably real."
DD: worst drawdown across cutoffs. Lower is better; drawdowns compound in the portfolio sim.
REPL: positive cutoffs / total cutoffs. With c1-c4 + c8 the maximum is 5/5. Higher = more replications.
ECE: mean expected calibration error across cutoffs (lower = better calibrated; well-calibrated H2H ≈ 0.01-0.03, OU ≈ 0.02-0.05). The primary selection signal. Green < 0.05, amber 0.05-0.10, red above 0.10. Shows '—' until the post-2026-05-11 re-runs land. Expand a row for per-cutoff ECE/Brier.
CLV: n-weighted mean closing-line value of the placed bets, in probability units, using the T-10 price as the close: 1/close − 1/entry for BACK bets, sign flipped for LAY. > 0 = beat the close, the recognised proxy for a genuine edge. Strategies that bet AT the close show '—'.
LIVE %: paper-trade PnL as % of the €10K initial bankroll (all-time, refreshed every 30 min). Hover for the 30d window and raw numbers. Only deployed strategies populate this (most rows show '—').
gap: realized winrate − predicted winrate. Negative = the model over-predicts. Red if < -5pp on n≥15: a deploy-killer, meaning the sweep was over-optimistic for this strategy. This is the ground-truth check on whether sweep claims match reality.
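The column formulas above, plus the CAL badge thresholds, can be sketched in Python. This is a hedged illustration under stated assumptions rather than the dashboard's implementation: every function name is hypothetical, drawdown is measured as a fraction of the running peak, ECE uses 10 equal-width bins, odds are decimal, and `log` is the natural logarithm. How the Bonferroni adjustment is applied is not specified here, so `bonferroni_t` is taken as an input.

```python
import math


def composite(bonferroni_t: float, n_bets: int, worst_drawdown: float, all_positive: bool) -> float:
    # worst_drawdown is a fraction (0.2 = 20%); natural log is an assumption.
    score = bonferroni_t * math.log(n_bets) * (1.0 - worst_drawdown)
    return score if all_positive else score / 2.0


def max_drawdown(bankroll: list[float]) -> float:
    # Largest peak-to-trough fall of a bankroll curve, as a fraction of the peak.
    peak, worst = bankroll[0], 0.0
    for value in bankroll:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def expected_calibration_error(probs: list[float], won: list[int], n_bins: int = 10) -> float:
    # Equal-width bins; the bin scheme the backtest really uses is an assumption.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, won):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            hit_rate = sum(y for _, y in b) / len(b)
            # Weight each bin's |confidence - accuracy| by its share of bets.
            ece += len(b) / len(probs) * abs(mean_p - hit_rate)
    return ece


def bet_clv(entry_odds: float, close_odds: float, side: str) -> float:
    # Decimal odds; close_odds is the T-10 price. Positive = beat the close.
    back = 1.0 / close_odds - 1.0 / entry_odds
    return back if side == "BACK" else -back


def n_weighted_mean(values: list[float], counts: list[int]) -> float:
    # e.g. per-cutoff mean CLV weighted by each cutoff's CLV sample count.
    return sum(v * n for v, n in zip(values, counts)) / sum(counts)


def cal_badge(mean_ece: float, clv: float, clv_n: int) -> bool:
    return mean_ece < 0.05 and clv > 0 and clv_n >= 30


def gap_is_red(realized_winrate: float, predicted_winrate: float, n: int) -> bool:
    # Winrates as fractions, so -5pp is -0.05.
    return realized_winrate - predicted_winrate < -0.05 and n >= 15
```

For example, a BACK bet taken at 2.2 that closes at 2.0 has a CLV of 1/2.0 − 1/2.2 ≈ 0.045, i.e. about 4.5 probability points better than the close.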

Cross-window stability (not shown in the table, but stored in MongoDB):

c1-c4 all test on Jan-Feb 2026 (with varying training sizes); c8 tests on Mar-Apr 2026, a different OOS window. A strategy positive on BOTH windows shows a real walk-forward signal, not just one lucky period. Look for "n_windows_positive: 2" in the per-cutoff drill-down.
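A minimal sketch of how n_windows_positive and the MULTI and ↑MONO badges could be derived from per-cutoff results. The helper names are hypothetical, and two rules are assumptions rather than documented behavior: a window counts as positive when the PnL summed over its cutoffs that ran is above zero, and the 5pp MONO slack is applied to each consecutive step with ROI expressed as a fraction.

```python
JAN_FEB = ("c1", "c2", "c3", "c4")  # same OOS window, growing training sets
MAR_APR = ("c8",)                   # independent OOS window


def n_windows_positive(pnl_by_cutoff: dict[str, float]) -> int:
    # Assumption: a window is positive when PnL summed over its cutoffs that ran is > 0.
    count = 0
    for window in (JAN_FEB, MAR_APR):
        ran = [pnl_by_cutoff[c] for c in window if c in pnl_by_cutoff]
        if ran and sum(ran) > 0:
            count += 1
    return count


def is_multi(pnl_by_cutoff: dict[str, float]) -> bool:
    # MULTI needs both windows; it cannot fire until c8 has populated.
    return n_windows_positive(pnl_by_cutoff) == 2


def is_mono(roi_by_cutoff: dict[str, float], slack: float = 0.05) -> bool:
    # c1 <= c2 <= c3 <= c4, letting each step fall by at most `slack` (5pp).
    # With fewer than two cutoffs this is vacuously True.
    rois = [roi_by_cutoff[c] for c in JAN_FEB if c in roi_by_cutoff]
    return all(later >= earlier - slack for earlier, later in zip(rois, rois[1:]))
```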

How to pick a real-money portfolio: use the ★ Trustworthy deploy only filter. It is the only filter that requires live data to confirm sweep claims; without that confirmation, the sweep alone has over-promised by 15-79pp on real deployments. If the TRUST count is 0, no strategy has earned real-money status yet, so keep paper trading. To find candidates for paper trading: filter on Deployable + Multi-window survivors (or All-positive while c8 is still populating), sort by composite, and take the top 10-30. Those go to the shadow tier; once they accumulate ≥20 confirming bets, they become TRUST.
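The two selection paths above can be sketched as simple filters. This is a sketch under assumptions, not the dashboard's filter code: the `Strategy` fields are hypothetical stand-ins for the badges and live stats, and the TRUST rule "gap < 5pp" is read here as an absolute gap below 5pp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Strategy:
    # Hypothetical fields standing in for the dashboard's badges and live stats.
    name: str
    composite: float
    deployable: bool                       # DEPLOY
    all_positive: bool                     # ALL+
    multi_window: bool                     # MULTI
    sweep_gates_pass: bool
    live_n: int = 0                        # settled paper-trade bets
    live_gap: Optional[float] = None       # realized - predicted winrate (0.05 = 5pp)
    live_pnl_pct: Optional[float] = None   # % of the €10K initial bankroll


def is_trusted(s: Strategy) -> bool:
    # Assumption: "gap < 5pp" means an absolute gap below 5pp.
    if not s.sweep_gates_pass or s.live_n < 20:
        return False
    if s.live_gap is None or s.live_pnl_pct is None:
        return False
    return abs(s.live_gap) < 0.05 and s.live_pnl_pct > -2.0


def real_money_portfolio(strategies: list[Strategy]) -> list[Strategy]:
    # An empty result means nothing has earned TRUST yet: keep paper trading.
    return [s for s in strategies if is_trusted(s)]


def paper_trade_candidates(strategies: list[Strategy], c8_populated: bool, top_n: int = 30) -> list[Strategy]:
    def survives(s: Strategy) -> bool:
        # Multi-window survivors once c8 exists, All-positive until then.
        return s.multi_window if c8_populated else s.all_positive

    pool = [s for s in strategies if s.deployable and survives(s)]
    return sorted(pool, key=lambda s: s.composite, reverse=True)[:top_n]
```

Keeping the real-money filter separate from the candidate filter mirrors the rule that sweep metrics alone never promote a strategy to real money; only live confirmation does.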


