Predictor — learning loop

Aratea is a weather-factor discovery engine. Every named feature here is a hypothesis; every training run measures whether it carries signal. The benchmark is the kalshi_mid Brier score computed on the same row set: the model has to beat the market on its own ground.

Level 1 · Public: What Aratea thinks, in plain words
Level 2 · Informed: Named components, simple stats
Level 3 · Expert: Full registry, Brier, raw runs

Everything the manifest carries: named factors with their leave-one-out delta, paper-trade ledger, training runs and Brier trajectory. This is the meteorologist / actuary view — no rounding, no sugar-coating.

Features tracked

Active

Experimental

Dropped

Paper bets (open / resolved)

16 / 420

Phase 1: 436/50

Manifest generated at 2026-07-10T20:02:10Z (schema v3).

Hybrid effective sample (N_eff)

α = 0.3

N_eff = N_live + α × N_backtest_strict = 420 + 0.3 × 268 = 500.4

N_live (real paper trades)

420

N_backtest_strict (replay, point-in-time)

268

NAIVE-excluded (informational)

Phase 1 gate reached (live only) ✓

N_eff drives secondary decisions only — feature-set selection, reliability plots, complementary promotion check. The Phase 1 go/no-go gate stays strictly on N_live; backtest volume never substitutes for live trades there.
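The split between the hybrid sample and the live-only gate can be sketched as below. This is a minimal illustration, not the manifest builder's code: the `SampleCounts` class and both function names are hypothetical, and the threshold of 50 is an assumption read off the "436/50" display above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleCounts:
    n_live: int             # resolved real paper trades
    n_backtest_strict: int  # strict point-in-time replay records


def effective_sample(counts: SampleCounts, alpha: float = 0.3) -> float:
    """Hybrid effective sample: live trades plus down-weighted strict replays."""
    return counts.n_live + alpha * counts.n_backtest_strict


def phase1_gate_reached(counts: SampleCounts, threshold: int = 50) -> bool:
    """Go/no-go gate on live trades only; replay volume never counts here.

    The default threshold of 50 is an assumption, not a documented value.
    """
    return counts.n_live >= threshold


counts = SampleCounts(n_live=420, n_backtest_strict=268)
print(round(effective_sample(counts), 1))  # 500.4
print(phase1_gate_reached(counts))         # True
```

Keeping the two functions separate mirrors the rule stated above: a large replay count raises N_eff but cannot open the gate.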

Read CONVENTION §6.bis

Series

Status

A. Live runs (Kalshi paper trades)

Each row is a real paper trade on Kalshi. The champion takes the position (real ledger row, real P&L); challengers and baselines run in shadow mode for Brier comparison. ★ marks the best Brier on a given run. The promotion rule (champion swap) needs a rolling-mean Brier dominance over N≥10 resolved trades — single-run wins are anecdotal.
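One way to read the promotion rule is sketched here. The exact window and tie handling are not specified on this page, so the function below is an assumption: it compares mean Brier over the last 10 resolved trades and refuses to decide on fewer. The name `should_promote` is hypothetical. The sample values are the champion and challenger Brier scores of runs 412 to 421 from the table below.

```python
from statistics import fmean
from typing import Sequence


def should_promote(
    champion_briers: Sequence[float],
    challenger_briers: Sequence[float],
    min_resolved: int = 10,
) -> bool:
    """True when the challenger's rolling-mean Brier beats the champion's.

    Both sequences hold per-trade Brier scores on the same resolved trades,
    oldest first. Only the last `min_resolved` trades enter the rolling mean.
    """
    if len(champion_briers) != len(challenger_briers):
        raise ValueError("scores must cover the same resolved trades")
    if len(champion_briers) < min_resolved:
        return False  # too few resolved trades: any win is anecdotal
    window_champion = champion_briers[-min_resolved:]
    window_challenger = challenger_briers[-min_resolved:]
    return fmean(window_challenger) < fmean(window_champion)


# Runs 412..421 (oldest first), Brier values copied from the table.
champion = [0.0822, 0.4851, 0.1542, 0.3210, 0.2748,
            0.1787, 0.0155, 0.1037, 0.2209, 0.0040]
challenger = [0.0093, 0.7733, 0.0661, 0.5563, 0.2066,
              0.0629, 0.0003, 0.1734, 0.7270, 0.0012]
print(should_promote(champion, challenger))  # False
```

On this window the challenger holds several ★ marks but a worse mean Brier, which is why single-run wins do not trigger a swap.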

Run	When	Event / Bin	Side	Champion p	Challenger p	Baseline p	kalshi_mid	Outcome	P&L paper
437	2026-07-10	LOWTBOS 11/7B66.5	YES	28.9%	31.2%	14.5%	14.5%	PENDING	—
436	2026-07-10	LOWTDC 11/7B71.5	NO	25.9%	8.7%	34.5%	34.5%	PENDING	—
435	2026-07-10	LOWTCHI 11/7B67.5	NO	29.7%	45.1%	43.5%	43.5%	PENDING	—
434	2026-07-10	LOWTCHI 11/7B63.5	YES	15.3%	21.6%	4.0%	4.0%	PENDING	—
433	2026-07-10	LOWTCHI 11/7B65.5	YES	37.2%	51.5%	16.0%	16.0%	PENDING	—
432	2026-07-10	LOWTSFO 11/7B53.5	YES	39.1%	19.6%	16.0%	16.0%	PENDING	—
431	2026-07-10	LOWTNYC 11/7B73.5	YES	37.9%	25.3%	22.5%	22.5%	PENDING	—
430	2026-07-10	LOWTNYC 11/7B69.5	NO	10.0%	2.2%	26.5%	26.5%	PENDING	—
429	2026-07-10	LOWTNYC 11/7B71.5	YES	57.1%	62.8%	28.5%	28.5%	PENDING	—
428	2026-07-09	LOWTCHI 10/7B69.5	YES	53.6%	88.5%	43.0%	43.0%	PENDING	—
427	2026-07-09	LOWTCHI 10/7B67.5	YES	27.5%	40.7%	11.5%	11.5%	PENDING	—
426	2026-07-09	LOWTSFO 10/7B56.5	YES	22.5%	9.8%	6.0%	6.0%	PENDING	—
425	2026-07-09	LOWTLAX 10/7B66.5	YES	30.4%	47.6%	19.5%	19.5%	PENDING	—
424	2026-07-09	LOWTNYC 10/7B71.5	NO	13.6%	9.6%	26.5%	26.5%	PENDING	—
423	2026-07-09	LOWTNYC 10/7B73.5	YES	50.9%	81.8%	31.0%	31.0%	PENDING	—
422	2026-07-09	LOWTNYC 10/7B69.5	NO	2.3%	5.1%	23.0%	23.0%	PENDING	—
421	2026-07-08	LOWTMIA 9/7B79.5	NO	6.3%B=0.0040	3.5%B=0.0012 ★	18.0%B=0.0324	18.0%	WIN (NO)	+$7.20
420	2026-07-08	LOWTMIA 9/7B83.5	YES	47.0%B=0.2209	85.3%B=0.7270	25.0%B=0.0625 ★	25.0%	LOSS (NO)	−$72.50
419	2026-07-08	LOWTDC 9/7B72.5	YES	32.2%B=0.1037	41.6%B=0.1734	14.5%B=0.0210 ★	14.5%	LOSS (NO)	−$72.65
418	2026-07-08	LOWTCHI 9/7B68.5	NO	12.5%B=0.0155	1.7%B=0.0003 ★	27.5%B=0.0756	27.5%	WIN (NO)	+$27.50
417	2026-07-08	LOWTCHI 9/7B72.5	YES	42.3%B=0.1787	25.1%B=0.0629	15.5%B=0.0240 ★	15.5%	LOSS (NO)	−$72.69
416	2026-07-08	LOWTSFO 9/7B56.5	YES	52.4%B=0.2748	45.5%B=0.2066	15.5%B=0.0240 ★	15.5%	LOSS (NO)	−$72.69
415	2026-07-08	LOWTSFO 9/7B54.5	NO	43.3%B=0.3210	25.4%B=0.5563	82.5%B=0.0306 ★	82.5%	LOSS (YES)	−$72.63
414	2026-07-08	LOWTLAX 9/7B65.5	YES	39.3%B=0.1542	25.7%B=0.0661 ★	31.5%B=0.0992	31.5%	LOSS (NO)	−$72.45
413	2026-07-08	LOWTLAX 9/7B63.5	NO	30.4%B=0.4851	12.1%B=0.7733	60.0%B=0.1600 ★	60.0%	LOSS (YES)	−$72.40
412	2026-07-08	LOWTNYC 9/7B69.5	NO	28.7%B=0.0822	9.7%B=0.0093 ★	37.5%B=0.1406	37.5%	WIN (NO)	+$43.50
411	2026-07-08	LOWTNYC 9/7B71.5	YES	43.4%B=0.3204 ★	31.1%B=0.4742	26.5%B=0.5402	26.5%	WIN (YES)	+$201.39
410	2026-07-07	LOWTCHI 8/7B61.5	NO	1.7%B=0.0003 ★	7.9%B=0.0062	7.5%B=0.0056	7.5%	WIN (NO)	+$4.20
409	2026-07-07	LOWTCHI 8/7B63.5	NO	1.4%B=0.0002 ★	7.6%B=0.0058	7.5%B=0.0056	7.5%	WIN (NO)	+$4.57
408	2026-07-07	LOWTCHI 8/7B67.5	YES	34.8%B=0.4252	57.5%B=0.1804 ★	22.5%B=0.6006	22.5%	WIN (YES)	+$194.53
407	2026-07-07	LOWTSFO 8/7B53.5	YES	22.7%B=0.5981 ★	9.8%B=0.8131	16.5%B=0.6972	16.5%	WIN (YES)	+$263.02
406	2026-07-07	LOWTNYC 8/7B65.5	NO	31.7%B=0.1004 ★	53.9%B=0.2909	37.5%B=0.1406	37.5%	WIN (NO)	+$33.75
405	2026-07-07	LOWTNYC 8/7B63.5	YES	39.7%B=0.3635	63.3%B=0.1345 ★	13.5%B=0.7482	13.5%	WIN (YES)	+$361.57
404	2026-07-06	LOWTBOS 7/7B62.5	YES	44.8%B=0.3045	63.9%B=0.1305 ★	37.5%B=0.3906	37.5%	WIN (YES)	+$15.62
403	2026-07-06	LOWTDC 7/7B73.5	YES	37.0%B=0.3963	56.2%B=0.1919 ★	31.0%B=0.4761	31.0%	WIN (YES)	+$140.76
402	2026-07-06	LOWTDC 7/7B71.5	NO	19.0%B=0.0363	16.3%B=0.0264 ★	31.0%B=0.0961	31.0%	WIN (NO)	+$28.21
401	2026-07-06	LOWTCHI 7/7B65.5	YES	49.3%B=0.2570	81.0%B=0.0362 ★	37.0%B=0.3969	37.0%	WIN (YES)	+$107.73
400	2026-07-06	LOWTCHI 7/7B63.5	NO	10.4%B=0.0107 ★	15.2%B=0.0231	23.5%B=0.0552	23.5%	WIN (NO)	+$19.27
399	2026-07-06	LOWTSFO 7/7B54.5	YES	32.0%B=0.4622 ★	12.1%B=0.7724	15.5%B=0.7140	15.5%	WIN (YES)	+$345.61
398	2026-07-06	LOWTSFO 7/7B56.5	NO	44.3%B=0.1958 ★	46.0%B=0.2115	81.5%B=0.6642	81.5%	WIN (NO)	+$278.73
397	2026-07-06	LOWTNYC 7/7B64.5	YES	37.0%B=0.1371	16.4%B=0.0270 ★	27.5%B=0.0756	27.5%	LOSS (NO)	−$63.25
396	2026-07-06	LOWTNYC 7/7B66.5	YES	54.1%B=0.2922	48.2%B=0.2322	26.5%B=0.0702 ★	26.5%	LOSS (NO)	−$63.34
395	2026-07-05	LOWTCHI 6/7B66.5	YES	60.9%B=0.3709	65.7%B=0.4314	51.0%B=0.2601 ★	51.0%	LOSS (NO)	−$59.67
394	2026-07-05	LOWTCHI 6/7B64.5	NO	13.4%B=0.0180	2.5%B=0.0006 ★	23.5%B=0.0552	23.5%	WIN (NO)	+$20.44
393	2026-07-05	LOWTSFO 6/7B56.5	NO	61.5%B=0.1484	63.7%B=0.1317	75.5%B=0.0600 ★	75.5%	LOSS (YES)	−$67.13
392	2026-07-05	LOWTSFO 6/7B54.5	YES	31.8%B=0.1013	10.8%B=0.0116	10.5%B=0.0110 ★	10.5%	LOSS (NO)	−$67.10
391	2026-07-05	LOWTLAX 6/7B62.5	YES	22.6%B=0.0509	5.4%B=0.0030 ★	10.5%B=0.0110	10.5%	LOSS (NO)	−$67.10
390	2026-07-05	LOWTNYC 6/7B65.5	YES	42.6%B=0.1812	24.3%B=0.0591 ★	26.0%B=0.0676	26.0%	LOSS (NO)	−$38.74
389	2026-07-05	LOWTNYC 6/7B67.5	YES	53.1%B=0.2823	46.0%B=0.2119	31.0%B=0.0961 ★	31.0%	LOSS (NO)	−$66.96
388	2026-07-04	LOWTMIA 5/7B76.5	NO	8.1%B=0.8450	1.5%B=0.9707	23.5%B=0.5852 ★	23.5%	LOSS (YES)	−$16.07
387	2026-07-04	LOWTMIA 5/7B80.5	YES	42.8%B=0.1832	25.5%B=0.0651	23.0%B=0.0529 ★	23.0%	LOSS (NO)	−$79.35
386	2026-07-04	LOWTDC 5/7B73.5	YES	28.8%B=0.0832	10.1%B=0.0102 ★	11.0%B=0.0121	11.0%	LOSS (NO)	−$79.53
385	2026-07-04	LOWTDC 5/7B75.5	YES	50.0%B=0.2504 ★	40.4%B=0.3548	31.0%B=0.4761	31.0%	WIN (YES)	+$176.64
384	2026-07-04	LOWTCHI 5/7B68.5	YES	39.2%B=0.1538	21.5%B=0.0462 ★	22.5%B=0.0506	22.5%	LOSS (NO)	−$79.42
383	2026-07-04	LOWTNYC 5/7B72.5	YES	54.4%B=0.2963	48.3%B=0.2330	25.5%B=0.0650 ★	25.5%	LOSS (NO)	−$79.31
382	2026-07-04	LOWTNYC 5/7B70.5	NO	9.3%B=0.0086	1.7%B=0.0003 ★	39.0%B=0.1521	39.0%	WIN (NO)	+$50.70
381	2026-07-04	LOWTNYC 5/7B74.5	YES	41.0%B=0.1681	23.6%B=0.0558	8.5%B=0.0072 ★	8.5%	LOSS (NO)	−$79.47
380	2026-07-03	LOWTNYC 4/7B79.5	YES	29.3%B=0.0859	11.9%B=0.0142 ★	20.5%B=0.0420	20.5%	LOSS (NO)	−$15.17
379	2026-07-03	LOWTNYC 4/7B77.5	YES	36.6%B=0.1339	18.3%B=0.0336 ★	24.0%B=0.0576	24.0%	LOSS (NO)	−$85.92
378	2026-07-02	HIGHTSFO 3/7B70.5	NO	14.9%B=0.7249	20.6%B=0.6303	34.0%B=0.4356 ★	34.0%	LOSS (YES)	−$47.52
377	2026-07-02	LOWTDEN 3/7B61.5	YES	23.0%B=0.0528	9.0%B=0.0082 ★	9.5%B=0.0090	9.5%	LOSS (NO)	−$88.83
376	2026-07-02	LOWTDEN 3/7B57.5	NO	11.1%B=0.0122	2.7%B=0.0007 ★	28.5%B=0.0812	28.5%	WIN (NO)	+$35.34
375	2026-07-02	LOWTPHX 3/7B75.5	YES	11.4%B=0.0129	16.4%B=0.0270	5.0%B=0.0025 ★	5.0%	LOSS (NO)	−$74.40
374	2026-07-02	LOWTPHX 3/7B77.5	NO	13.8%B=0.7435	21.4%B=0.6181	36.0%B=0.4096 ★	36.0%	LOSS (YES)	−$88.32
373	2026-07-02	LOWTMIA 3/7B80.5	YES	32.6%B=0.1066	53.4%B=0.2848	24.0%B=0.0576 ★	24.0%	LOSS (NO)	−$88.80
372	2026-07-02	LOWTMIA 3/7B76.5	YES	16.0%B=0.0257	2.1%B=0.0004 ★	7.5%B=0.0056	7.5%	LOSS (NO)	−$88.88
371	2026-07-02	LOWTBOS 3/7B75.5	NO	1.1%B=0.0001 ★	7.4%B=0.0054	7.0%B=0.0049	7.0%	WIN (NO)	+$6.65
370	2026-07-02	LOWTDC 3/7B79.5	YES	21.7%B=0.0470	5.2%B=0.0027 ★	10.5%B=0.0110	10.5%	LOSS (NO)	−$88.83
369	2026-07-02	LOWTDC 3/7B81.5	YES	57.4%B=0.3291	57.3%B=0.3282	32.0%B=0.1024 ★	32.0%	LOSS (NO)	−$88.64
368	2026-07-02	LOWTCHI 3/7B77.5	NO	20.9%B=0.0435	5.5%B=0.0030 ★	29.5%B=0.0870	29.5%	WIN (NO)	+$37.17
367	2026-07-02	LOWTCHI 3/7B71.5	YES	19.4%B=0.0375	4.4%B=0.0019 ★	8.0%B=0.0064	8.0%	LOSS (NO)	−$88.80
366	2026-07-02	LOWTLAX 3/7B57.5	YES	14.2%B=0.0203	2.6%B=0.0007 ★	3.5%B=0.0012	3.5%	LOSS (NO)	−$88.87
365	2026-07-02	LOWTLAX 3/7B59.5	YES	35.5%B=0.1260	15.0%B=0.0225 ★	17.0%B=0.0289	17.0%	LOSS (NO)	−$88.74
364	2026-06-30	HIGHTSFO 1/7B68.5	NO	16.1%B=0.0260 ★	16.9%B=0.0284	26.5%B=0.0702	26.5%	WIN (NO)	+$7.68
363	2026-06-30	HIGHTSFO 1/7B72.5	NO	9.7%B=0.0094 ★	14.5%B=0.0210	21.0%B=0.0441	21.0%	WIN (NO)	+$19.11
362	2026-06-30	LOWTDEN 1/7B54.5	NO	9.5%B=0.8190	13.7%B=0.7450	23.0%B=0.5929 ★	23.0%	LOSS (YES)	−$71.61
361	2026-06-30	LOWTDEN 1/7B58.5	YES	31.5%B=0.0993	46.5%B=0.2164	14.5%B=0.0210 ★	14.5%	LOSS (NO)	−$72.06
360	2026-06-30	LOWTPHX 1/7B72.5	YES	12.7%B=0.0162	16.3%B=0.0265	7.5%B=0.0056 ★	7.5%	LOSS (NO)	−$50.85
359	2026-06-30	LOWTPHX 1/7B74.5	YES	16.8%B=0.6930	23.5%B=0.5855 ★	7.5%B=0.8556	7.5%	WIN (YES)	+$889.85
358	2026-06-30	LOWTMIA 1/7B76.5	NO	14.9%B=0.0222	2.1%B=0.0004 ★	20.5%B=0.0420	20.5%	WIN (NO)	+$18.45
357	2026-06-30	LOWTMIA 1/7B80.5	YES	26.3%B=0.0694	52.0%B=0.2700	11.5%B=0.0132 ★	11.5%	LOSS (NO)	−$72.10
356	2026-06-30	LOWTBOS 1/7B68.5	YES	31.6%B=0.0998	41.6%B=0.1733	21.0%B=0.0441 ★	21.0%	LOSS (NO)	−$72.03
355	2026-06-30	LOWTBOS 1/7B70.5	YES	33.4%B=0.4434	55.1%B=0.2017 ★	20.5%B=0.6320	20.5%	WIN (YES)	+$279.84
354	2026-06-30	LOWTDC 1/7B70.5	NO	2.6%B=0.0007 ★	6.0%B=0.0036	13.0%B=0.0169	13.0%	WIN (NO)	+$10.66
353	2026-06-30	LOWTCHI 1/7B77.5	NO	20.7%B=0.6291	36.0%B=0.4099 ★	28.0%B=0.5184	28.0%	LOSS (YES)	−$72.00
352	2026-06-30	LOWTLAX 1/7B57.5	NO	1.0%B=0.0001 ★	2.5%B=0.0006	7.5%B=0.0056	7.5%	WIN (NO)	+$5.85
351	2026-06-30	LOWTLAX 1/7B59.5	NO	6.4%B=0.0041	5.0%B=0.0025 ★	15.5%B=0.0240	15.5%	WIN (NO)	+$13.17
350	2026-06-30	LOWTNYC 1/7B70.5	YES	10.7%B=0.0114	12.1%B=0.0147	3.0%B=0.0009 ★	3.0%	LOSS (NO)	−$71.61
349	2026-06-30	LOWTNYC 1/7B74.5	NO	29.4%B=0.4980	43.4%B=0.3204 ★	39.5%B=0.3660	39.5%	LOSS (YES)	−$72.00
348	2026-06-28	HIGHTSFO 29/6B72.5	NO	12.1%B=0.0147 ★	18.5%B=0.0344	33.5%B=0.1122	33.5%	WIN (NO)	+$9.71
347	2026-06-28	HIGHTSFO 29/6B74.5	NO	12.2%B=0.7716	18.8%B=0.6600	38.5%B=0.3782 ★	38.5%	LOSS (YES)	−$62.73
346	2026-06-28	LOWTSEA 29/6B48.5	YES	7.9%B=0.0062	9.3%B=0.0087	2.5%B=0.0006 ★	2.5%	LOSS (NO)	−$43.45
345	2026-06-28	LOWTSEA 29/6B52.5	YES	40.0%B=0.3602	52.2%B=0.2285 ★	31.5%B=0.4692	31.5%	WIN (YES)	+$137.00
344	2026-06-28	LOWTSEA 29/6B50.5	YES	19.9%B=0.0398	17.8%B=0.0317	6.0%B=0.0036 ★	6.0%	LOSS (NO)	−$63.06
343	2026-06-28	LOWTDEN 29/6B59.5	YES	40.4%B=0.1633	26.5%B=0.0705	6.0%B=0.0036 ★	6.0%	LOSS (NO)	−$63.06
342	2026-06-28	LOWTPHX 29/6B78.5	YES	23.4%B=0.5874	30.3%B=0.4863 ★	13.5%B=0.7482	13.5%	WIN (YES)	+$403.95
341	2026-06-28	LOWTMIA 29/6B81.5	NO	49.0%B=0.2404	39.5%B=0.1556 ★	62.0%B=0.3844	62.0%	WIN (NO)	+$102.92
340	2026-06-28	LOWTDC 29/6B68.5	YES	19.7%B=0.0389	4.4%B=0.0019 ★	7.5%B=0.0056	7.5%	LOSS (NO)	−$63.07
339	2026-06-28	LOWTCHI 29/6B74.5	YES	33.5%B=0.1119	13.4%B=0.0178 ★	22.5%B=0.0506	22.5%	LOSS (NO)	−$63.00
338	2026-06-28	LOWTCHI 29/6B76.5	YES	51.2%B=0.2377 ★	42.6%B=0.3294	27.5%B=0.5256	27.5%	WIN (YES)	+$166.02

★ = best Brier this run · B = Brier score per model · P&L = champion only (challengers and baselines are shadow; no real exposure).
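The per-run B values are consistent with the squared error between each model's P(YES) and the 0/1 outcome; that reading is inferred from the table values, not from source code. A minimal sketch, with hypothetical function names, reproducing run 421:

```python
from typing import Mapping


def brier(p_yes: float, outcome_yes: bool) -> float:
    """Squared error between the predicted P(YES) and the 0/1 outcome."""
    y = 1.0 if outcome_yes else 0.0
    return (p_yes - y) ** 2


def best_model(probs: Mapping[str, float], outcome_yes: bool) -> str:
    """Name of the model with the lowest Brier on this run (the ★ mark)."""
    return min(probs, key=lambda name: brier(probs[name], outcome_yes))


# Run 421 from the table: the market resolved NO.
run_421 = {"champion": 0.063, "challenger": 0.035, "baseline": 0.180}
scores = {name: round(brier(p, False), 4) for name, p in run_421.items()}
print(scores)  # {'champion': 0.004, 'challenger': 0.0012, 'baseline': 0.0324}
print(best_model(run_421, outcome_yes=False))  # challenger
```

Displayed values can differ from a recomputation in the last digit, because the table shows probabilities rounded to one decimal place.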

B. Named factors

Each row is a named hypothesis used by the learned predictor at training time. Brier Δ is the leave-one-out test delta from the most recent training run — sort by it to see what carried the model.
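A leave-one-out delta can be sketched as follows. This is an illustration under stated assumptions: the delta is taken as full-model test Brier minus the test Brier with one feature removed, which matches the sign convention noted under the table (negative means the feature carried signal). The function name, the `test_brier` callable, and the toy scorer with its feature names are all hypothetical stand-ins for "refit the model and score the test split".

```python
from typing import Callable, Dict, Sequence


def leave_one_out_deltas(
    features: Sequence[str],
    test_brier: Callable[[Sequence[str]], float],
) -> Dict[str, float]:
    """Brier delta per feature: full-model Brier minus Brier without it.

    Negative delta: dropping the feature worsens the test Brier, so the
    feature carried signal. Positive delta: the model scores better without it.
    """
    full = test_brier(list(features))
    deltas = {}
    for name in features:
        reduced = [f for f in features if f != name]
        deltas[name] = full - test_brier(reduced)
    return deltas


# Toy scorer (hypothetical): one feature helps, one adds noise.
def toy_test_brier(feature_subset: Sequence[str]) -> float:
    score = 0.1400
    if "helpful_feature" in feature_subset:
        score -= 0.0020
    if "noisy_feature" in feature_subset:
        score += 0.0030
    return score


deltas = leave_one_out_deltas(["helpful_feature", "noisy_feature"], toy_test_brier)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f}")
```

In a real run each call to `test_brier` would refit on the training rows, so the cost grows linearly with the number of features.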

Name	Hypothesis	Source	Added	Brier Δ ↓	Status
p_ensemble	Mean of four vendor probabilities (ECMWF + GraphCast + GFS + JMA). Hypothesis: vendor disagreement washes out, the mean is the wisest single bet. (Bench 2026-05-11 N=138: ensemble Brier 0.1429 vs kalshi_mid 0.0845 — the average lost to the market, so we need to learn weights instead of averaging blindly.)	derived from `predictors/ensemble.py`	2026-05-09	↑ +0.0041	active
forecast_spread	Max − min of the per-vendor probabilities (a proxy for model disagreement). Hypothesis: when vendors disagree, the prediction is less trustworthy and the market mid carries more weight than the model.	derived from `predictions.ensemble.inputs.individual_probs`	2026-05-09	↑ +0.0033	active
p_climatology	Historical base rate of (variable in [lower, upper]) over the same date-of-year window from the past 15 years. The dumb-but-honest prior every forecast must beat.	derived from `predictors/climatology.py` (Open-Meteo historical)	2026-05-09	↑ +0.0015	experimental
urban_density_5km	OSM `way["building"]` count within 5 km of the station. Hypothesis: urban heat island raises overnight lows above what a non-urban climatology predicts → biases low-temp markets in cities. Units: building count (not %-area; see README for why).	OSM Overpass API	2026-05-11	↑ +0.0000	dropped (v3, 2026-06-05 — noise as additive linear term)
elevation_m	USGS EPQS elevation at the station point. Hypothesis: thinner air at altitude amplifies the diurnal swing (Denver KDEN ~1638 m vs. Miami KMIA ~2 m at the extremes of our station set).	https://epqs.nationalmap.gov/v1/json	2026-05-11	↑ +0.0000	dropped (v3, 2026-06-05 — noise as additive linear term)
latitude	Station latitude (degrees, signed). Hypothesis: insolation, daylight length, and seasonal amplitude scale with `cos(latitude)` — explicit feature lets the learner discover the interaction with the date-of-year encoded in climatology.	NWS_STATIONS table	2026-05-11	↑ +0.0000	dropped (v3, 2026-06-05 — noise as additive linear term)
forest_pct_5km	OSM `natural=wood` + `landuse=forest` feature count within 5 km. Hypothesis: canopy cover lowers daytime highs (shade + evapotranspiration) and limits radiative night cooling (canopy traps). Units: feature count.	OSM Overpass API	2026-05-11	↑ +0.0000	dropped (v3, 2026-06-05 — noise as additive linear term)
days_ahead	Days between snapshot and target_date. Hypothesis: forecast skill decays with horizon, learned weights should interact non-linearly with this.	derived from `predictions.forecast_blend.inputs.days_ahead`	2026-05-09	· ±0.0000	experimental
water_pct_10km	OSM `natural=water` + `waterway=*` feature count within 10 km. Hypothesis: large water bodies dampen diurnal swings via thermal inertia → tightens the [lower, upper] hit probability for both highs and lows. Units: feature count (kept the `_pct_` name from the spec for continuity).	OSM Overpass API	2026-05-11	↓ −0.0000	dropped (v3, 2026-06-05 — noise as additive linear term)
distance_to_coast_km	Haversine distance to the nearest Natural Earth 1:50m coastline vertex. Hypothesis: maritime regime (Boston, Miami, SFO) damps extremes; continental regime (Denver, Oklahoma City) amplifies them.	Natural Earth `ne_50m_coastline.geojson`	2026-05-11	↓ −0.0000	dropped (v3, 2026-06-05 — noise as additive linear term)
p_consensus	Mean of the three correlated probability views (`p_climatology` + `p_forecast_blend` + `p_ensemble`). Hypothesis: those three estimate the same P(YES) by different routes and are near-collinear; under L2 the learner splits one signal across three compensating coefficients (the +1.07 / -0.87 / -0.40 pattern measured on the v2 run). Collapsing them into their mean keeps the shared signal on one stable coefficient, with the orthogonal disagreement axis carried by `forecast_spread`. Standard mean+spread reparametrisation of a collinear block.	derived from `predictors/{climatology,forecast_blend,ensemble}.py`	2026-06-05	↓ −0.0011	experimental
p_forecast_blend	Open-Meteo deterministic forecast around target_date, blended with climatology by horizon. Hypothesis: a state-of-the-art deterministic forecast carries calibrated short-horizon signal.	derived from `predictors/forecast_blend.py`	2026-05-09	↓ −0.0021	active
p_nws_ndfd	P(YES) computed from the NWS NDFD official forecast, gaussian around NDFD temp with sigma from climatology range. Hypothesis: the agency that resolves Kalshi weather markets (NWS Climatological Report Daily) should issue the highest-signal forecast available.	https://api.weather.gov	2026-05-11	TBD (forward-only — no historical coverage yet)	experimental
series_bias_prior	Known mean bias (p_consensus − y) per series_ticker over 61-date backfill. Hypothesis: each Kalshi weather series has a stable series-specific intercept (KXHIGHTSFO −0.090 to BOS/LAX ~0); this continuous prior generalises `is_hightemp` without per-series dummy variables. Expected coef ≈ −1.	backfill_dataset analysis (B24)	2026-06-21	TBD (v3b run, pending HOLDOUT > 20 dates)	experimental
forecast_revision	Change in p_consensus between earliest and latest capture of the same ticker. Hypothesis: drift velocity of the consensus toward YES/NO encodes atmospheric persistence; complementary to the level (p_consensus) and the horizon decay (days_ahead).	derived via dataset.annotate_revision_drift() across multi-day forward captures (B23)	2026-06-21	TBD (v4, pending multi-capture pipeline)	experimental
p_consensus_x_series_bias_fa	Interaction p_consensus × series_bias_fa. Hypothesis: bias correction should scale with confidence level — when p_consensus is high and series overestimates, the error is larger. Tested B38 2026-06-21: NO-GO (VALID p=0.912, 3/12 dates, Brier worse than incumbent).	derived from p_consensus × series_bias_fa	2026-06-21	+0.0002 (VALID, worse)	dropped (v3fb NO-GO, 2026-06-21)
days_ahead_x_series_bias_fa	Interaction days_ahead × series_bias_fa. Hypothesis: per-series calibration bias scales with forecast horizon — longer horizons may amplify series-specific miscalibration. Tested B38 2026-06-21: NO-GO (VALID p=0.633, 6/12 dates, tie).	derived from days_ahead × series_bias_fa	2026-06-21	0.0000 (VALID, tie)	dropped (v3fb NO-GO, 2026-06-21)

Click a row for the full hypothesis, source link, and per-run history. Brier Δ is the leave-one-out test-Brier delta from the latest run — negative (↓) = feature carried signal, positive (↑) = net noise on this split.
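The mean-plus-spread pair described in the `p_consensus` and `forecast_spread` rows can be sketched in a few lines. The function signatures and the input values are hypothetical; only the definitions (mean of the three probability views, max minus min of the per-vendor probabilities) come from the table.

```python
from statistics import fmean
from typing import Mapping


def consensus(p_climatology: float, p_forecast_blend: float, p_ensemble: float) -> float:
    """Mean of the three correlated probability views."""
    return fmean([p_climatology, p_forecast_blend, p_ensemble])


def forecast_spread(vendor_probs: Mapping[str, float]) -> float:
    """Max minus min of the per-vendor probabilities."""
    return max(vendor_probs.values()) - min(vendor_probs.values())


# Illustrative inputs only, not taken from the ledger.
p_consensus = consensus(0.30, 0.36, 0.42)
spread = forecast_spread({"ECMWF": 0.31, "GraphCast": 0.44, "GFS": 0.29, "JMA": 0.38})
print(round(p_consensus, 2), round(spread, 2))  # 0.36 0.15
```

Feeding the learner one level and one disagreement term, instead of three near-collinear levels, is what keeps the shared signal on a single coefficient.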

C. Latest training run

Snapshot of the most recent sklearn fit of the learned predictor on historical resolutions. This is not a paper trade — it's a cross-validation pass to see whether the current feature set has edge over kalshi_mid on past Kalshi events.

feature set v3 · 2026-06-05 12:34 UTC

MARKET WINS

n_train

144

n_test

84

Brier train

0.1368

Brier test

0.1359

Brier kalshi_mid

0.1098

gap (test − kalshi_mid)

+0.0261

log-loss test

0.4368

log-loss kalshi_mid

0.3611
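The card's figures can be recomputed from per-row probabilities as sketched below. Function names are hypothetical, the `MODEL WINS` label is an assumption (only `MARKET WINS` appears on this page), and a tie is arbitrarily given to the market.

```python
import math
from typing import Sequence, Tuple


def mean_brier(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between P(YES) and the 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def mean_log_loss(probs: Sequence[float], outcomes: Sequence[int], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def verdict(brier_model: float, brier_market: float) -> Tuple[float, str]:
    """Gap (model minus market) on the same rows, plus a label."""
    gap = brier_model - brier_market
    return gap, ("MODEL WINS" if gap < 0 else "MARKET WINS")


# Figures from the card above: Brier test vs. Brier kalshi_mid.
gap, label = verdict(0.1359, 0.1098)
print(f"{gap:+.4f} {label}")  # +0.0261 MARKET WINS
```

Both scores must be computed on the same test rows; otherwise the gap mixes model skill with differences in how hard the rows are.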

D. Training run history

Every sklearn training pass, most recent first. This is not the paper-trade history — see section A for that. A training run with Brier test below Brier kalshi_mid on the same rows means the model has signal beyond the market mid in cross-validation.

When (UTC)	Feature set	n_test	Brier test	Brier kalshi_mid	Gap	Verdict	Notes
2026-06-05 12:34 UTC	v3	84	0.1359	0.1098	+0.0261	MARKET WINS	—
2026-05-14 19:19 UTC	v2	84	0.1323	0.1305	+0.0018	MARKET WINS	Phase A.2 rerun under clean target_date split; supersedes invalidated 20260514T141925Z. Decision-gate run for the temporal-split fix.
2026-05-12 13:45 UTC	v2	65	0.1301	0.0764	+0.0537	MARKET WINS	schema v2: add intercept + feature_means/stds for live inference
2026-05-11 13:08 UTC	v2	42	0.1260	0.0282	+0.0978	MARKET WINS	First A.3 discovery run. V2 = V0 baseline + 6 static geographic features (urban_density_5km, water_pct_10km, forest_pct_5km, elevation_m, distance_to_coast_km, latitude). N=138 resolved, train=96 (older), test=42 (newer). Test slice currently collapses to a single capture date (20260510T171217Z) — limits temporal variance; geographic deltas mostly null on this split as expected.
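The notes above mention a clean target_date split with older rows in train and newer rows in test. A minimal sketch of such a split follows; the row layout (a dict with a `target_date` key), the function name, and the `test_fraction` parameter are assumptions for illustration, not the project's actual splitter.

```python
from datetime import date
from typing import Dict, List, Tuple


def temporal_split(rows: List[Dict], test_fraction: float = 0.3) -> Tuple[List[Dict], List[Dict]]:
    """Split so that every test target_date is later than every train one.

    Whole dates are assigned to one side, so rows from the same target_date
    never end up on both sides of the split.
    """
    dates = sorted({row["target_date"] for row in rows})
    if len(dates) < 2:
        raise ValueError("need at least two distinct target dates")
    n_test_dates = max(1, round(len(dates) * test_fraction))
    n_test_dates = min(n_test_dates, len(dates) - 1)  # keep train non-empty
    cutoff = dates[-n_test_dates]
    train = [r for r in rows if r["target_date"] < cutoff]
    test = [r for r in rows if r["target_date"] >= cutoff]
    return train, test


# Hypothetical rows: five target dates, two rows per date.
rows = [{"target_date": date(2026, 5, d), "ticker": t}
        for d in range(1, 6) for t in ("A", "B")]
train, test = temporal_split(rows, test_fraction=0.4)
print(len(train), len(test))  # 6 4
```

Counting distinct dates in the test slice is a quick check against the single-capture-date collapse described in the first run's notes.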

E. Training Brier trajectory

Learned model (test) vs kalshi_mid (same test rows) across all training runs. The dashed horizontal line marks the most recent kalshi_mid Brier as an all-time reference; vertical dashed markers flag a feature-set bump (v0 → v1 → v2 …).

F. Backtest replays

Replayed records produced by backtest.py against settled Kalshi events. Only strict point-in-time records count toward N_backtest_strict; NAIVE-mode rows are flagged and excluded from the hybrid sample. Filters above narrow both this table and the live runs section.

No backtest replays in the manifest. The aggregate count may still be non-zero — per-record detail is omitted by the manifest builder when the ledger exceeds the inline budget.