Question 1

What is calibration in a betting model?

Accepted Answer

Calibration is whether a model's probabilities mean what they say. A calibrated model that rates a set of picks at 70% will see those picks win about 70% of the time. It is separate from accuracy: a model can pick the right side often (good accuracy) while systematically overstating how confident it should be (poor calibration). Staking math depends on the probability being honest, so calibration is what makes a number bettable.

Question 2

What is the calibration gap?

Accepted Answer

The calibration gap is the win rate a model actually achieved minus the win rate it predicted, measured across a bucket of similar picks. If the model rated a group of picks at 72% on average and they won 55% of the time, the calibration gap is minus 17 points. A negative gap means the model is overconfident; a positive gap means it is underconfident. A gap within about plus or minus 2 points counts as well calibrated.
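
The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `calibration_gap` and `label` are hypothetical, and the 2-point tolerance is simply the rule of thumb quoted in the answer.

```python
def calibration_gap(predicted, outcomes):
    """Return actual hit rate minus average predicted probability, in points.

    predicted: probabilities in [0, 1]; outcomes: 1 for a win, 0 for a loss.
    """
    if len(predicted) != len(outcomes) or not predicted:
        raise ValueError("need equal-length, non-empty inputs")
    avg_predicted = sum(predicted) / len(predicted)
    hit_rate = sum(outcomes) / len(outcomes)
    return (hit_rate - avg_predicted) * 100


def label(gap_points, tolerance=2.0):
    """Classify a gap using the plus-or-minus 2 point band (an assumed default)."""
    if abs(gap_points) <= tolerance:
        return "well calibrated"
    return "overconfident" if gap_points < 0 else "underconfident"


# Hypothetical bucket: 20 picks rated 72% each, 11 of which won (55%).
gap = calibration_gap([0.72] * 20, [1] * 11 + [0] * 9)
print(round(gap, 1), label(gap))
```

With these made-up inputs the gap comes out at about minus 17 points, which the helper labels as overconfident.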

Question 3

Why does calibration matter more than win rate?

Accepted Answer

Win rate tells you how often you won; calibration tells you whether the probabilities you staked on were honest. Stake sizing (Kelly and every fractional version of it) takes the model's probability as an input. If a model says 70% but the real rate is 55%, every stake on those picks is too large, and the bankroll can bleed even when most of the picks win. A calibrated 58% is more profitable than an overconfident 70% because the stake matches the real edge.
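
To make the sizing argument concrete, here is a small sketch using the standard Kelly formula. The even-money odds (decimal 2.0) are an assumption chosen for easy arithmetic, and the function names are hypothetical; nothing here is taken from a specific staking system.

```python
import math


def kelly_fraction(p, decimal_odds):
    """Full-Kelly stake as a fraction of bankroll for win probability p."""
    b = decimal_odds - 1.0  # net profit per unit staked on a win
    f = (p * b - (1.0 - p)) / b
    return max(f, 0.0)  # no bet when the edge is negative


def expected_log_growth(true_p, stake_fraction, decimal_odds):
    """Expected log bankroll growth per bet, given the true win probability."""
    b = decimal_odds - 1.0
    win = math.log(1.0 + stake_fraction * b)
    loss = math.log(1.0 - stake_fraction)
    return true_p * win + (1.0 - true_p) * loss


# Assumed scenario: model claims 70%, picks really win 55%, odds are even money.
claimed_stake = kelly_fraction(0.70, 2.0)  # sized on the overconfident number
honest_stake = kelly_fraction(0.55, 2.0)   # sized on the real rate
print(claimed_stake, expected_log_growth(0.55, claimed_stake, 2.0))
print(honest_stake, expected_log_growth(0.55, honest_stake, 2.0))
```

Under these assumptions the overconfident probability calls for staking roughly 40% of the bankroll, where the real rate supports about 10%. Expected log growth is negative at the larger stake and slightly positive at the smaller one, even though the picks win 55% of the time in both cases.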

Question 4

How do you measure model calibration?

Accepted Answer

Bucket every resolved pick by its predicted probability (for example 50-55%, 55-60%, and so on), then compare the average predicted probability in each bucket to the actual hit rate. Plot predicted on one axis and actual on the other: a perfectly calibrated model lands on the diagonal. The Brier score, the mean squared difference between each predicted probability and the 0-or-1 outcome, condenses overall probability quality into one number, where lower is better. Buckets with fewer than about 15 picks are mostly noise and should not drive decisions.
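
The bucketing procedure can be sketched as follows. This is an illustrative outline under stated assumptions: the names `reliability_table` and `brier_score` are hypothetical, the 5-point bucket width and 15-pick floor are taken from the answer as adjustable defaults, and the input is assumed to be two parallel lists of predicted probabilities and 0/1 results.

```python
import math
from collections import defaultdict


def reliability_table(predicted, outcomes, width_pts=5, min_picks=15):
    """Bucket resolved picks by predicted probability and compare to hit rate.

    predicted: probabilities in [0, 1]; outcomes: 1 for a win, 0 for a loss.
    """
    last_bucket = 100 // width_pts - 1
    buckets = defaultdict(list)
    for p, won in zip(predicted, outcomes):
        # Work in percentage points and round first so 0.60 lands in 60-65.
        index = min(math.floor(round(p * 100, 6) / width_pts), last_bucket)
        buckets[index].append((p, won))

    rows = []
    for index in sorted(buckets):
        picks = buckets[index]
        n = len(picks)
        avg_predicted = sum(p for p, _ in picks) / n
        hit_rate = sum(won for _, won in picks) / n
        rows.append({
            "bucket": (index * width_pts, (index + 1) * width_pts),
            "n": n,
            "avg_predicted": avg_predicted,
            "hit_rate": hit_rate,
            "gap_pts": (hit_rate - avg_predicted) * 100,
            "reliable": n >= min_picks,  # small buckets are flagged, not dropped
        })
    return rows


def brier_score(predicted, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - won) ** 2 for p, won in zip(predicted, outcomes)) / len(predicted)
```

Each row pairs an average predicted probability with an actual hit rate, which are the two coordinates of one point on the reliability curve. Flagging thin buckets rather than discarding them keeps them visible while signalling that they should not drive decisions.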

Question 5

Can a model be accurate but poorly calibrated?

Accepted Answer

Yes, and it is the most common failure mode. A model can rank teams correctly and pick winners at a respectable rate while exaggerating the margin, calling coin-flips 65% and clear favorites 90%. Accuracy looks fine; calibration is broken. The tell is positive closing-line value paired with a negative calibration gap: the model is finding real edges but overstating their size.

Question 6

How do you fix an overconfident model?

Accepted Answer

You shrink the probabilities toward 50% by the amount the data says they were overstated, rather than adding more features. Common methods are a per-sport bias offset, Platt scaling (a sigmoid refit on resolved results), and per-confidence-band shrink for tiers that consistently over-promised. When a market's measured gap is catastrophic, the honest move is to bench it entirely until the gap recovers, not to keep chasing it with bigger corrections.
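
As one illustration of the methods named above, here is a minimal Platt-scaling sketch in plain Python. Several choices are assumptions rather than anything the answer specifies: the sigmoid is fit on the log-odds of the model's own probability, the fit uses simple gradient descent on log loss with arbitrary learning rate and step count, and the function names are hypothetical. A production setup would more likely use a tested library routine.

```python
import math


def _logit(p, eps=1e-6):
    p = min(max(p, eps), 1.0 - eps)  # keep log-odds finite at 0 and 1
    return math.log(p / (1.0 - p))


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(predicted, outcomes, lr=0.5, steps=5000):
    """Fit q = sigmoid(a * logit(p) + b) to resolved picks by gradient descent.

    Starts at a=1, b=0 (no correction). A fitted a below 1 shrinks
    probabilities toward 50%; lr and steps are assumed settings.
    """
    xs = [_logit(p) for p in predicted]
    a, b = 1.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, outcomes):
            err = _sigmoid(a * x + b) - y  # gradient of log loss w.r.t. the logit
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def apply_platt(p, a, b):
    """Map a raw model probability to its recalibrated value."""
    return _sigmoid(a * _logit(p) + b)


# Hypothetical history: 72% picks won 11 of 20, 52% picks won 10 of 20.
picks = [0.72] * 20 + [0.52] * 20
results = [1] * 11 + [0] * 9 + [1] * 10 + [0] * 10
a, b = fit_platt(picks, results)
print(round(apply_platt(0.72, a, b), 2))
```

On this made-up history the fitted slope is well below 1, so a raw 72% is pulled down to roughly the 55% those picks actually delivered. With so few distinct probability levels the fit simply reproduces the observed rates; real data would need far more resolved picks before the correction could be trusted.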

Predicted bucket	Avg predicted	Actual hit rate	Calibration gap
50–55%	52.6%	48.3%	−4.3 pt
60–65%	63.0%	63.5%	+0.5 pt
70–75%	71.9%	45.5%	−26.4 pt

Why a 70% Pick Should Win 70% of the Time

Accuracy and calibration are not the same thing

Why calibration is the number you stake on

The calibration gap: one number, honestly reported

The reliability curve and the Brier score

How you fix a model that's overconfident

How we audit our own calibration