A Statistical Dispatch on Hot Streaks · Baseball, 2026
The Sports Page
Making the numbers mean something since the first pitch
Issue No. 51 · May 18, 2026 · Distributed Free to Friends & Family

Three Lies, Thirty Dots.

A companion piece to last week’s issue on cost-per-win. Same payroll. Same wins. Same thirty teams. Three different charts — each one technically honest, each one telling a story the data does not, in fact, support. Consider this a guided tour of how a sports columnist could lie to you with statistics without saying a single false sentence.
By The Professor · The Sports Page · In the Tradition of Darrell Huff
3 Lies, This Issue · .014 Honest R² · .076 Best Cherry-Picked R²

In 1954, the journalist Darrell Huff published a small book called How to Lie with Statistics. It is the best-selling statistics book in history, mostly because the title is a confession dressed as a tutorial. Huff’s thesis was that no one needs to invent fake numbers to deceive a reader; the deceptive parts of statistics are the choices made around the numbers — which axis to use, which observations to include, which curve to draw through the cloud of dots. Each of those choices is a lie of omission disguised as a methodological preference.

This issue takes the dataset from last week’s Cost of a Win piece, the 2026 payroll and win total for each of the thirty MLB teams, and shows three different ways that same dataset could be drawn to support three completely different conclusions. The honest fit, computed without any cherry-picking, has an R² of .014: through May 7, payroll explains essentially none of the variation in wins. Each of the lies below makes that relationship look stronger or different than it actually is.
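For readers who want to check an R² themselves, here is a minimal sketch in Python of the number behind that .014: fit a straight line by ordinary least squares, then compute R² = 1 - SS_res/SS_tot, the share of the spread in wins that the line accounts for. An R² of .014 means the line accounts for about 1.4 percent of it. The function name `linear_fit` and the four-team toy numbers are hypothetical illustrations, not last week’s dataset.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - SS_res / SS_tot: the share of the spread in wins the line accounts for.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / ss_tot

# Toy numbers (hypothetical, not the real league): payroll in $100M units, wins.
slope, intercept, r2 = linear_fit([1, 2, 3, 4], [20, 18, 22, 20])
print(f"slope = {slope:.2f} wins per $100M, R^2 = {r2:.2f}")
```

On those toy numbers the slope is 0.40 wins per $100M and R² is 0.10: the line captures a tenth of the spread and leaves the other nine-tenths unexplained. Measuring payroll in $100M units is what lets the slope read directly as “wins per $100 million.”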

Lie #1 · The Quadratic Fit That Makes Up a Story

Lie 1 · “Wins Follow a U-Shape: The Middle Market Is the Worst Place to Spend.”
[Chart: wins vs. 2026 payroll, $0 to $400M]
Same thirty teams. Quadratic fit drawn through the points. R² = .056, four times the honest linear fit. The curve dips through the middle market, swoops up sharply at the top end, and tells you that “wins follow a U-shape.” The curve is mostly fitting one team (the Dodgers), but nothing on the chart tells you that.

The first lie is the easiest one to fall for, because it looks like sophistication. A linear fit on this data has an R² of .014. A second-degree polynomial (a parabola) has an R² of .056. That is four times the R² of the straight line, though still under 6 percent of the variation, bought with one extra parameter. The columnist who runs this fit will write a sentence that begins, “The relationship between payroll and wins is non-linear, with diminishing returns in the middle market and accelerating returns above $250 million.” This sentence sounds correct. The chart appears to support it. The sentence is, nonetheless, almost entirely a lie.

The honest reading: the parabola is fitting the Dodgers, and to a lesser extent the Yankees. Of thirty teams, exactly two are in the “accelerating returns” portion of the curve, and those two teams are entirely responsible for the upward sweep at the right edge of the chart. The dip in the middle is not a real phenomenon either; it is what a parabola has to do, mathematically, in order to bend its way around two stubbornly high data points on the right edge while still passing through the cloud in the middle. An overfit polynomial does not describe a relationship; it describes the residuals of a small number of outlying points. A more flexible model — a fifth-degree polynomial, say — would have an even higher R², and would also be even more obviously absurd.
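The point about flexible models can be checked directly. The sketch below, in plain Python with no libraries, fits least-squares polynomials of degree 1, 2, and 5 to a hypothetical league (randomly generated, not the real thirty teams) in which payroll has no effect on wins for twenty-eight teams and two big spenders happen to be winning. Because every lower-degree polynomial is a special case of the higher-degree ones, least squares can never do worse as the degree rises, so R² can only stay level or climb. The function `poly_r_squared` is a hypothetical helper written for this illustration.

```python
import random

def poly_r_squared(xs, ys, degree):
    """R^2 of an ordinary least-squares polynomial fit of the given degree."""
    n, k = len(xs), degree + 1
    mean_x = sum(xs) / n
    scale = max(abs(x - mean_x) for x in xs)
    # Rescale x to [-1, 1] for numerical stability; the fitted values are unchanged.
    ts = [(x - mean_x) / scale for x in xs]
    # Normal equations (X^T X) beta = X^T y, where X has columns 1, t, t^2, ..., t^degree.
    a = [[sum(t ** (i + j) for t in ts) for j in range(k)] for i in range(k)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(k)]
    # Solve by Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * k
    for r in reversed(range(k)):
        beta[r] = (b[r] - sum(a[r][c] * beta[c] for c in range(r + 1, k))) / a[r][r]
    fitted = [sum(beta[i] * t ** i for i in range(k)) for t in ts]
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - fit) ** 2 for y, fit in zip(ys, fitted))
    return 1 - ss_res / ss_tot

# Hypothetical league: 28 teams whose wins ignore payroll entirely (pure noise),
# plus two big spenders who happen to be winning. Payroll is in $100M units.
rng = random.Random(2026)
payroll = [rng.uniform(0.7, 2.5) for _ in range(28)] + [3.2, 3.5]
wins = [rng.gauss(20, 3) for _ in range(28)] + [25, 26]

for degree in (1, 2, 5):
    print(f"degree {degree}: R^2 = {poly_r_squared(payroll, wins, degree):.3f}")
```

The exact R² values depend on the random seed; the ordering does not. Solving the normal equations by hand is fine for a toy this size, though a real analysis would more likely reach for a library routine such as NumPy’s `polyfit`.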

Lie #2 · The Outlier Deletion That Cleans Up the Story

Lie 2 · “Once You Remove the Two Obvious Outliers, the Pattern Is Clear: More Money, More Wins.”
[Chart: wins vs. 2026 payroll, $0 to $400M, with the Mets and Blue Jays removed (marked ×)]
Same thirty teams as before, except the Mets and Blue Jays have been removed (red ×). New linear fit: slope of +1.47 wins per $100M, R² = .076. Both numbers are more than double the values from the full dataset. Same data minus two teams; entirely different story.

The second lie is the most common one in sports writing, and the hardest to defend against, because it always begins with a sensible-sounding word: outlier. The columnist will say something like, “If we remove the Mets and Blue Jays, who are clearly underperforming for reasons unrelated to payroll, the relationship between spending and wins becomes much stronger.” The new R² jumps from .014 to .076. The slope jumps from +0.6 wins per $100 million to +1.47 wins per $100 million. The columnist points to these numbers and concludes that, in the typical case, money does buy wins, and the Mets and Blue Jays are the exceptions that prove the rule.

The Word “Outlier” Is Doing the Work Here

An outlier, in legitimate use, is a data point that arose from a different process than the rest of the dataset — a typo, a measurement error, a fundamentally different population. None of those describe the Mets or the Blue Jays. They are major-league teams playing major-league baseball, with payrolls and win totals generated by exactly the same processes as the other twenty-eight. They are not outliers. They are data points the columnist did not like.

The honest treatment of an inconvenient observation is to model it: to ask why a team with this payroll is producing this win total, and to refine your understanding of the relationship. The dishonest treatment is to remove it from the chart and quietly rerun the regression. Delete the points that sit farthest from the line, and the R² will, in almost every case, get bigger. That is not because your model is better. It is because you have thrown away the data it explained worst.
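The mechanism is easy to reproduce on a toy. The sketch below uses a hypothetical eight-team league (invented numbers, payroll in $100M units), fits a line, drops the two teams that sit furthest below it, the way a columnist might drop “underperformers,” and refits. Nothing about the remaining six teams changes; only the selection does. The helper `linear_fit` is written for this illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical eight-team league: payroll in $100M units, wins.
payroll = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 3.0, 3.5]
wins = [17, 20, 18, 21, 19, 22, 14, 15]

slope, intercept, r2 = linear_fit(payroll, wins)
print(f"all teams:   slope {slope:+.2f} wins per $100M, R^2 {r2:.3f}")

# The "outlier" move: drop the two teams with the most negative residuals.
residuals = [w - (intercept + slope * p) for p, w in zip(payroll, wins)]
dropped = sorted(range(len(wins)), key=lambda i: residuals[i])[:2]
kept = [i for i in range(len(wins)) if i not in dropped]
slope2, _, r2_2 = linear_fit([payroll[i] for i in kept], [wins[i] for i in kept])
print(f"two dropped: slope {slope2:+.2f} wins per $100M, R^2 {r2_2:.3f}")
```

With all eight teams the slope is about −0.17 wins per $100M and R² is about .003; with the two lowest residuals removed, the slope becomes about +1.43 and R² about .51. The deleted teams were chosen precisely because the line fit them worst, which is why the statistic improves.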

Lie #3 · The Truncated Y-Axis That Makes the Spread Look Big

Lie 3 · “Look at the Massive Spread in Win Totals Across the League.”
[Chart: wins (y-axis starting at 14) vs. 2026 payroll, $0 to $400M]
Same thirty teams. Same regression. The y-axis now starts at 14 wins instead of zero, which doubles the apparent vertical spread. The Mets land at the bottom of the visible chart, and the Yankees and Braves brush the ceiling. The data has not changed. The chart implies an enormous gap between teams that is, in absolute terms, exactly the same gap as before.

The third lie is the one Huff devoted an entire chapter to, “The Gee-Whiz Graph”: cropping the y-axis. The chart above shows the same thirty dots and the same regression line, but the y-axis no longer starts at zero (or even at ten); it starts at fourteen. This is, in some sense, a defensible choice: nobody is at zero wins, so why include that part of the chart? The trouble is what happens visually. The Mets, at fourteen wins, now sit on the bottom edge of the frame. The Yankees, Braves, and Cubs, at twenty-six, sit on the top edge. The eye registers a much larger spread than it did on the first chart in the original issue. The columnist who uses this chart will write that there is a “massive gap” between the league’s best and worst teams. The actual gap is twelve wins in thirty-seven games played: not nothing, but not a number that would impress anyone on a y-axis that started at zero.
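The visual effect can be put in numbers. The sketch below computes where a win total lands as a fraction of the plot’s height, using the fourteen- and twenty-six-win totals quoted above. It assumes, hypothetically, that both versions of the chart top out at exactly twenty-six wins; a different ceiling would shift the figures somewhat. The helper `plot_height` is written for this illustration.

```python
def plot_height(value, axis_floor, axis_ceiling):
    """Where a value lands on the y-axis, as a fraction of the plot's height
    (0.0 = bottom edge, 1.0 = top edge)."""
    return (value - axis_floor) / (axis_ceiling - axis_floor)

WORST, BEST = 14, 26  # win totals quoted in the text (Mets; Yankees, Braves, Cubs)
CEILING = 26          # assumption: both charts top out at the best team's total

gaps = {}
for floor in (0, WORST):
    # Share of the plot's height taken up by the gap between worst and best.
    gaps[floor] = plot_height(BEST, floor, CEILING) - plot_height(WORST, floor, CEILING)
    print(f"axis floor {floor:>2}: the {BEST - WORST}-win gap fills {gaps[floor]:.0%} of the height")

# How much larger the same gap looks on the cropped axis than on the zero-based one.
magnification = gaps[WORST] / gaps[0]
print(f"magnification: {magnification:.2f}x")
```

On a zero-based axis the twelve-win gap fills about 46 percent of the plot’s height; on the cropped axis it fills all of it, a magnification of roughly 2.2. That factor, not anything in the data, is where the sense of a “massive gap” comes from.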

“Every chart is a stack of choices. The honest ones are the choices the writer announces. The dishonest ones are the choices the writer makes and forgets to mention.”

— The Professor, on the moral architecture of statistical graphics

A Defensive Reader’s Checklist

If, in the next month, you read a sports column that shows you a scatter plot purporting to demonstrate that money buys wins (or fails to buy them), here are the four things to check before you believe the chart:

Look For · What It Tells You
The R² · If the columnist shows you a chart without stating the R², there is a reason. The R² is the share of the variation in wins that the model explains. Below .10 is decorative, not predictive.
What got dropped · Are all teams shown? If not, was the deletion justified by a measurement issue (yes) or by “they don’t fit the story” (no)?
The y-axis floor · Does the y-axis start at zero, or at some convenient number that magnifies the spread? The choice should be defensible on its merits, not because it makes the picture more dramatic.
The fit type · Linear is the default. Quadratic or higher-order fits should be justified by a real reason, not by “it gives me a higher R².” Adding parameters to a least-squares fit never lowers R².

None of the three charts in this issue are technically wrong. Each of them computes correctly from the same dataset. Each of them, in isolation, would pass the sniff test of any editor who is not paying close attention. And each of them, taken at face value, would lead a reader to a different conclusion about the relationship between baseball payroll and baseball wins. The honest answer — that the relationship is real but small, and that thirty-seven games is too few to see it cleanly — is, statistically, the dullest of the four. Dullness, here, is a virtue. The chart is doing exactly as much work as the data warrants. Anything more interesting is being added by hand.
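Why thirty-seven games is too few can be sketched with a coin-flip model. Under the simplifying assumption that every game is an independent event with the same win probability p, a team’s win total has a standard deviation of √(n·p·(1 − p)) from luck alone. The function `luck_sd` is a hypothetical helper for that formula; real games are neither independent nor evenly matched, so treat the output as a rough scale, not an estimate for any team.

```python
import math

def luck_sd(games, win_prob=0.5):
    """Standard deviation of a win total if every game were an independent
    coin flip with the same win probability (a simplifying assumption)."""
    return math.sqrt(games * win_prob * (1 - win_prob))

for games in (37, 162):
    sd = luck_sd(games)
    # Dividing by games converts the spread in wins to a spread in winning percentage.
    print(f"{games:>3} games: SD {sd:.2f} wins, {sd / games:.3f} in winning percentage")
```

Under this model, luck alone moves a .500 team’s win total by about three wins either way over thirty-seven games, roughly .082 in winning percentage, compared with about .039 over a full 162-game season. Against noise that size, a payroll slope of well under a win per $100 million is very hard to see.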

Got a stat that doesn’t make sense?

Send it. We’ll find what the math is hiding — and we just might write the next issue about it.

Submit via GitHub, or email Patrick.
The Sports Page
thesportspage.net
© 2026 The Sports Page · A Statistical Dispatch for Friends & Family
Licensed under The Sports Page License · Borrow it, give it back better · Non-commercial, attribution required
Not sure what counts as commercial? Take the 2-minute quiz →