Why Variance Matters, and Why Composite Indices Beat Single Stats
The single most useful word in statistics
Suppose you and a friend each measure the height of a flagpole and report your answers. Yours: thirty-two feet. Theirs: forty-one feet. The mean of your two answers is 36.5, and that mean is, in expectation, closer to the truth than either individual measurement. The reason is variance. Each of you, working alone, was subject to measurement error: a slightly off reading on the tape measure, a hand that wobbled. Averaging two independent measurements with equal noise halves the variance of that error, which shrinks the typical miss by a factor of about 1.4 (the square root of two).
Variance is the formal measure of how widely that error is spread. In statistics, it is the average squared distance of a set of numbers from their mean. A high-variance measurement is one that bounces around a lot, day to day or trial to trial; a low-variance measurement is one that lands close to its mean most of the time. Here is the foundational fact: when you average several independent measurements, the variance of the average is smaller than the variance of any single measurement. Specifically, for n independent measurements with the same variance, the variance of the mean is the single-measurement variance divided by n. Two measurements halve the variance. Four quarter it. Sixteen cut it to one-sixteenth. The typical size of the error, the standard deviation, shrinks more slowly, by the square root of n: sixteen measurements cut it to one-quarter.
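The divide-by-n rule is easy to check by simulation. The sketch below is illustrative only: the "true" height of 36.5 feet, the noise level of 3 feet, and the function name `variance_of_sample_mean` are all assumptions chosen for the example, not figures from any dataset.

```python
import random
from statistics import pvariance


def variance_of_sample_mean(n, trials=10_000, sigma=3.0, seed=0):
    """Empirical variance of the mean of n independent noisy readings."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    true_height = 36.5          # assumed true flagpole height, in feet
    means = []
    for _ in range(trials):
        readings = [rng.gauss(true_height, sigma) for _ in range(n)]
        means.append(sum(readings) / n)
    return pvariance(means)


# With sigma = 3 the single-reading variance is 9, so the printed values
# should land near 9/n, up to simulation noise.
for n in (1, 2, 4, 16):
    print(n, round(variance_of_sample_mean(n), 2))
```

The simulated variances will not match 9/n exactly, but they should track it closely enough to make the pattern visible.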
This is the whole reason composite indices exist, whether in sports, in economics, or in clinical trials. The Consumer Price Index averages thousands of prices because any single price is noisy. The S&P 500 pools five hundred companies because any single company is noisy. Bill James’s Win Shares combines offense, defense, and pitching because any single component is noisy. The same statistical move, applied to different problems, with the same justification: the combination is more reliable than its parts.
The question, then, is which parts to combine and how to weight them. The honest answer is that there is no single right combination — but as long as each input is a reasonable measurement of the underlying thing you care about, averaging them reduces error. That is the only mathematical guarantee. It happens to be a very strong one.
The variance of the mean is variance over n
The formal definition of variance, for a random variable X with mean μ:
$$\operatorname{Var}(X) = \mathbb{E}\left[(X - \mu)^2\right]$$
If you have n independent random variables X₁, X₂, …, Xₙ, each with the same variance σ², the variance of their average is:
$$\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{\sigma^2}{n}$$
This single equation is the engine of almost everything in applied statistics. It is why polls report margins of error inversely proportional to the square root of sample size, why scientific experiments use replicate measurements, and why averaging three reasonable owner metrics is more reliable than fixating on one. The variance of a three-metric average is one-third the variance of a single metric (assuming the three are independent — in practice they are correlated, which softens but does not eliminate the gain).
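The caveat about correlation can be made concrete. For n inputs with equal variance σ² and a common pairwise correlation ρ, the standard result is that the variance of their average is σ²(1 + (n − 1)ρ)/n, which reduces to σ²/n when ρ = 0. The sketch below evaluates that formula; the function name and the correlation values of 0.5 and 0.8 are illustrative assumptions, not estimates from the owner data.

```python
def variance_of_average(sigma2, n, rho=0.0):
    """Variance of the mean of n equal-variance inputs that share a
    common pairwise correlation rho (rho = 0 means independent)."""
    return sigma2 * (1 + (n - 1) * rho) / n


# Three metrics, each with unit variance, at assumed correlation levels.
for rho in (0.0, 0.5, 0.8):
    print(f"rho = {rho:.1f}  variance of average = {variance_of_average(1.0, 3, rho):.3f}")
```

At ρ = 0.5 the three-metric average keeps two-thirds of a single metric's variance rather than one-third. The gain is smaller, but it only disappears entirely when the metrics are perfectly correlated.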
Z-scoring, the move used in the main piece, is a related discipline. It standardizes each metric by subtracting the league mean and dividing by the league standard deviation:
$$z = \frac{x - \mu}{\sigma}$$
The result is a number that says, in standard-deviation units, where this owner sits relative to their peers. A z-score of +1 is one standard deviation above the league mean; +2 is two. Z-scores are dimensionless, so a z-score on championships per decade (in the NBA, a number that runs from 0 to roughly 3) and a z-score on win percentage (a fraction between 0 and 1) can be averaged together honestly. Without z-scoring, the championship metric would dominate the average because its raw values are larger and more spread out.
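A toy comparison shows the dominance problem. The three owners below are hypothetical, with made-up numbers chosen so that the two metrics disagree. For simplicity the sketch uses only two metrics and the population standard deviation (`pstdev`).

```python
from statistics import mean, pstdev

# Hypothetical owners with made-up numbers, for illustration only.
owners = {
    "A": {"champs_per_decade": 2.0, "win_pct": 0.450},
    "B": {"champs_per_decade": 1.0, "win_pct": 0.650},
    "C": {"champs_per_decade": 0.0, "win_pct": 0.550},
}


def z_scores(values):
    """Standardize a list of values against its own mean and spread."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


names = list(owners)
champs = [owners[n]["champs_per_decade"] for n in names]
win = [owners[n]["win_pct"] for n in names]

# Averaging raw values: the metric with the larger numeric spread dominates.
raw_avg = {n: (c + w) / 2 for n, c, w in zip(names, champs, win)}
# Averaging z-scores: each metric contributes on the same scale.
z_avg = {n: (zc + zw) / 2
         for n, zc, zw in zip(names, z_scores(champs), z_scores(win))}

raw_rank = sorted(names, key=raw_avg.get, reverse=True)
z_rank = sorted(names, key=z_avg.get, reverse=True)
print("raw ranking:", raw_rank)
print("z ranking:  ", z_rank)
```

In this toy data the raw average simply reproduces the championship ordering, while the z-scored average lets the win-percentage gap count equally and moves owner B to the top.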
Worked example: Jerry Buss vs. James Dolan, in z-scores
For the NBA owners in our dataset, the within-league means and standard deviations on each metric are approximately:
· Champs/decade: μ = 1.0, σ = 1.1
· Win percentage: μ = 0.510, σ = 0.080
· Playoff%: μ = 0.59, σ = 0.21
Buss: champs/decade = 3.03, win% = .622, playoff% = .88
z ≈ (3.03 − 1.0)/1.1, (0.622 − 0.510)/0.080, (0.88 − 0.59)/0.21
z ≈ +1.85, +1.40, +1.38
Composite = (1.85 + 1.40 + 1.38) / 3 ≈ +1.54
Dolan: champs/decade = 0, win% = .430, playoff% = .35
z ≈ (0 − 1.0)/1.1, (0.430 − 0.510)/0.080, (0.35 − 0.59)/0.21
z ≈ −0.91, −1.00, −1.14
Composite = (−0.91 + −1.00 + −1.14) / 3 ≈ −1.02
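VERDICT · Buss’s composite sits roughly 2.6 points above Dolan’s (+1.54 versus −1.02), where each point is one league standard deviation averaged across the three metrics. A single-metric ranking on championships would say Buss-by-ten and Dolan-by-zero. That gets the direction right but hides how broad the gap is: zero is the floor on champs, so the count alone cannot separate Dolan from any other owner without a title, whereas the composite shows his underperformance across all three components.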
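The arithmetic in the worked example can be checked in a few lines. This sketch uses only the approximate league means, standard deviations, and owner figures quoted above; the names `LEAGUE`, `OWNERS`, and `composite` are introduced here for illustration.

```python
# Approximate NBA means and standard deviations quoted in the worked example.
LEAGUE = {
    "champs_per_decade": (1.0, 1.1),
    "win_pct": (0.510, 0.080),
    "playoff_pct": (0.59, 0.21),
}

OWNERS = {
    "Buss": {"champs_per_decade": 3.03, "win_pct": 0.622, "playoff_pct": 0.88},
    "Dolan": {"champs_per_decade": 0.0, "win_pct": 0.430, "playoff_pct": 0.35},
}


def composite(metrics):
    """Equal-weight average of the three z-scores."""
    zs = [(metrics[key] - mu) / sigma for key, (mu, sigma) in LEAGUE.items()]
    return sum(zs) / len(zs)


buss = composite(OWNERS["Buss"])
dolan = composite(OWNERS["Dolan"])
print(f"Buss  {buss:+.2f}")
print(f"Dolan {dolan:+.2f}")
print(f"Gap   {buss - dolan:.2f}")
```

Run without intermediate rounding, the composites come out at about +1.54 and −1.02, a gap of roughly 2.56.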
The cross-league rankings in the main piece use this same procedure, run within each league. Cross-league composites are then directly comparable because z-scores share a common scale — one standard deviation in the NBA is the same statistical distance from the mean as one standard deviation in the NFL. The MLB and NHL z-scores are computed analogously.
The Owner Index, in thirty lines of Python
Below is a self-contained implementation. The input is a list of owners with their three raw metrics; the output is a sorted ranking with composite z-scores. The math is exactly what was described above; the code makes it explicit.
```python
# owner_index.py: z-scored composite of three owner metrics
from collections import defaultdict
from statistics import mean, stdev


def z_score(value, values):
    """Standardize value against the sample mean and stdev of values."""
    # stdev() raises StatisticsError on fewer than two data points,
    # so an owner with no league peers is scored as exactly average.
    if len(values) < 2:
        return 0.0
    mu = mean(values)
    sigma = stdev(values)
    return (value - mu) / sigma if sigma > 0 else 0.0


def owner_index(owners):
    # owners is a list of dicts with keys:
    # league, champs_per_decade, win_pct, playoff_pct
    by_league = defaultdict(list)
    for o in owners:
        by_league[o["league"]].append(o)

    # z-score each metric within its own league, then average the three.
    for group in by_league.values():
        cpd = [o["champs_per_decade"] for o in group]
        wpc = [o["win_pct"] for o in group]
        ppc = [o["playoff_pct"] for o in group]
        for o in group:
            z1 = z_score(o["champs_per_decade"], cpd)
            z2 = z_score(o["win_pct"], wpc)
            z3 = z_score(o["playoff_pct"], ppc)
            o["composite"] = (z1 + z2 + z3) / 3

    # Highest composite first.
    return sorted(owners, key=lambda o: o["composite"], reverse=True)


if __name__ == "__main__":
    # 21-row spreadsheet truncated for illustration
    data = [
        {"name": "Jerry Buss", "league": "NBA", "champs_per_decade": 3.03,
         "win_pct": 0.622, "playoff_pct": 0.88},
        {"name": "James Dolan", "league": "NBA", "champs_per_decade": 0.0,
         "win_pct": 0.430, "playoff_pct": 0.35},
        # ... 19 more owners, real data in spreadsheet
    ]
    ranked = owner_index(data)
    for o in ranked:
        print(f"{o['name']:25s} {o['league']:5s} composite = {o['composite']:+.2f}")
```
Three notes on what the code does not do. First, it does not weight the three metrics — they are averaged equally. A more sophisticated index would weight by inverse variance or by predictive validity. Second, it treats the three metrics as independent, which they are not (good owners tend to be good at all three). The variance reduction from averaging correlated inputs is smaller than the equation in the math section suggests — but it is still positive. Third, it z-scores within league but not within era. An owner from the 1980s NBA faces a different competitive landscape than one from the 2020s NBA. A version that respected era would z-score within league-decade.
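The third refinement is the easiest to sketch. The version below z-scores a metric inside arbitrary groups, so the same function handles league-only and league-decade grouping. It is a sketch under assumptions: the `start_year` field, the `league_decade` helper, and the four sample rows are hypothetical and do not come from the spreadsheet.

```python
from collections import defaultdict
from statistics import mean, stdev


def z_within_groups(rows, metric, key):
    """Z-score `metric` separately inside each group defined by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[metric])

    scores = []
    for row in rows:
        values = groups[key(row)]
        if len(values) < 2:
            scores.append(0.0)  # too few peers to estimate a spread
            continue
        sigma = stdev(values)
        scores.append((row[metric] - mean(values)) / sigma if sigma > 0 else 0.0)
    return scores


def league_decade(row):
    # "start_year" is a hypothetical field marking the start of the tenure.
    return (row["league"], row["start_year"] // 10 * 10)


# Made-up rows: the same .600 win percentage in two different eras.
rows = [
    {"league": "NBA", "start_year": 1981, "win_pct": 0.60},
    {"league": "NBA", "start_year": 1985, "win_pct": 0.50},
    {"league": "NBA", "start_year": 2021, "win_pct": 0.60},
    {"league": "NBA", "start_year": 2023, "win_pct": 0.70},
]

league_z = z_within_groups(rows, "win_pct", lambda r: r["league"])
era_z = z_within_groups(rows, "win_pct", league_decade)
print("league only:  ", [round(z, 2) for z in league_z])
print("league-decade:", [round(z, 2) for z in era_z])
```

With league-only grouping the two .600 owners get identical scores. With league-decade grouping the 1980s owner lands above their era's mean and the 2020s owner below it. The trade-off is that finer groups contain fewer owners, so each group's mean and standard deviation are themselves noisier.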
Each of those refinements would shuffle the rankings somewhat. None of them would change the basic finding: composites are more stable than single stats, and Bill James’s methodological move — in 1985, in 2002, and now — remains the right one.
If your favorite ranking is built on a single number, ask yourself how that number compares to two or three reasonable alternatives. If they all agree, the ranking is robust. If they disagree, the headline number is doing too much work.