CFB Analytics — Data & Model Evaluation

A polyglot college-football pipeline · CFBD → DuckDB → dbt → R + Python models

Published

July 2, 2026

Data: CollegeFootballData.com (CFBD API, free tier). Raw data is gitignored and collected within the API’s bounded/cached terms. This page evaluates (1) the data and (2) the models’ accuracy. Every statistical method is implemented in both R and Python against the same warehouse feed; their agreement is a built-in correctness check (see R ↔︎ Python parity).

🏈 Open the interactive Team Comparison →: pick any two FBS teams and compare their 2025 season head-to-head (SP+, EPA, efficiency, player leaders) with a neutral-field matchup projection and 2026 win forecast.

📖 Stat Guide →: a plain-English data dictionary for every metric, with the 2025 FBS range for each.

🧪 The Models →: how the projections work and how well they perform, with an interactive win-probability curve, a real held-out calibration plot, baseline comparisons, and the R ↔︎ Python parity gate.

Part 1 — Evaluating the data

Data-quality panel

Medallion warehouse
dbt data tests: 99 passed, 0 failed · sources fresh as of 2026-07-01
Layer	Tables	Rows
staging	7	845,808
silver	5	863,147
gold	18	1,462,182
snapshots	1	218

Bronze sources: load lineage
Source	Last loaded	Rows
plays	2026-07-01 13:07:56	1,101,386
games	2026-07-01 13:08:00	11,366
lines	2026-07-01 13:08:05	9,166
ratings_sp	2026-07-01 13:08:07	406
teams	2026-07-01 13:08:10	403
recruiting	2026-07-01 13:08:18	79
calendar	2026-07-01 13:08:12	50

Team efficiency leaderboard

Opponent-context offensive/defensive EPA per play (lower defensive EPA = better), season average.

Top teams by net EPA/play, 2025
Team	W%	Net EPA	Off EPA	Def EPA
Ohio State	86%	0.434	0.186	−0.248
Texas Tech	86%	0.428	0.050	−0.378
Toledo	62%	0.399	0.129	−0.270
Notre Dame	83%	0.381	0.247	−0.134
Indiana	100%	0.326	0.200	−0.126
Washington	69%	0.307	0.187	−0.120
Vanderbilt	77%	0.294	0.306	0.012
Oregon	87%	0.280	0.181	−0.099
South Florida	69%	0.278	0.192	−0.085
Utah	85%	0.268	0.194	−0.074
Texas A&M	85%	0.266	0.125	−0.141
North Texas	86%	0.253	0.273	0.020
Net EPA = offensive EPA/play − defensive EPA/play allowed.
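
As a quick arithmetic check of the footnote, the sketch below recomputes Net EPA for three rows copied from the leaderboard. The helper name `net_epa` is illustrative only and is not taken from the pipeline.

```python
def net_epa(off_epa: float, def_epa_allowed: float) -> float:
    """Net EPA/play: offensive EPA/play minus defensive EPA/play allowed."""
    return off_epa - def_epa_allowed


# (team, Off EPA, Def EPA, published Net EPA), copied from the leaderboard.
rows = [
    ("Ohio State", 0.186, -0.248, 0.434),
    ("Texas Tech", 0.050, -0.378, 0.428),
    ("Vanderbilt", 0.306, 0.012, 0.294),
]

# Largest gap between the recomputed and the published Net EPA.
max_gap = max(abs(net_epa(off, dfn) - net) for _, off, dfn, net in rows)
print(f"max |recomputed - published| = {max_gap:.4f}")
```

Because a negative defensive EPA is good, subtracting it raises the net figure, which is why Texas Tech's modest offense still lands near the top.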

“Over expected” leaderboards (M2–M4)

These leaderboards use the residual framework from Eager & Erickson’s Football Analytics with Python & R: model the expected outcome from the game situation, and treat the residual as skill. The two metrics are RYOE (rushing yards over expected, M3) and CPOE (completion % over expected, M4). The tables feature 2024, the latest season with complete player attribution in the source; 2025 play-by-play is fully loaded at the team level but so far names the rusher on only ~56% of rushes.

Rushing Yards Over Expected
2024 · ≥100 carries
Player	Team	Carries	RYOE/att
Devon Dampier	New Mexico	133	2.76
Garrett Greene	West Virginia	109	2.58
Parker Navarro	Ohio	138	2.56
Blake Horvath	Navy	159	2.51
Fluff Bothwell	South Alabama	108	2.42
Eli Sanders	New Mexico	131	2.19
Jeremiyah Love	Notre Dame	155	2.18
Brendon Lewis	Nevada	127	2.08
Hajj-Malik Williams	UNLV	120	2.03
Isaac Brown	Louisville	156	2.02

Completion % Over Expected
2024 · ≥200 attempts
Player	Team	Attempts	CPOE
Haynes King	Georgia Tech	255	12.4%
Shedeur Sanders	Colorado	436	11.8%
Jordan McCloud	Texas State	271	10.3%
Dillon Gabriel	Oregon	426	10.2%
Jacob Clark	Missouri State	339	10.0%
Will Rogers	Washington	307	9.6%
Will Howard	Ohio State	362	8.7%
Hayden Wolff	Western Michigan	276	8.4%
Mikey Keene	Fresno State	325	7.1%
Dylan Raiola	Nebraska	368	7.0%
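
The residual idea behind both leaderboards can be sketched in a few lines. This is a minimal illustration, not the pipeline's code: it assumes a one-predictor fit (rushing yards on yards to go), and the plays, rusher labels, and function name are all hypothetical.

```python
from collections import defaultdict


def fit_simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical plays: (rusher, yards_to_go, rushing_yards).
plays = [
    ("A", 10, 6), ("A", 3, 4), ("A", 1, 2), ("A", 8, 9),
    ("B", 10, 2), ("B", 2, 1), ("B", 5, 3), ("B", 7, 1),
]

intercept, slope = fit_simple_ols([p[1] for p in plays], [p[2] for p in plays])

# Residual = actual yards minus the yards the situation predicted.
residuals = defaultdict(list)
for rusher, yards_to_go, yards in plays:
    expected = intercept + slope * yards_to_go
    residuals[rusher].append(yards - expected)

# RYOE per attempt is the mean residual for each rusher.
ryoe_per_att = {r: sum(v) / len(v) for r, v in residuals.items()}
print(ryoe_per_att)
```

CPOE follows the same pattern with a logistic model: the residual is the completion indicator (0 or 1) minus the predicted completion probability.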

Metric stability (M1) — is it skill or noise?

Split-half reliability (odd vs even plays) of rushing yards-per-carry. A tight diagonal = a reliable, skill-driven metric; a loose cloud = mostly noise (and a candidate for shrinkage, M7).

Reliability of rushing YPC
R and Python agree
Metric	Season(s)	n	R	Python
Split-half reliability (Spearman-Brown)	2023	212	0.547	0.547
Split-half reliability (Spearman-Brown)	2024	202	0.510	0.510
Split-half reliability (Spearman-Brown)	2025	54	0.454	0.454
Year-over-year r	2023-2024	160	0.218	0.218
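
A minimal sketch of the split-half calculation, assuming carries are split by their order within each rusher and that the half-sample correlation is stepped up with the Spearman-Brown formula 2r / (1 + r). The carry data and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-sample correlation up to full-sample reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(carries_by_rusher):
    """Correlate odd-play YPC with even-play YPC across rushers."""
    odd_ypc, even_ypc = [], []
    for yards in carries_by_rusher.values():
        odd, even = yards[0::2], yards[1::2]
        odd_ypc.append(sum(odd) / len(odd))
        even_ypc.append(sum(even) / len(even))
    r_half = pearson(odd_ypc, even_ypc)
    return r_half, spearman_brown(r_half)


# Hypothetical yards gained on each carry, in play order.
carries = {
    "A": [5, 7, 3, 6, 4, 8],
    "B": [2, 1, 3, 2, 4, 0],
    "C": [6, 5, 7, 4, 5, 6],
    "D": [1, 3, 2, 2, 3, 1],
}
r_half, reliability = split_half_reliability(carries)
print(round(r_half, 3), round(reliability, 3))
```

The correction matters because each half contains only half of a rusher's carries, so the raw half-to-half correlation understates the reliability of the full-season number.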

Team archetypes (M6) — PCA + k-means

Each team-season’s 7-stat style profile reduced to 2 principal components, then clustered into five archetypes. Efficient, complete teams sit to one side; struggling or one-dimensional teams to the other.

Archetype centroids
Mean style profile per cluster (Def EPA lower = better)
Archetype	n	Off EPA	Def EPA	Run rate	Pace
1	75	−0.011	0.014	0.58	57.42
2	82	0.165	−0.024	0.54	58.19
3	70	−0.011	0.115	0.49	53.78
4	76	0.171	0.153	0.50	58.52
5	105	0.082	0.067	0.49	63.17

Expected points by field position — the EPA foundation

Every efficiency metric here is built on EPA (expected points added). This is the expected-points surface it rests on: the mean expected points for a first-and-10 (and later downs) at each spot on the field, from ~6.4 points at the opponent’s goal line down to negative values when backed up near your own end zone. The monotonic, football-sensible shape is the sanity check for the whole EPA layer.
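
The monotonicity check and the EPA definition can be illustrated with a toy curve. Only the ~6.4 endpoint comes from the text above; every other value, and both function names, are hypothetical. The sketch also ignores down, distance, and changes of possession.

```python
# Toy expected-points values keyed by yards to the opponent's goal line.
# Only the ~6.4 endpoint is from the page; the rest are made up.
ep_by_yards_to_goal = {1: 6.4, 10: 4.9, 25: 3.6, 50: 2.0, 75: 0.8, 99: -0.5}


def is_monotone_decreasing(ep_curve):
    """True if expected points never rise as the goal line gets farther away."""
    values = [ep_curve[d] for d in sorted(ep_curve)]
    return all(a >= b for a, b in zip(values, values[1:]))


def epa(ep_before, ep_after):
    """Expected points added: EP after the play minus EP before it."""
    return ep_after - ep_before


surface_ok = is_monotone_decreasing(ep_by_yards_to_goal)
# A 25-yard gain from midfield on the toy curve.
gain_epa = epa(ep_by_yards_to_goal[50], ep_by_yards_to_goal[25])
print(surface_ok, round(gain_epa, 2))
```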

Part 2 — Evaluating the models

The game win-probability model

Logistic model of P(home win) from leakage-safe entering-form efficiency differentials. Trained on the earlier seasons (2023–24) and evaluated on the sealed 2025 holdout — a season it never saw — against a home-field-naive baseline and the betting market (the bar to clear).
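
To make the logistic form concrete, the sketch below plugs in the game_winprob coefficients published in the R column of the parity table further down this page. Two things are assumptions: that each differential is home minus away, and that the features enter unscaled. The function name is hypothetical.

```python
import math

# game_winprob coefficients as published in the parity table (R column).
COEFS = {
    "(Intercept)": 0.3266,
    "off_epa_diff": 2.2239,
    "def_epa_diff": -1.3682,
    "win_pct_diff": 1.4092,
    "roll3_net_epa_diff": 0.7997,
    "sos_diff": 0.0716,
}


def home_win_prob(features):
    """Logistic link: P = 1 / (1 + exp(-(intercept + sum(b_k * x_k))))."""
    z = COEFS["(Intercept)"]
    for name, value in features.items():
        z += COEFS[name] * value
    return 1.0 / (1.0 + math.exp(-z))


# Evenly matched teams: every differential is zero, so only the
# intercept (the home-field term) remains.
even = home_win_prob({})
# Hypothetical matchup: the home side has a 0.10 EPA/play offensive edge.
edge = home_win_prob({"off_epa_diff": 0.10})
print(round(even, 3), round(edge, 3))
```

Under those assumptions the intercept alone gives the home team roughly a 58% chance, which is the model's implied home-field advantage.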

2025 holdout accuracy
Lower Brier/log-loss = better; the model approaches the market using on-field efficiency only
Metric	Model	Market line	Naive (home-field)
Brier score	0.191	0.176	0.243
Log loss	0.562	—	—
AUC	0.773	—	—
Accuracy	0.704	0.739	—
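
For readers unfamiliar with the scoring rules in the table, here is a minimal sketch of Brier score, log loss, and accuracy. The forecasts and outcomes are hypothetical, not holdout data.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared gap between forecast probability and the 0/1 result."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def accuracy(probs, outcomes, threshold=0.5):
    """Share of games where the favored side (p >= threshold) actually won."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)


# Hypothetical home-win forecasts and results (1 = home win).
probs = [0.80, 0.65, 0.30, 0.55, 0.90]
outcomes = [1, 1, 0, 0, 1]
model_brier = brier_score(probs, outcomes)
model_acc = accuracy(probs, outcomes)

# Reference point: always forecasting 50% scores a Brier of exactly 0.25.
coin_brier = brier_score([0.5] * 4, [1, 0, 1, 0])
print(round(model_brier, 3), model_acc, coin_brier)
```

The 0.25 reference is useful context for the table: a forecaster with no information at all sits at 0.25, so both the naive baseline and the model are measured against that ceiling.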

Forecasting a new season before it starts

The in-season model needs a few weeks of this year’s results. To predict openers — or a whole schedule the day it drops — a preseason priors model uses only last season’s carryover (SP+ rating, net EPA, win rate). It’s genuinely useful, but the table below quantifies the honest cost of having no current-year form: preseason forecasts are meaningfully less certain. This is the model that would score a 2026 schedule the moment it’s loaded.

In-season vs preseason forecasting (2025 holdout)
Preseason (prior-year priors only) beats a naive guess but trails the in-season model
Metric	In-season model	Preseason priors	Naive baseline
Brier (lower better)	0.191	0.206	0.241
AUC (higher better)	0.773	0.720	—
Accuracy	0.704	0.677	—

Model coefficients & fit (M3–M5)

Headline fit metrics by model
Model	Metric	Value
ryoe_simple	rmse	7.7152
ryoe_simple	r_squared	0.0103
ryoe_multiple	rmse	7.6845
ryoe_multiple	r_squared	0.0181
baseline_mean	rmse	7.7552
cpoe	brier	0.2344
cpoe	brier_baseline	0.2365
cpoe	log_loss	0.6615
passing_td_poisson	dispersion	0.9476
passing_td_poisson	aic_poisson	14092.2662

Multilevel shrinkage (M7) — regression to the mean

M1 showed that rushing efficiency is noisy; a random-intercept model (ryoe ~ 1 + (1 | rusher)) corrects for that noise. Partial pooling pulls each rusher’s raw RYOE toward the league mean, and pulls hardest on small-sample rushers. Below, the raw averages (grey) fan out wildly at low volume while the shrunken estimates (blue) stay disciplined; only high-volume backs keep a large signal.

Only 1.2% of RYOE variance is between rushers (the intra-class correlation) — the rest is play-to-play noise. That tiny ICC is exactly why the pooling is so aggressive, and it quantitatively confirms the M1 stability finding.
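
To see why a 1.2% ICC implies such heavy pooling, the sketch below uses the textbook shrinkage weight for a random-intercept model with known variance components: w = n / (n + σ²_within / σ²_between), where the variance ratio equals (1 − ICC) / ICC. This is an approximation of what a mixed-model fit does, not the pipeline's code. The grand mean is the shrinkage intercept from the parity table; the raw RYOE value and function names are hypothetical.

```python
ICC = 0.012            # between-rusher share of RYOE variance (from the text)
GRAND_MEAN = -0.0468   # shrinkage model intercept from the parity table


def pooling_weight(n_carries, icc=ICC):
    """Weight on a rusher's own mean: n / (n + sigma2_within / sigma2_between)."""
    variance_ratio = (1.0 - icc) / icc
    return n_carries / (n_carries + variance_ratio)


def shrunken_ryoe(raw_mean, n_carries, grand_mean=GRAND_MEAN):
    """Pull the raw mean toward the league mean by (1 - weight)."""
    w = pooling_weight(n_carries)
    return grand_mean + w * (raw_mean - grand_mean)


# Same hypothetical raw RYOE of +2.0 per carry at two sample sizes.
low_volume = shrunken_ryoe(2.0, n_carries=20)
high_volume = shrunken_ryoe(2.0, n_carries=150)
print(round(pooling_weight(20), 3), round(low_volume, 2))
print(round(pooling_weight(150), 3), round(high_volume, 2))
```

With this ICC the variance ratio is about 82, so a rusher needs roughly 82 carries before his own average counts as much as the league mean.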

Do teams beat their recruiting? (M8)

Recruiting rankings here were collected by ethically scraping Wikipedia (CC BY-SA, bot-friendly) — only the team-level ranking table, never individual (often minor) recruits — and were validated to match CFBD’s sanctioned recruiting API exactly. Regressing each team’s on-field production (SP+ rating) on its recruiting-class rank shows the expected relationship (better classes → better teams), but the residuals are the story: who out- or under-performed the talent they signed.

Biggest overachievers vs recruiting
Production well above what their signed talent predicted
Team	Season	Recruit rank	SP+	Beat recruiting by
Michigan	2023	18	31.3	20.0
Ole Miss	2024	21	27.9	18.7
Ole Miss	2025	16	24.0	11.2
Penn State	2024	15	24.6	11.1
Washington	2025	23	18.4	10.6
Ohio State	2024	5	31.2	10.5
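
One way to read the table: if "Beat recruiting by" is the regression residual (actual SP+ minus fitted SP+), then SP+ minus that column recovers the fitted value at each recruiting rank. The sketch below does that subtraction for the six rows above and checks that the implied fitted values lie close to one straight line. Treating the column as an OLS residual from a single pooled fit is an assumption, and the helper name is hypothetical.

```python
# (team, season, recruit rank, SP+, "beat recruiting by") from the table above.
rows = [
    ("Michigan", 2023, 18, 31.3, 20.0),
    ("Ole Miss", 2024, 21, 27.9, 18.7),
    ("Ole Miss", 2025, 16, 24.0, 11.2),
    ("Penn State", 2024, 15, 24.6, 11.1),
    ("Washington", 2025, 23, 18.4, 10.6),
    ("Ohio State", 2024, 5, 31.2, 10.5),
]


def fit_line(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


ranks = [r[2] for r in rows]
# Fitted SP+ implied by the table: actual minus residual.
implied_fit = [r[3] - r[4] for r in rows]

intercept, slope = fit_line(ranks, implied_fit)
# How far any implied fitted value sits from the recovered line.
max_dev = max(abs(f - (intercept + slope * x)) for x, f in zip(ranks, implied_fit))
print(round(slope, 2), round(max_dev, 2))
```

The recovered slope is negative, matching the stated relationship: a numerically lower (better) class rank predicts a higher SP+.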

Part 3 — 2026 season outlook (a live forecast)

Everything above measures the past; this looks forward. Taking the real 2026 schedule (pulled from CFBD, none of it played yet) and each team’s 2025 carryover, the preseason priors model scores every game, and the per-game probabilities add up to a projected win total for each team. Read this as an informed prior, not a prophecy: it knows how teams finished 2025 and who they play in 2026, but not transfers, coaching changes, or breakout freshmen — and preseason forecasts are the least certain kind (the accuracy table above quantifies that).

Projected 2026 win totals

Expected wins summed over each team’s FBS-vs-FBS games (cupcake FCS games excluded, so most teams’ real totals run ~1 higher). It blends team quality and schedule: a strong team with a soft slate projects high; a good team in a brutal division less so.

2026 projected win totals: top 25
Preseason priors model applied to the real 2026 schedule
#	Team	FBS games	Proj. wins	Proj. losses	Avg win prob
1	Notre Dame	12	9.4	2.6	78%
2	Texas Tech	11	9.1	1.9	83%
3	Penn State	12	8.9	3.1	75%
4	Ohio State	12	8.9	3.1	74%
5	Indiana	11	8.3	2.7	76%
6	Miami	11	7.8	3.2	71%
7	Oregon	11	7.7	3.3	70%
8	South Florida	11	7.6	3.4	70%
9	Utah	11	7.6	3.4	70%
10	James Madison	11	7.4	3.6	68%
11	Iowa	11	7.4	3.6	67%
12	USC	12	7.3	4.7	61%
13	SMU	11	7.2	3.8	66%
14	Oklahoma	12	7.1	4.9	59%
15	Toledo	10	7.1	2.9	71%
16	Ole Miss	11	7.0	4.0	64%
17	North Texas	11	7.0	4.0	64%
18	Memphis	11	7.0	4.0	63%
19	Georgia	11	6.9	4.1	63%
20	Clemson	11	6.8	4.2	62%
21	East Carolina	11	6.8	4.2	62%
22	Vanderbilt	11	6.7	4.3	61%
23	Washington	11	6.6	4.4	60%
24	Virginia	11	6.6	4.4	60%
25	Texas A&M	11	6.6	4.4	60%
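
The projected totals are sums of per-game probabilities, which the sketch below illustrates. The four-game slate is hypothetical because per-game probabilities are not listed on this page; the second half uses three published rows to check that projected wins divided by games matches the average win probability to within rounding.

```python
def project_record(game_probs):
    """Expected wins, expected losses, and mean win probability for a slate."""
    wins = sum(game_probs)
    games = len(game_probs)
    return wins, games - wins, wins / games


# Hypothetical per-game win probabilities for a four-game slate.
wins, losses, avg = project_record([0.90, 0.75, 0.60, 0.35])
print(round(wins, 1), round(losses, 1), round(avg, 2))

# Published rows: (team, FBS games, projected wins, average win probability).
published = [
    ("Notre Dame", 12, 9.4, 0.78),
    ("Texas Tech", 11, 9.1, 0.83),
    ("Toledo", 10, 7.1, 0.71),
]
# Gap between wins / games and the published average, per team.
gaps = [abs(w / g - p) for _, g, w, p in published]
print([round(gap, 3) for gap in gaps])
```

Summing probabilities gives an expected value, not a prediction of any single game, which is why fractional totals such as 9.4 wins are meaningful.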

Opening weekend (week 1)

The model’s closest calls — genuine toss-ups where it has little conviction — and the games it’s most sure about.

Week 1: the model's true toss-ups
≈50% = a coin flip even to the model
Matchup	Home win prob	Model leans
Toledo @ Michigan State	51%	Michigan State
San José State @ Eastern Michigan	52%	Eastern Michigan
UNLV @ Hawai'i	53%	Hawai'i
Western Kentucky @ Nevada	47%	Western Kentucky
Memphis @ UNLV	46%	Memphis
Hawai'i @ Stanford	46%	Hawai'i
UCLA @ California	55%	California
Oklahoma State @ Tulsa	55%	Tulsa
Wyoming @ Colorado State	55%	Colorado State
SMU @ Florida State	56%	Florida State

Week 1: the model's most confident picks
Matchup	Favorite	Win prob
Ball State @ Ohio State	Ohio State	97%
Massachusetts @ Rutgers	Rutgers	92%
Kent State @ South Carolina	South Carolina	92%
Northern Illinois @ Iowa	Iowa	91%
Missouri State @ Texas A&M	Texas A&M	90%
UL Monroe @ Mississippi State	Mississippi State	89%
UTEP @ Oklahoma	Oklahoma	88%
San José State @ USC	USC	87%

R ↔︎ Python parity — the built-in correctness check

Every method is fit independently in R and Python on the identical feed. Because the fits are mathematically the same (OLS, IRLS GLM, Poisson, logistic MLE), the coefficients must agree. The pipeline’s load_results step fails the build if any term diverges beyond tolerance.

R vs Python coefficient agreement
32 coefficients across 9 models · max |Δ| = 1.47 × 10⁻¹
Model	Term	R	Python	|Δ|
passing_td_negbin	(Intercept)	−2.6343	−2.7812	1.5 × 10⁻¹
passing_td_negbin	log(pass_attempts)	0.6920	0.7220	3.0 × 10⁻²
passing_td_negbin	opponent_defense_rating	0.0254	0.0271	1.7 × 10⁻³
shrinkage	(Intercept)	−0.0468	−0.0468	3.6 × 10⁻⁶
game_winprob	def_epa_diff	−1.3682	−1.3682	1.1 × 10⁻⁷
game_winprob	win_pct_diff	1.4092	1.4092	6.0 × 10⁻⁸
game_winprob	roll3_net_epa_diff	0.7997	0.7997	4.3 × 10⁻⁸
game_winprob	(Intercept)	0.3266	0.3266	2.7 × 10⁻⁸
game_winprob	off_epa_diff	2.2239	2.2239	4.0 × 10⁻⁹
game_winprob	sos_diff	0.0716	0.0716	1.8 × 10⁻⁹
passing_td_poisson	(Intercept)	−2.6343	−2.6343	1.4 × 10⁻¹⁰
passing_td_poisson	log(pass_attempts)	0.6920	0.6920	3.6 × 10⁻¹¹
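
A parity gate of this kind can be sketched as a tolerance check that raises when any term diverges. This is not the pipeline's load_results step: the function names and the 1e-6 tolerance are hypothetical, and the tolerances the pipeline actually applies to each model are not shown on this page. The coefficient values are copied from the table above at their displayed precision.

```python
def diverging_terms(r_coefs, py_coefs, tol):
    """Return (term, |delta|) for every coefficient pair farther apart than tol."""
    if r_coefs.keys() != py_coefs.keys():
        raise ValueError("R and Python fits expose different terms")
    return [
        (term, abs(r_coefs[term] - py_coefs[term]))
        for term in r_coefs
        if abs(r_coefs[term] - py_coefs[term]) > tol
    ]


def parity_gate(r_coefs, py_coefs, tol):
    """Fail loudly if any term diverges beyond the tolerance."""
    bad = diverging_terms(r_coefs, py_coefs, tol)
    if bad:
        raise AssertionError(f"R/Python parity failed: {bad}")


# passing_td_poisson: the two columns match at the displayed precision.
poisson_r = {"(Intercept)": -2.6343, "log(pass_attempts)": 0.6920}
poisson_py = {"(Intercept)": -2.6343, "log(pass_attempts)": 0.6920}
parity_gate(poisson_r, poisson_py, tol=1e-6)  # passes silently

# passing_td_negbin: shows what a flagged result looks like under the
# same hypothetical tolerance.
negbin_r = {"(Intercept)": -2.6343, "log(pass_attempts)": 0.6920,
            "opponent_defense_rating": 0.0254}
negbin_py = {"(Intercept)": -2.7812, "log(pass_attempts)": 0.7220,
             "opponent_defense_rating": 0.0271}
flagged = diverging_terms(negbin_r, negbin_py, tol=1e-6)
print([term for term, _ in flagged])
```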

The unsupervised and mixed-effects methods (M6, M7) don’t reduce to a single comparable coefficient — their fits are sign- or label-ambiguous — so agreement is measured label-invariantly:

R vs Python: M6/M7 agreement
1.0 = identical (PCA up to sign; clusters up to labelling)
Check	Agreement
M6 · PC1 |correlation|	1.000
M6 · cluster adjusted Rand index	0.974
M7 · shrunken-estimate correlation	1.000
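
Both label-invariant checks are standard and can be sketched without any libraries: the absolute Pearson correlation ignores a sign flip in a principal component, and the adjusted Rand index ignores how clusters are numbered. The scores and labels below are hypothetical, and the function names are illustrative rather than the pipeline's.

```python
import math
from collections import Counter


def abs_pearson(x, y):
    """|Pearson r|: equals 1.0 when y is x with its sign flipped."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return abs(sxy / math.sqrt(sxx * syy))


def pair_count(counts):
    """Number of unordered pairs that fall inside the same group."""
    return sum(math.comb(c, 2) for c in counts)


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    together = pair_count(Counter(zip(labels_a, labels_b)).values())
    in_a = pair_count(Counter(labels_a).values())
    in_b = pair_count(Counter(labels_b).values())
    expected = in_a * in_b / math.comb(len(labels_a), 2)
    max_index = (in_a + in_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (together - expected) / (max_index - expected)


# Hypothetical PC1 scores: the second fit is the first with the sign flipped.
pc1_r = [1.2, -0.4, 0.3, -1.1]
pc1_py = [-v for v in pc1_r]
pc1_agreement = abs_pearson(pc1_r, pc1_py)

# Hypothetical cluster labels: same grouping, different numbering.
clusters_r = [0, 0, 1, 1, 2, 2]
clusters_py = [2, 2, 0, 0, 1, 1]
ari_same = adjusted_rand_index(clusters_r, clusters_py)
# One team assigned differently lowers the index below 1.
ari_diff = adjusted_rand_index(clusters_r, [2, 2, 0, 0, 1, 0])
print(round(pc1_agreement, 3), round(ari_same, 3), round(ari_diff, 3))
```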

Built with R (gt, ggplot2), Python (duckdb, pandas), dbt, and DuckDB. Models from Eager & Erickson, Football Analytics with Python & R, applied to CFB. Data: CollegeFootballData.com.