CFB Analytics — Data & Model Evaluation

A polyglot college-football pipeline · CFBD → DuckDB → dbt → R + Python models

Published

July 2, 2026

Data: CollegeFootballData.com (CFBD API, free tier). Raw data is gitignored and collected within the API’s bounded/cached terms. This page evaluates (1) the data and (2) the models’ accuracy. Every statistical method is implemented in both R and Python against the same warehouse feed; their agreement is a built-in correctness check (see R ↔︎ Python parity).

🏈 Open the interactive Team Comparison → — pick any two FBS teams and compare their 2025 season head-to-head (SP+, EPA, efficiency, player leaders) with a neutral-field matchup projection and 2026 win forecast.

📖 Stat Guide → — a plain-English data dictionary for every metric, with the 2025 FBS range for each.  •  🧪 The Models → — how the projections work and how well they perform: an interactive win-probability curve, a real held-out calibration plot, baseline comparisons, and the R ↔︎ Python parity gate.

Part 1 — Evaluating the data

Data-quality panel

Medallion warehouse
dbt data tests: 99 passed, 0 failed · sources fresh as of 2026-07-01
Layer Tables Rows
staging 7 845,808
silver 5 863,147
gold 18 1,462,182
snapshots 1 218
Bronze sources — load lineage
Source Last loaded Rows
plays 2026-07-01 13:07:56 1,101,386
games 2026-07-01 13:08:00 11,366
lines 2026-07-01 13:08:05 9,166
ratings_sp 2026-07-01 13:08:07 406
teams 2026-07-01 13:08:10 403
recruiting 2026-07-01 13:08:18 79
calendar 2026-07-01 13:08:12 50

Team efficiency leaderboard

Opponent-context offensive/defensive EPA per play (lower defensive EPA = better), season average.

Top teams by net EPA/play — 2025
Team W% Net EPA Off EPA Def EPA
Ohio State 86% 0.434 0.186 −0.248
Texas Tech 86% 0.428 0.050 −0.378
Toledo 62% 0.399 0.129 −0.270
Notre Dame 83% 0.381 0.247 −0.134
Indiana 100% 0.326 0.200 −0.126
Washington 69% 0.307 0.187 −0.120
Vanderbilt 77% 0.294 0.306 0.012
Oregon 87% 0.280 0.181 −0.099
South Florida 69% 0.278 0.192 −0.085
Utah 85% 0.268 0.194 −0.074
Texas A&M 85% 0.266 0.125 −0.141
North Texas 86% 0.253 0.273 0.020
Net EPA = offensive EPA/play − defensive EPA/play allowed.
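
As a quick worked check of the definition above, the sketch below recomputes net EPA for two rows of the leaderboard from their rounded offensive and defensive values. The function name `net_epa` is illustrative and is not taken from the pipeline.

```python
def net_epa(off_epa: float, def_epa: float) -> float:
    """Net EPA/play: offensive EPA/play minus defensive EPA/play allowed.

    A negative def_epa (a good defense) therefore raises the net figure.
    """
    return off_epa - def_epa


# Rounded (Off EPA, Def EPA) pairs copied from the leaderboard.
rows = {
    "Ohio State": (0.186, -0.248),
    "Vanderbilt": (0.306, 0.012),
}

for team, (off, dfn) in rows.items():
    print(f"{team}: {net_epa(off, dfn):.3f}")
```

Both results match the Net EPA column (0.434 and 0.294). Some rows can differ in the third decimal because the published inputs are already rounded.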

“Over expected” leaderboards (M2–M4)

The residual framework from Eager & Erickson’s Football Analytics with Python & R: model the expected outcome from the game situation, and treat the residual as skill. The leaderboards cover RYOE (rushing yards over expected, M3) and CPOE (completion % over expected, M4). They feature 2024, the latest season with complete player attribution in the source; 2025 play-by-play is fully loaded at the team level but names the rusher on only ~56% of rushes so far.

Rushing Yards Over Expected
2024 · ≥100 carries
Player Team Carries RYOE/att
Devon Dampier New Mexico 133 2.76
Garrett Greene West Virginia 109 2.58
Parker Navarro Ohio 138 2.56
Blake Horvath Navy 159 2.51
Fluff Bothwell South Alabama 108 2.42
Eli Sanders New Mexico 131 2.19
Jeremiyah Love Notre Dame 155 2.18
Brendon Lewis Nevada 127 2.08
Hajj-Malik Williams UNLV 120 2.03
Isaac Brown Louisville 156 2.02
Completion % Over Expected
2024 · ≥200 attempts
Player Team Attempts CPOE
Haynes King Georgia Tech 255 12.4%
Shedeur Sanders Colorado 436 11.8%
Jordan McCloud Texas State 271 10.3%
Dillon Gabriel Oregon 426 10.2%
Jacob Clark Missouri State 339 10.0%
Will Rogers Washington 307 9.6%
Will Howard Ohio State 362 8.7%
Hayden Wolff Western Michigan 276 8.4%
Mikey Keene Fresno State 325 7.1%
Dylan Raiola Nebraska 368 7.0%

Metric stability (M1) — is it skill or noise?

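Split-half reliability of rushing yards per carry: each rusher’s odd-numbered plays are compared with their even-numbered plays. In the accompanying scatter, a tight diagonal means a reliable, skill-driven metric; a loose cloud means mostly noise (and a candidate for shrinkage, M7). With within-season reliability near 0.5 and a year-over-year r of 0.218, rushing YPC sits toward the noisy end.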

Reliability of rushing YPC
R and Python agree
Metric Season(s) n R Python
Split-half reliability (Spearman-Brown) 2023 212 0.547 0.547
Split-half reliability (Spearman-Brown) 2024 202 0.510 0.510
Split-half reliability (Spearman-Brown) 2025 54 0.454 0.454
Year-over-year r 2023-2024 160 0.218 0.218

Team archetypes (M6) — PCA + k-means

Each team-season’s 7-stat style profile reduced to 2 principal components, then clustered into five archetypes. Efficient, complete teams sit to one side; struggling or one-dimensional teams to the other.

Archetype centroids
Mean style profile per cluster (Def EPA lower = better)
Archetype n Off EPA Def EPA Run rate Pace
1 75 −0.011 0.014 0.58 57.42
2 82 0.165 −0.024 0.54 58.19
3 70 −0.011 0.115 0.49 53.78
4 76 0.171 0.153 0.50 58.52
5 105 0.082 0.067 0.49 63.17

Expected points by field position — the EPA foundation

Every efficiency metric here is built on EPA (expected points added). This is the expected-points surface it rests on: the mean expected points for a first-and-10 (and later downs) at each spot on the field, from ~6.4 points at the opponent’s goal line down to negative values when backed up near your own end zone. The monotonic, football-sensible shape is the sanity check for the whole EPA layer.
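
The two ideas in this paragraph, EPA as a difference of expected points and monotonicity as a sanity check, can be sketched as below. Apart from the ~6.4 figure quoted above, the expected-points values are invented for illustration and are not the warehouse surface; the simple difference also ignores scoring plays and changes of possession.

```python
# Hypothetical expected points for 1st-and-10, keyed by yards to the
# opponent's goal line. Illustrative numbers only.
EP_FIRST_AND_10 = {99: -0.8, 75: 0.9, 50: 2.1, 25: 3.7, 1: 6.4}


def epa(ep_before: float, ep_after: float) -> float:
    """Expected points added by a play: EP after the play minus EP before."""
    return ep_after - ep_before


def is_monotonic(ep_by_distance: dict[int, float]) -> bool:
    """True if EP never decreases as the offense nears the goal line."""
    ordered = [ep_by_distance[d] for d in sorted(ep_by_distance, reverse=True)]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


# A 25-yard gain from the offense's own 25 (75 yards out) to midfield.
print(round(epa(EP_FIRST_AND_10[75], EP_FIRST_AND_10[50]), 2))
print(is_monotonic(EP_FIRST_AND_10))
```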

Part 2 — Evaluating the models

The game win-probability model

Logistic model of P(home win) from entering-form efficiency differentials, meaning features computed only from games played before kickoff so that no outcome information leaks in. Trained on the earlier seasons (2023–24) and evaluated on the sealed 2025 holdout, a season it never saw, against a home-field-naive baseline and the betting market (the bar to clear).

2025 holdout accuracy
Lower Brier/log-loss = better; the model approaches the market using on-field efficiency only
Metric Model Market line Naive (home-field)
Brier score 0.191 0.176 0.243
Log loss 0.562 – –
AUC 0.773 – –
Accuracy 0.704 0.739 –
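
For reference, the three scoring rules in this table can be written out as follows. This is a generic sketch of the standard definitions on toy numbers, not the pipeline's evaluation code; the function names and the 0.5 decision threshold are assumptions.

```python
import math


def brier(probs, outcomes):
    """Mean squared error between predicted P(home win) and the 0/1 result."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def accuracy(probs, outcomes, threshold=0.5):
    """Share of games where the side favored at the threshold actually won."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)


# Hypothetical toy forecasts and results (1 = home win).
probs = [0.8, 0.6, 0.3, 0.9]
outcomes = [1, 0, 0, 1]
print(brier(probs, outcomes), log_loss(probs, outcomes), accuracy(probs, outcomes))
```

A forecast of 0.5 for every game scores a Brier of exactly 0.25 whatever the results, which puts the naive column's 0.243 in context.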

Forecasting a new season before it starts

The in-season model needs a few weeks of this year’s results. To predict openers, or a whole schedule the day it drops, a preseason priors model uses only last season’s carryover (SP+ rating, net EPA, win rate). It beats the naive baseline, but the table below quantifies the cost of having no current-year form: on the 2025 holdout its Brier score is 0.206 versus 0.191 in-season, and its AUC is 0.720 versus 0.773. This is the model that would score a 2026 schedule the moment it’s loaded.

In-season vs preseason forecasting (2025 holdout)
Preseason (prior-year priors only) beats a naive guess but trails the in-season model
Metric In-season model Preseason priors Naive baseline
Brier (lower better) 0.191 0.206 0.241
AUC (higher better) 0.773 0.720 –
Accuracy 0.704 0.677 –
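
AUC, the ranking metric in this table, has a direct pairwise reading: the chance that a randomly chosen home win received a higher predicted probability than a randomly chosen home loss. The sketch below implements that definition naively on toy inputs; the function name is illustrative, and the quadratic loop is fine for a single season but not tuned for large inputs.

```python
def auc(probs, outcomes):
    """Pairwise AUC: P(score of a win > score of a loss), ties counted as half."""
    wins = [p for p, y in zip(probs, outcomes) if y == 1]
    losses = [p for p, y in zip(probs, outcomes) if y == 0]
    if not wins or not losses:
        raise ValueError("AUC needs at least one win and one loss")
    concordant = sum((w > l) + 0.5 * (w == l) for w in wins for l in losses)
    return concordant / (len(wins) * len(losses))


# Hypothetical toy forecasts and results (1 = home win).
print(auc([0.8, 0.6, 0.3, 0.9], [1, 0, 0, 1]))
```

A constant forecast ties on every pair and scores 0.5, so AUC carries no information for a baseline that predicts the same probability for every game.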

Model coefficients & fit (M3–M5)

Headline fit metrics by model
Model Metric Value
ryoe_simple rmse 7.7152
ryoe_simple r_squared 0.0103
ryoe_multiple rmse 7.6845
ryoe_multiple r_squared 0.0181
baseline_mean rmse 7.7552
cpoe brier 0.2344
cpoe brier_baseline 0.2365
cpoe log_loss 0.6615
passing_td_poisson dispersion 0.9476
passing_td_poisson aic_poisson 14092.2662
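
The rows of this table are linked by simple arithmetic. Assuming baseline_mean is a mean-only predictor scored on the same plays, R² equals one minus the squared RMSE ratio, and the same "improvement over baseline" reading applies to the CPOE Brier scores. The helper names below are illustrative.

```python
def r_squared_from_rmse(rmse_model: float, rmse_baseline: float) -> float:
    """R^2 implied by a model RMSE and a mean-only baseline RMSE on the same data."""
    return 1 - (rmse_model / rmse_baseline) ** 2


def brier_skill(brier_model: float, brier_baseline: float) -> float:
    """Fractional Brier improvement over the baseline (0 = no better)."""
    return 1 - brier_model / brier_baseline


# Values copied from the table above.
print(round(r_squared_from_rmse(7.7152, 7.7552), 4))  # ryoe_simple
print(round(r_squared_from_rmse(7.6845, 7.7552), 4))  # ryoe_multiple
print(round(brier_skill(0.2344, 0.2365), 4))          # cpoe
```

The implied R² values agree with the reported 0.0103 and 0.0181 to four decimals, which is consistent with the assumption. The CPOE model improves on its baseline Brier by a little under 1%.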

Multilevel shrinkage (M7) — regression to the mean

M1 showed rushing efficiency is noisy; a random-intercept model (ryoe ~ 1 + (1 | rusher)) acts on it. Partial pooling pulls each rusher’s raw RYOE toward the league mean — hardest for small-sample rushers. Below, the raw averages (grey) fan out wildly at low volume while the shrunken estimates (blue) stay disciplined; only high-volume backs keep a large signal.

Only 1.2% of RYOE variance is between rushers (the intra-class correlation) — the rest is play-to-play noise. That tiny ICC is exactly why the pooling is so aggressive, and it quantitatively confirms the M1 stability finding.
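
To see why a 1.2% ICC implies such aggressive pooling, the sketch below uses the textbook partial-pooling weight for a random-intercept model with known variance components: a rusher with n carries keeps n / (n + σ²_within / σ²_between) of the raw deviation from the league mean. A fitted mixed model estimates these components jointly, so treat this as an approximation; the function names are illustrative.

```python
def shrink_weight(n: int, icc: float) -> float:
    """Fraction of a rusher's raw deviation that survives partial pooling."""
    variance_ratio = (1 - icc) / icc  # within-rusher / between-rusher variance
    return n / (n + variance_ratio)


def shrunken_mean(raw_mean: float, n: int, league_mean: float, icc: float) -> float:
    """Pull a raw per-rusher mean toward the league mean by the pooling weight."""
    return league_mean + shrink_weight(n, icc) * (raw_mean - league_mean)


ICC = 0.012  # share of RYOE variance between rushers, from the text above
for carries in (10, 100, 300):
    print(carries, round(shrink_weight(carries, ICC), 3))
```

With an ICC of 0.012 the variance ratio is about 82, so a rusher keeps roughly 11% of the raw signal at 10 carries, 55% at 100, and 78% at 300.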

Do teams beat their recruiting? (M8)

Recruiting rankings here were collected by ethically scraping Wikipedia (CC BY-SA, bot-friendly) — only the team-level ranking table, never individual (often minor) recruits — and were validated to match CFBD’s sanctioned recruiting API exactly. Regressing each team’s on-field production (SP+ rating) on its recruiting-class rank shows the expected relationship (better classes → better teams), but the residuals are the story: who out- or under-performed the talent they signed.

Biggest overachievers vs recruiting
Production well above what their signed talent predicted
Team Season Recruit rank SP+ Beat recruiting by
Michigan 2023 18 31.3 20.0
Ole Miss 2024 21 27.9 18.7
Ole Miss 2025 16 24.0 11.2
Penn State 2024 15 24.6 11.1
Washington 2025 23 18.4 10.6
Ohio State 2024 5 31.2 10.5

Part 3 — 2026 season outlook (a live forecast)

Everything above measures the past; this looks forward. Taking the real 2026 schedule (pulled from CFBD, none of it played yet) and each team’s 2025 carryover, the preseason priors model scores every game, and the per-game probabilities add up to a projected win total for each team. Read this as an informed prior, not a prophecy: it knows how teams finished 2025 and who they play in 2026, but not transfers, coaching changes, or breakout freshmen — and preseason forecasts are the least certain kind (the accuracy table above quantifies that).

Projected 2026 win totals

Expected wins summed over each team’s FBS-vs-FBS games (cupcake FCS games excluded, so most teams’ real totals run ~1 higher). It blends team quality and schedule: a strong team with a soft slate projects high; a good team in a brutal division less so.

2026 projected win totals — top 25
Preseason priors model applied to the real 2026 schedule
# Team FBS games Proj. wins Proj. losses Avg win prob
1 Notre Dame 12 9.4 2.6 78%
2 Texas Tech 11 9.1 1.9 83%
3 Penn State 12 8.9 3.1 75%
4 Ohio State 12 8.9 3.1 74%
5 Indiana 11 8.3 2.7 76%
6 Miami 11 7.8 3.2 71%
7 Oregon 11 7.7 3.3 70%
8 South Florida 11 7.6 3.4 70%
9 Utah 11 7.6 3.4 70%
10 James Madison 11 7.4 3.6 68%
11 Iowa 11 7.4 3.6 67%
12 USC 12 7.3 4.7 61%
13 SMU 11 7.2 3.8 66%
14 Oklahoma 12 7.1 4.9 59%
15 Toledo 10 7.1 2.9 71%
16 Ole Miss 11 7.0 4.0 64%
17 North Texas 11 7.0 4.0 64%
18 Memphis 11 7.0 4.0 63%
19 Georgia 11 6.9 4.1 63%
20 Clemson 11 6.8 4.2 62%
21 East Carolina 11 6.8 4.2 62%
22 Vanderbilt 11 6.7 4.3 61%
23 Washington 11 6.6 4.4 60%
24 Virginia 11 6.6 4.4 60%
25 Texas A&M 11 6.6 4.4 60%
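
The columns of this table follow from one step: summing per-game win probabilities. The sketch below shows that aggregation on a made-up schedule; the function name and the probabilities are hypothetical.

```python
def projected_record(win_probs: list[float]) -> dict[str, float]:
    """Turn per-game win probabilities into an expected season record."""
    games = len(win_probs)
    wins = sum(win_probs)  # expected wins = sum of per-game probabilities
    return {
        "games": games,
        "proj_wins": wins,
        "proj_losses": games - wins,
        "avg_win_prob": wins / games,
    }


# Hypothetical four-game slate, not a real 2026 schedule.
record = projected_record([0.92, 0.55, 0.71, 0.40])
print({k: round(v, 2) for k, v in record.items()})
```

The same identity ties the columns together: Notre Dame’s 9.4 projected wins over 12 FBS games is an average win probability of 78%.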

Opening weekend (week 1)

The model’s closest calls — genuine toss-ups where it has little conviction — and the games it’s most sure about.

Week 1 — the model's true toss-ups
≈50% = a coin flip even to the model
Matchup Home win prob Model leans
Toledo @ Michigan State 51% Michigan State
San José State @ Eastern Michigan 52% Eastern Michigan
UNLV @ Hawai'i 53% Hawai'i
Western Kentucky @ Nevada 47% Western Kentucky
Memphis @ UNLV 46% Memphis
Hawai'i @ Stanford 46% Hawai'i
UCLA @ California 55% California
Oklahoma State @ Tulsa 55% Tulsa
Wyoming @ Colorado State 55% Colorado State
SMU @ Florida State 56% Florida State
Week 1 — the model's most confident picks
Matchup Favorite Win prob
Ball State @ Ohio State Ohio State 97%
Massachusetts @ Rutgers Rutgers 92%
Kent State @ South Carolina South Carolina 92%
Northern Illinois @ Iowa Iowa 91%
Missouri State @ Texas A&M Texas A&M 90%
UL Monroe @ Mississippi State Mississippi State 89%
UTEP @ Oklahoma Oklahoma 88%
San José State @ USC USC 87%

R ↔︎ Python parity — the built-in correctness check

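Every method is fit independently in R and Python on the identical feed. Because the fits are mathematically the same (OLS, IRLS GLM, Poisson, logistic MLE), the coefficients must agree. The pipeline’s load_results step fails the build if any term diverges beyond tolerance. Among the largest gaps listed below, only the passing_td_negbin terms differ visibly (up to 1.47e-01); every other listed term agrees to within 4e-06.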

R vs Python coefficient agreement
32 coefficients across 9 models · max |Δ| = 1.47e-01
Model Term R Python |Δ|
passing_td_negbin (Intercept) −2.6343 −2.7812 1.5 × 10⁻¹
passing_td_negbin log(pass_attempts) 0.6920 0.7220 3.0 × 10⁻²
passing_td_negbin opponent_defense_rating 0.0254 0.0271 1.7 × 10⁻³
shrinkage (Intercept) −0.0468 −0.0468 3.6 × 10⁻⁶
game_winprob def_epa_diff −1.3682 −1.3682 1.1 × 10⁻⁷
game_winprob win_pct_diff 1.4092 1.4092 6.0 × 10⁻⁸
game_winprob roll3_net_epa_diff 0.7997 0.7997 4.3 × 10⁻⁸
game_winprob (Intercept) 0.3266 0.3266 2.7 × 10⁻⁸
game_winprob off_epa_diff 2.2239 2.2239 4.0 × 10⁻⁹
game_winprob sos_diff 0.0716 0.0716 1.8 × 10⁻⁹
passing_td_poisson (Intercept) −2.6343 −2.6343 1.4 × 10⁻¹⁰
passing_td_poisson log(pass_attempts) 0.6920 0.6920 3.6 × 10⁻¹¹
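
A parity gate of the kind described above can be sketched as a comparison of two coefficient dictionaries. This is not the pipeline's load_results step: the function names, the dict-based input, and the default tolerance of 1e-6 are all assumptions for illustration.

```python
def max_abs_delta(r_coefs: dict[str, float], py_coefs: dict[str, float]) -> float:
    """Largest absolute R-vs-Python gap across terms; terms must match exactly."""
    if r_coefs.keys() != py_coefs.keys():
        mismatched = sorted(r_coefs.keys() ^ py_coefs.keys())
        raise ValueError(f"term mismatch: {mismatched}")
    return max(abs(r_coefs[term] - py_coefs[term]) for term in r_coefs)


def check_parity(r_coefs, py_coefs, tol: float = 1e-6) -> float:
    """Raise if any term diverges beyond tol (assumed value); else return the gap."""
    delta = max_abs_delta(r_coefs, py_coefs)
    if delta > tol:
        raise AssertionError(f"max |delta| {delta:.3g} exceeds tolerance {tol:.3g}")
    return delta


# Rounded values copied from the table above.
winprob_r = {"(Intercept)": 0.3266, "off_epa_diff": 2.2239, "sos_diff": 0.0716}
winprob_py = {"(Intercept)": 0.3266, "off_epa_diff": 2.2239, "sos_diff": 0.0716}
print(check_parity(winprob_r, winprob_py))
```

Under this assumed tolerance the game_winprob terms pass, while the passing_td_negbin intercepts (−2.6343 versus −2.7812) would raise.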

The unsupervised and mixed-effects methods (M6, M7) don’t reduce to a single comparable coefficient — their fits are sign- or label-ambiguous — so agreement is measured label-invariantly:

R vs Python — M6/M7 agreement
1.0 = identical (PCA up to sign; clusters up to labelling)
Check Agreement
M6 · PC1 |correlation| 1.000
M6 · cluster adjusted Rand index 0.974
M7 · shrunken-estimate correlation 1.000

Built with R (gt, ggplot2), Python (duckdb, pandas), dbt, and DuckDB. Models from Eager & Erickson, Football Analytics with Python & R, applied to CFB. Data: CollegeFootballData.com.