| Medallion warehouse | ||
| dbt data tests: 99 passed, 0 failed · sources fresh as of 2026-07-01 | ||
| Layer | Tables | Rows |
|---|---|---|
| staging | 7 | 845,808 |
| silver | 5 | 863,147 |
| gold | 18 | 1,462,182 |
| snapshots | 1 | 218 |
CFB Analytics — Data & Model Evaluation
A polyglot college-football pipeline · CFBD → DuckDB → dbt → R + Python models
Data: CollegeFootballData.com (CFBD API, free tier). Raw data is gitignored and collected within the API’s bounded/cached terms. This page evaluates (1) the data and (2) the models’ accuracy. Every statistical method is implemented in both R and Python against the same warehouse feed; their agreement is a built-in correctness check (see R ↔︎ Python parity).
🏈 Open the interactive Team Comparison → Pick any two FBS teams and compare their 2025 season head-to-head (SP+, EPA, efficiency, player leaders), with a neutral-field matchup projection and a 2026 win forecast.
📖 Stat Guide → A plain-English data dictionary for every metric, with the 2025 FBS range for each.
🧪 The Models → How the projections work and how well they perform: an interactive win-probability curve, a real held-out calibration plot, baseline comparisons, and the R ↔︎ Python parity gate.
Part 1 — Evaluating the data
Data-quality panel
| Bronze sources — load lineage | ||
| Source | Last loaded | Rows |
|---|---|---|
| plays | 2026-07-01 13:07:56 | 1,101,386 |
| games | 2026-07-01 13:08:00 | 11,366 |
| lines | 2026-07-01 13:08:05 | 9,166 |
| ratings_sp | 2026-07-01 13:08:07 | 406 |
| teams | 2026-07-01 13:08:10 | 403 |
| recruiting | 2026-07-01 13:08:18 | 79 |
| calendar | 2026-07-01 13:08:12 | 50 |
Team efficiency leaderboard
Opponent-context offensive/defensive EPA per play (lower defensive EPA = better), season average.
| Top teams by net EPA/play — 2025 | ||||
| Team | W% | Net EPA | Off EPA | Def EPA |
|---|---|---|---|---|
| Ohio State | 86% | 0.434 | 0.186 | −0.248 |
| Texas Tech | 86% | 0.428 | 0.050 | −0.378 |
| Toledo | 62% | 0.399 | 0.129 | −0.270 |
| Notre Dame | 83% | 0.381 | 0.247 | −0.134 |
| Indiana | 100% | 0.326 | 0.200 | −0.126 |
| Washington | 69% | 0.307 | 0.187 | −0.120 |
| Vanderbilt | 77% | 0.294 | 0.306 | 0.012 |
| Oregon | 87% | 0.280 | 0.181 | −0.099 |
| South Florida | 69% | 0.278 | 0.192 | −0.085 |
| Utah | 85% | 0.268 | 0.194 | −0.074 |
| Texas A&M | 85% | 0.266 | 0.125 | −0.141 |
| North Texas | 86% | 0.253 | 0.273 | 0.020 |
| Net EPA = offensive EPA/play − defensive EPA/play allowed. | ||||
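The footnote's arithmetic can be checked against the published rows. This is a minimal sketch; the helper name `net_epa` is illustrative and not taken from the pipeline.

```python
def net_epa(off_epa: float, def_epa: float) -> float:
    """Net EPA/play = offensive EPA/play minus defensive EPA/play allowed."""
    return off_epa - def_epa

# (team, Off EPA, Def EPA, published Net EPA) copied from the leaderboard above.
rows = [
    ("Ohio State", 0.186, -0.248, 0.434),
    ("Texas Tech", 0.050, -0.378, 0.428),
    ("Vanderbilt", 0.306, 0.012, 0.294),
]

for team, off, dfn, published in rows:
    # Published figures are rounded to three decimals, so allow rounding slack.
    assert abs(net_epa(off, dfn) - published) <= 0.0015, team
```

Because defensive EPA is subtracted, a negative (good) defensive figure raises the net value, which is why Texas Tech ranks second despite a modest offensive number.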
“Over expected” leaderboards (M2–M4)
The residual framework from Eager & Erickson's Football Analytics with Python & R: model the expected outcome from the game situation, and treat the residual as skill. RYOE is rushing yards over expected (M3); CPOE is completion % over expected (M4). The tables feature 2024, the latest season with complete player attribution in the source; 2025 play-by-play is fully loaded at the team level but names the rusher on only ~56% of rushes so far.
| Rushing Yards Over Expected | |||
| 2024 · ≥100 carries | |||
| Player | Team | Carries | RYOE/att |
|---|---|---|---|
| Devon Dampier | New Mexico | 133 | 2.76 |
| Garrett Greene | West Virginia | 109 | 2.58 |
| Parker Navarro | Ohio | 138 | 2.56 |
| Blake Horvath | Navy | 159 | 2.51 |
| Fluff Bothwell | South Alabama | 108 | 2.42 |
| Eli Sanders | New Mexico | 131 | 2.19 |
| Jeremiyah Love | Notre Dame | 155 | 2.18 |
| Brendon Lewis | Nevada | 127 | 2.08 |
| Hajj-Malik Williams | UNLV | 120 | 2.03 |
| Isaac Brown | Louisville | 156 | 2.02 |
| Completion % Over Expected | |||
| 2024 · ≥200 attempts | |||
| Player | Team | Attempts | CPOE |
|---|---|---|---|
| Haynes King | Georgia Tech | 255 | 12.4% |
| Shedeur Sanders | Colorado | 436 | 11.8% |
| Jordan McCloud | Texas State | 271 | 10.3% |
| Dillon Gabriel | Oregon | 426 | 10.2% |
| Jacob Clark | Missouri State | 339 | 10.0% |
| Will Rogers | Washington | 307 | 9.6% |
| Will Howard | Ohio State | 362 | 8.7% |
| Hayden Wolff | Western Michigan | 276 | 8.4% |
| Mikey Keene | Fresno State | 325 | 7.1% |
| Dylan Raiola | Nebraska | 368 | 7.0% |
Metric stability (M1) — is it skill or noise?
Split-half reliability (odd vs even plays) of rushing yards-per-carry. A tight diagonal = a reliable, skill-driven metric; a loose cloud = mostly noise (and a candidate for shrinkage, M7).
| Reliability of rushing YPC | ||||
| R and Python agree | ||||
| Metric | Season(s) | n | R | Python |
|---|---|---|---|---|
| Split-half reliability (Spearman-Brown) | 2023 | 212 | 0.547 | 0.547 |
| Split-half reliability (Spearman-Brown) | 2024 | 202 | 0.510 | 0.510 |
| Split-half reliability (Spearman-Brown) | 2025 | 54 | 0.454 | 0.454 |
| Year-over-year r | 2023-2024 | 160 | 0.218 | 0.218 |
Team archetypes (M6) — PCA + k-means
Each team-season’s 7-stat style profile reduced to 2 principal components, then clustered into five archetypes. Efficient, complete teams sit to one side; struggling or one-dimensional teams to the other.
| Archetype centroids | |||||
| Mean style profile per cluster (Def EPA lower = better) | |||||
| Archetype | n | Off EPA | Def EPA | Run rate | Pace |
|---|---|---|---|---|---|
| 1 | 75 | −0.011 | 0.014 | 0.58 | 57.42 |
| 2 | 82 | 0.165 | −0.024 | 0.54 | 58.19 |
| 3 | 70 | −0.011 | 0.115 | 0.49 | 53.78 |
| 4 | 76 | 0.171 | 0.153 | 0.50 | 58.52 |
| 5 | 105 | 0.082 | 0.067 | 0.49 | 63.17 |
Expected points by field position — the EPA foundation
Every efficiency metric here is built on EPA (expected points added). This is the expected-points surface it rests on: the mean expected points for a first-and-10 (and later downs) at each spot on the field, from ~6.4 points at the goal line down to negative when backed up near your own end zone. The monotonic, football-sensible shape is the sanity check for the whole EPA layer.
Part 2 — Evaluating the models
The game win-probability model
Logistic model of P(home win) from leakage-safe entering-form efficiency differentials. Trained on the earlier seasons (2023–24) and evaluated on the sealed 2025 holdout — a season it never saw — against a home-field-naive baseline and the betting market (the bar to clear).
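To make the model concrete, the sketch below scores a game with the six `game_winprob` coefficients listed in the parity table on this page. Two assumptions are worth flagging: that those six terms are the full linear predictor, and that each differential is home minus away on the same scale used in training. Neither is confirmed here, and `home_win_prob` is an illustrative name.

```python
import math

# Coefficients copied from the game_winprob rows of the parity table on this page.
# Assumption: these six terms make up the whole linear predictor.
COEFS = {
    "(Intercept)": 0.3266,
    "off_epa_diff": 2.2239,
    "def_epa_diff": -1.3682,
    "win_pct_diff": 1.4092,
    "roll3_net_epa_diff": 0.7997,
    "sos_diff": 0.0716,
}

def home_win_prob(diffs: dict[str, float]) -> float:
    """Logistic link: P(home win) = 1 / (1 + exp(-linear predictor))."""
    z = COEFS["(Intercept)"] + sum(
        COEFS[name] * value for name, value in diffs.items()
    )
    return 1.0 / (1.0 + math.exp(-z))

# Evenly matched teams: every differential is zero, so only the intercept remains.
even = home_win_prob({})
print(round(even, 3))  # 0.581

# Hypothetical edge: the home side enters with +0.10 offensive EPA/play.
edge = home_win_prob({"off_epa_diff": 0.10})
```

With all differentials at zero the intercept alone gives the home team roughly a 58% chance, which is the model's implied home-field advantage under these assumptions.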
| 2025 holdout accuracy | |||
| Lower Brier/log-loss = better; the model approaches the market using on-field efficiency only | |||
| Metric | Model | Market line | Naive (home-field) |
|---|---|---|---|
| Brier score | 0.191 | 0.176 | 0.243 |
| Log loss | 0.562 | — | — |
| AUC | 0.773 | — | — |
| Accuracy | 0.704 | 0.739 | — |
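The three probability metrics in the table have short definitions, sketched here in plain Python. The forecasts and outcomes are hypothetical; this is not the evaluation code behind the published numbers.

```python
import math

def brier(probs, outcomes):
    """Mean squared error between forecast probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)

def accuracy(probs, outcomes):
    """Share of games where the side given at least 50% actually won."""
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)

# Hypothetical home-win forecasts and results (1 = home team won).
probs = [0.8, 0.6, 0.3, 0.9]
outcomes = [1, 0, 0, 1]

print(round(brier(probs, outcomes), 3))        # 0.125
print(round(brier([0.5] * 4, outcomes), 3))    # 0.25
print(accuracy(probs, outcomes))               # 0.75
```

A constant 50% forecast always scores a Brier of exactly 0.25 regardless of results, which is a useful reference point when reading the naive column.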
Forecasting a new season before it starts
The in-season model needs a few weeks of this year’s results. To predict openers — or a whole schedule the day it drops — a preseason priors model uses only last season’s carryover (SP+ rating, net EPA, win rate). It’s genuinely useful, but the table below quantifies the honest cost of having no current-year form: preseason forecasts are meaningfully less certain. This is the model that would score a 2026 schedule the moment it’s loaded.
| In-season vs preseason forecasting (2025 holdout) | |||
| Preseason (prior-year priors only) beats a naive guess but trails the in-season model | |||
| Metric | In-season model | Preseason priors | Naive baseline |
|---|---|---|---|
| Brier (lower better) | 0.191 | 0.206 | 0.241 |
| AUC (higher better) | 0.773 | 0.720 | — |
| Accuracy | 0.704 | 0.677 | — |
Model coefficients & fit (M3–M5)
| Headline fit metrics by model | ||
| Model | Metric | Value |
|---|---|---|
| ryoe_simple | rmse | 7.7152 |
| ryoe_simple | r_squared | 0.0103 |
| ryoe_multiple | rmse | 7.6845 |
| ryoe_multiple | r_squared | 0.0181 |
| baseline_mean | rmse | 7.7552 |
| cpoe | brier | 0.2344 |
| cpoe | brier_baseline | 0.2365 |
| cpoe | log_loss | 0.6615 |
| passing_td_poisson | dispersion | 0.9476 |
| passing_td_poisson | aic_poisson | 14092.2662 |
Multilevel shrinkage (M7) — regression to the mean
M1 showed rushing efficiency is noisy; a random-intercept model (ryoe ~ 1 + (1 | rusher)) acts on it. Partial pooling pulls each rusher’s raw RYOE toward the league mean — hardest for small-sample rushers. Below, the raw averages (grey) fan out wildly at low volume while the shrunken estimates (blue) stay disciplined; only high-volume backs keep a large signal.
Only 1.2% of RYOE variance is between rushers (the intra-class correlation) — the rest is play-to-play noise. That tiny ICC is exactly why the pooling is so aggressive, and it quantitatively confirms the M1 stability finding.
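The link between a small ICC and aggressive pooling can be made explicit. For a random-intercept model with known variance components, a rusher's shrunken mean is the league mean plus a weight w = n / (n + σ²_within / σ²_between) times the gap between the raw mean and the league mean, and the variance ratio equals (1 − ICC) / ICC. The sketch below plugs in the 1.2% ICC quoted above. It is an approximation of what a fitted mixed model does (which estimates the mean and variances jointly), and the function names and example values are hypothetical.

```python
def pooling_weight(n_carries: int, icc: float) -> float:
    """Weight on a rusher's own mean: n / (n + sigma2_within / sigma2_between)."""
    ratio = (1.0 - icc) / icc  # within-to-between variance ratio implied by the ICC
    return n_carries / (n_carries + ratio)

def shrunken_mean(raw_mean: float, n_carries: int,
                  league_mean: float, icc: float) -> float:
    """Pull the raw mean toward the league mean by the pooling weight."""
    w = pooling_weight(n_carries, icc)
    return league_mean + w * (raw_mean - league_mean)

ICC = 0.012  # the 1.2% between-rusher share quoted above

for n in (20, 100, 250):
    print(n, round(pooling_weight(n, ICC), 2))

# Hypothetical rusher: raw RYOE/att of 2.0 against a league mean of 0.0.
low_volume = shrunken_mean(2.0, 20, 0.0, ICC)
high_volume = shrunken_mean(2.0, 250, 0.0, ICC)
```

Under these assumptions the weights come out near 0.20, 0.55 and 0.75 for 20, 100 and 250 carries, so the same raw 2.0 keeps only about a fifth of its value at 20 carries but about three quarters at 250.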
Do teams beat their recruiting? (M8)
Recruiting rankings here were collected by ethically scraping Wikipedia (CC BY-SA, bot-friendly) — only the team-level ranking table, never individual (often minor) recruits — and were validated to match CFBD’s sanctioned recruiting API exactly. Regressing each team’s on-field production (SP+ rating) on its recruiting-class rank shows the expected relationship (better classes → better teams), but the residuals are the story: who out- or under-performed the talent they signed.
| Biggest overachievers vs recruiting | ||||
| Production well above what their signed talent predicted | ||||
| Team | Season | Recruit rank | SP+ | Beat recruiting by |
|---|---|---|---|---|
| Michigan | 2023 | 18 | 31.3 | 20.0 |
| Ole Miss | 2024 | 21 | 27.9 | 18.7 |
| Ole Miss | 2025 | 16 | 24.0 | 11.2 |
| Penn State | 2024 | 15 | 24.6 | 11.1 |
| Washington | 2025 | 23 | 18.4 | 10.6 |
| Ohio State | 2024 | 5 | 31.2 | 10.5 |
Part 3 — 2026 season outlook (a live forecast)
Everything above measures the past; this looks forward. Taking the real 2026 schedule (pulled from CFBD, none of it played yet) and each team’s 2025 carryover, the preseason priors model scores every game, and the per-game probabilities add up to a projected win total for each team. Read this as an informed prior, not a prophecy: it knows how teams finished 2025 and who they play in 2026, but not transfers, coaching changes, or breakout freshmen — and preseason forecasts are the least certain kind (the accuracy table above quantifies that).
Projected 2026 win totals
Expected wins summed over each team’s FBS-vs-FBS games (cupcake FCS games excluded, so most teams’ real totals run ~1 higher). It blends team quality and schedule: a strong team with a soft slate projects high; a good team in a brutal division less so.
| 2026 projected win totals — top 25 | |||||
| Preseason priors model applied to the real 2026 schedule | |||||
| # | Team | FBS games | Proj. wins | Proj. losses | Avg win prob |
|---|---|---|---|---|---|
| 1 | Notre Dame | 12 | 9.4 | 2.6 | 78% |
| 2 | Texas Tech | 11 | 9.1 | 1.9 | 83% |
| 3 | Penn State | 12 | 8.9 | 3.1 | 75% |
| 4 | Ohio State | 12 | 8.9 | 3.1 | 74% |
| 5 | Indiana | 11 | 8.3 | 2.7 | 76% |
| 6 | Miami | 11 | 7.8 | 3.2 | 71% |
| 7 | Oregon | 11 | 7.7 | 3.3 | 70% |
| 8 | South Florida | 11 | 7.6 | 3.4 | 70% |
| 9 | Utah | 11 | 7.6 | 3.4 | 70% |
| 10 | James Madison | 11 | 7.4 | 3.6 | 68% |
| 11 | Iowa | 11 | 7.4 | 3.6 | 67% |
| 12 | USC | 12 | 7.3 | 4.7 | 61% |
| 13 | SMU | 11 | 7.2 | 3.8 | 66% |
| 14 | Oklahoma | 12 | 7.1 | 4.9 | 59% |
| 15 | Toledo | 10 | 7.1 | 2.9 | 71% |
| 16 | Ole Miss | 11 | 7.0 | 4.0 | 64% |
| 17 | North Texas | 11 | 7.0 | 4.0 | 64% |
| 18 | Memphis | 11 | 7.0 | 4.0 | 63% |
| 19 | Georgia | 11 | 6.9 | 4.1 | 63% |
| 20 | Clemson | 11 | 6.8 | 4.2 | 62% |
| 21 | East Carolina | 11 | 6.8 | 4.2 | 62% |
| 22 | Vanderbilt | 11 | 6.7 | 4.3 | 61% |
| 23 | Washington | 11 | 6.6 | 4.4 | 60% |
| 24 | Virginia | 11 | 6.6 | 4.4 | 60% |
| 25 | Texas A&M | 11 | 6.6 | 4.4 | 60% |
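Projected wins are just the sum of per-game win probabilities (linearity of expectation), and the table's last column is that sum divided by games played. A minimal sketch with a hypothetical four-game slate follows; the spread line additionally assumes games are independent, which is a simplification not claimed by the page.

```python
import math

def projected_record(win_probs: list[float]) -> tuple[float, float, float]:
    """Return (expected wins, expected losses, average win probability)."""
    wins = sum(win_probs)
    games = len(win_probs)
    return wins, games - wins, wins / games

def win_total_sd(win_probs: list[float]) -> float:
    """Standard deviation of the win total, assuming independent games."""
    return math.sqrt(sum(p * (1 - p) for p in win_probs))

# Hypothetical per-game win probabilities for one team's schedule.
slate = [0.9, 0.8, 0.7, 0.6]
wins, losses, avg = projected_record(slate)
print(round(wins, 1), round(losses, 1), round(avg, 2))  # 3.0 1.0 0.75
```

The published rows are consistent with this reading: Notre Dame's 9.4 projected wins over 12 games is an average win probability of about 78%, and Texas Tech's 9.1 over 11 is about 83%.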
Opening weekend (week 1)
The model’s closest calls — genuine toss-ups where it has little conviction — and the games it’s most sure about.
| Week 1 — the model's true toss-ups | ||
| ≈50% = a coin flip even to the model | ||
| Matchup | Home win prob | Model leans |
|---|---|---|
| Toledo @ Michigan State | 51% | Michigan State |
| San José State @ Eastern Michigan | 52% | Eastern Michigan |
| UNLV @ Hawai'i | 53% | Hawai'i |
| Western Kentucky @ Nevada | 47% | Western Kentucky |
| Memphis @ UNLV | 46% | Memphis |
| Hawai'i @ Stanford | 46% | Hawai'i |
| UCLA @ California | 55% | California |
| Oklahoma State @ Tulsa | 55% | Tulsa |
| Wyoming @ Colorado State | 55% | Colorado State |
| SMU @ Florida State | 56% | Florida State |
| Week 1 — the model's most confident picks | ||
| Matchup | Favorite | Win prob |
|---|---|---|
| Ball State @ Ohio State | Ohio State | 97% |
| Massachusetts @ Rutgers | Rutgers | 92% |
| Kent State @ South Carolina | South Carolina | 92% |
| Northern Illinois @ Iowa | Iowa | 91% |
| Missouri State @ Texas A&M | Texas A&M | 90% |
| UL Monroe @ Mississippi State | Mississippi State | 89% |
| UTEP @ Oklahoma | Oklahoma | 88% |
| San José State @ USC | USC | 87% |
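Both week-1 tables are the same list sorted in opposite directions by distance from 50%. The sketch below uses four rows copied from the tables above; the tuple layout and function name are illustrative.

```python
# (away team, home team, home win probability) copied from the tables above.
games = [
    ("Ball State", "Ohio State", 0.97),
    ("SMU", "Florida State", 0.56),
    ("Western Kentucky", "Nevada", 0.47),
    ("Toledo", "Michigan State", 0.51),
]

def lean(game):
    """Return (favored team, favorite's win probability)."""
    away, home, p_home = game
    return (home, p_home) if p_home >= 0.5 else (away, 1 - p_home)

# Toss-ups first: smallest distance from a coin flip.
by_conviction = sorted(games, key=lambda g: abs(g[2] - 0.5))

toss_up = by_conviction[0]
most_confident = by_conviction[-1]
print(toss_up[0], "@", toss_up[1], "->", lean(toss_up)[0])
print(most_confident[0], "@", most_confident[1], "->", lean(most_confident)[0])
```

Note that a home probability below 50% flips the lean to the visitor, as in the Western Kentucky @ Nevada row.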
R ↔︎ Python parity — the built-in correctness check
Every method is fit independently in R and Python on the identical feed. Because the fits are mathematically the same (OLS, IRLS GLM, Poisson, logistic MLE), the coefficients must agree. The pipeline’s load_results step fails the build if any term diverges beyond tolerance.
| R vs Python coefficient agreement | ||||
| 32 coefficients across 9 models · max \|Δ\| = 1.47e-01 | ||||
| Model | Term | R | Python | \|Δ\| |
|---|---|---|---|---|
| passing_td_negbin | (Intercept) | −2.6343 | −2.7812 | 1.5 × 10⁻¹ |
| passing_td_negbin | log(pass_attempts) | 0.6920 | 0.7220 | 3.0 × 10⁻² |
| passing_td_negbin | opponent_defense_rating | 0.0254 | 0.0271 | 1.7 × 10⁻³ |
| shrinkage | (Intercept) | −0.0468 | −0.0468 | 3.6 × 10⁻⁶ |
| game_winprob | def_epa_diff | −1.3682 | −1.3682 | 1.1 × 10⁻⁷ |
| game_winprob | win_pct_diff | 1.4092 | 1.4092 | 6.0 × 10⁻⁸ |
| game_winprob | roll3_net_epa_diff | 0.7997 | 0.7997 | 4.3 × 10⁻⁸ |
| game_winprob | (Intercept) | 0.3266 | 0.3266 | 2.7 × 10⁻⁸ |
| game_winprob | off_epa_diff | 2.2239 | 2.2239 | 4.0 × 10⁻⁹ |
| game_winprob | sos_diff | 0.0716 | 0.0716 | 1.8 × 10⁻⁹ |
| passing_td_poisson | (Intercept) | −2.6343 | −2.6343 | 1.4 × 10⁻¹⁰ |
| passing_td_poisson | log(pass_attempts) | 0.6920 | 0.6920 | 3.6 × 10⁻¹¹ |
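A parity gate of this kind can be sketched as a term-by-term comparison that raises when any difference exceeds a tolerance. Everything below is hypothetical scaffolding: the function names, the dictionary layout, and the `1e-6` tolerance are assumptions, and the real `load_results` step and its threshold (which may differ by model) are not shown on this page. The coefficient values are the rounded figures from the table above.

```python
def diverging_terms(r_coefs: dict, py_coefs: dict, tol: float) -> dict:
    """Return {(model, term): |delta|} for every term that differs by more than tol."""
    bad = {}
    for key in sorted(set(r_coefs) | set(py_coefs)):
        if key not in r_coefs or key not in py_coefs:
            bad[key] = float("inf")  # a term fitted on only one side is a failure
            continue
        delta = abs(r_coefs[key] - py_coefs[key])
        if delta > tol:
            bad[key] = delta
    return bad

def assert_parity(r_coefs: dict, py_coefs: dict, tol: float = 1e-6) -> None:
    """Raise if any coefficient pair diverges beyond the (assumed) tolerance."""
    bad = diverging_terms(r_coefs, py_coefs, tol)
    if bad:
        raise ValueError(f"R/Python coefficients diverge beyond {tol}: {bad}")

# Rounded values copied from the table above, keyed by (model, term).
r_fit = {
    ("game_winprob", "off_epa_diff"): 2.2239,
    ("passing_td_negbin", "(Intercept)"): -2.6343,
}
py_fit = {
    ("game_winprob", "off_epa_diff"): 2.2239,
    ("passing_td_negbin", "(Intercept)"): -2.7812,
}

flagged = diverging_terms(r_fit, py_fit, tol=1e-6)
print(flagged)
```

Under a single tight tolerance like the one assumed here, the `passing_td_negbin` rows as published would be flagged while the `game_winprob` and `passing_td_poisson` rows would pass.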
The unsupervised and mixed-effects methods (M6, M7) don’t reduce to a single comparable coefficient — their fits are sign- or label-ambiguous — so agreement is measured label-invariantly:
| R vs Python — M6/M7 agreement | |
| 1.0 = identical (PCA up to sign; clusters up to labelling) | |
| Check | Agreement |
|---|---|
| M6 · PC1 \|correlation\| | 1.000 |
| M6 · cluster adjusted Rand index | 0.974 |
| M7 · shrunken-estimate correlation | 1.000 |
Built with R (gt, ggplot2), Python (duckdb, pandas), dbt, and DuckDB. Models from Eager & Erickson, Football Analytics with Python & R, applied to CFB. Data: CollegeFootballData.com.