National Forum

The Math Behind Monaghan Being #1

95% of Gaelic football results since 2010 can be explained by a mix of population, sport preferences, county financial resources, historical success, and proximity to cities with significant career/study opportunities. A model built from these factors predicts win/loss results since 2010 with 95% accuracy when games are weighted by relative importance. Worth noting that this year's championship has only been predictable to 70% accuracy, mostly down to rule changes and Louth over-performing with ceiling still unknown :)

When ranking counties on how they've over-performed against this model since 2010, Monaghan is the top performing county in the country (i.e., wins more than it should), followed by Mayo, and then Kerry. Meath and Kildare have underperformed the most vs expectations which can be mostly attributed to the model not being perfectly fine-tuned for the negative impact that comes with proximity to Dublin (careers, commutes, study, other sports).

Here's a breakdown of the relative importance.

1. Code Selection & Religious Demographics
Relative Contribution: 45%
Context: This tier represents the primary filter on a county's gross population. The model applies a strict deduction based on competing athletic pathways. In the Republic, this tracks the dual-sport hurling drain (e.g., Cork, Galway, splitting talent down the middle) and urban soccer academies. In Northern Ireland, it applies a community background weight to reflect the available GAA player base. This explains why high-population counties like Kilkenny, Antrim, and Waterford sit near the bottom of the football standings. It shrinks their massive gross population down to their actual, football-playing talent pool.

2. Gravitational Commuter Drag
Relative Contribution: 22%
Context: This factor models the lifestyle fragmentation caused by commuting into major economic engines, targeting the "Leinster Commuter Trap." It accounts for the structural bottlenecks highlighted by the GAA National Demographics Committee, namely that exploding suburban populations face a severe facility and time deficit. By penalizing counties within the immediate corporate orbits of Dublin and Cork, the model dramatically lowers the expected baselines for Meath and Kildare, shrinking their negative residuals. It probably doesn't do enough on this front, but an R² of 95% is pretty good.

3. Financial Preparation Capital
Relative Contribution: 14%
Context: Teams are amateur, but their preparation models are entirely corporate. This factor tracks annual team preparation expenditures relative to the national average. It recognizes that doubling an elite sports budget yields only a limited efficiency boost on the pitch (non-linear gains from investments into training/infrastructure). This prevents high-spending dual counties from breaking the model while properly elevating Dublin's baseline to account for its massive commercial advantage.

4. Student-Athlete Migration & Legacy Coaching Lines
Combined Relative Contribution: 14%
Context: These variables process population quality and institutional knowledge. This uses travel tracker metrics to penalize counties that must manage "exiled" college student training groups in Dublin or Belfast mid-week.

5. Unexplained Variance / Stochastic Error
Relative Contribution: 5.0%
Context: The remaining variance represents factors that cannot be captured by socio-economic data: generational talent anomalies (e.g., David Clifford emerging in a specific county), weather conditions, refereeing decisions, or short-term psychological momentum.

level (Louth) - Posts: 108 - 01/07/2026 22:31:30    2683548

Replying To level:  "95% of Gaelic football results since 2010 can be explained by a mix of population, sport preferences, county financial resources, historical success, and proximity to cities with significant career/study opportunities."

No they can't.

GreenandRed (Mayo) - Posts: 8652 - 02/07/2026 12:16:34    2683641

Replying To level:  "95% of Gaelic football results since 2010 can be explained by a mix of population, sport preferences, county financial resources, historical success, and proximity to cities with significant career/study opportunities. [...]"
Is there an actual model or is this just a ChatGPT spiel?

tirawleybaron (Mayo) - Posts: 1970 - 02/07/2026 12:40:52    2683650

Replying To tirawleybaron:  "Is there an actual model or is this just a ChatGPT spiel?"
Here you go... made a few tweaks but this will get you to 80% accuracy using only socio-economic data and football pedigree from more than 10 years prior to the observation window. The last 15% takes some real work. NOTE -- with this approach, Kerry is completely average given all their history prior to 10 years ago. Mayo and Monaghan are the big over-performers vs the model. So -- you can think of that being some reflection of manager skill, player skill/commitment, i.e., it's not just history and socio-economic factors that explain current performance. Meath gets hammered because it's close to Dublin, the 90s success is recent, and I'm a Louthman.

The model (with all the terms explained)

PPIᵢ = β₀ + β₁·P_advancedᵢ + β₂·Incomeᵢ + β₃·AirportTimeᵢ + β₄·Pedigreeᵢ + eᵢ, with Σe = 0 and OIᵢ = eᵢ/σₑ.
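
For anyone wanting to reproduce the fit, here is a minimal sketch of the regression and the OI step using plain NumPy least squares. The data is random and purely illustrative (not the real county values), and `fit_ppi` is a hypothetical name. One assumption to flag: σₑ is taken as the population standard deviation of the residuals (ddof=0), which is what gives the OI a standard deviation of exactly 1.

```python
import numpy as np

def fit_ppi(X, y):
    """OLS with an intercept; returns (beta, residuals, OI)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    resid = y - A @ beta                           # e_i
    oi = resid / resid.std()                       # OI_i = e_i / sigma_e (ddof=0 assumed)
    return beta, resid, oi

# Illustrative random data only: 32 "counties", 4 regressors.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = 5 + X @ np.array([2.0, -1.0, 0.0, 3.0]) + rng.normal(size=32)

beta, resid, oi = fit_ppi(X, y)
print(abs(resid.sum()) < 1e-9)   # residuals sum to zero because of the intercept
```

The zero-sum residuals are a property of any OLS fit with an intercept, so Σe = 0 is not an extra constraint on the model.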


P_advanced - Effective Structural Population

The heart of the model. Each county's gross population is converted into an "effective playing base" by a chain of discounts:

P_advanced = (Population × C_community × C_hurling) ÷ (urban drain × city-proximity drain × training-commute tax), where:

urban drain = [log₁₀(Density)]^1.0
city-proximity drain = 1 + 0.48·(120 − t_hub)/120
training-commute tax = 1 + 0.70·max(0, t_CoE − 30)/60


Density term - denser counties lose share to alternative sports infrastructure (soccer academies etc.); Dublin's 1,582/km² divides its base by 3.2, Leitrim's 22/km² by 1.34.
Hub term - proximity to Dublin/Belfast/Cork/Galway/Limerick penalizes (talent and attention drain to cities); a county 120+ minutes out pays nothing.
Commute tax - real drive time from the county's second population centre to its training base; every minute beyond 30 erodes the base. Cork loses 21% of its effective population to this term, Down 18%, Donegal 14%.
C_community - participation footprint applied to the six NI counties only: the Catholic share of "religion or religion brought up in" (Census 2021). Down retains 32.3% of its base, Tyrone 66.5%, ROI counties 100%.
C_hurling = 1 − α·(hurling intensity) - per-capita hurling All-Ireland semi-final appearances 2000-2025, normalized to the max county (Kilkenny). At the calibrated α = 1, Kilkenny's football base is fully written off; Tipperary retains ~62%.
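
The discount chain above can be sketched as a single function. This is only an illustration under stated assumptions: `p_advanced` and its argument names are hypothetical, the hub term is clamped at zero to match "120+ minutes out pays nothing" (the formula as posted has no clamp), and the density floor is left out because its value isn't given.

```python
import math

def p_advanced(population, density, t_hub, t_coe,
               c_community=1.0, c_hurling=1.0):
    """Effective playing base from the discount chain (sketch)."""
    urban = math.log10(density) ** 1.0                   # density term
    # Clamp at zero so counties 120+ minutes from a hub pay nothing (assumed).
    hub = 1 + 0.48 * max(0.0, 120 - t_hub) / 120
    commute = 1 + 0.70 * max(0.0, t_coe - 30) / 60       # commute tax beyond 30 min
    return population * c_community * c_hurling / (urban * hub * commute)

print(round(math.log10(1582), 2))  # Dublin's density divisor -> 3.2
print(round(math.log10(22), 2))    # Leitrim's density divisor -> 1.34
```

The two printed divisors match the figures quoted for Dublin and Leitrim in the density term.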


Exogenous regressors

Income - disposable income per person, 2023, harmonised all-island (PPS). Enters negatively (β₂ = −0.00094): richer counties systematically underperform their structural base, consistent with opportunity-cost pressures on an amateur sport.
Airport access - OSRM drive time to the nearest of eight international airports; a broad connectivity/remoteness control (β₃ ≈ 0 because it's largely absorbed by the pedigree term, so it's not important).


Pedigree - decayed tradition, with the last 10 years excluded

NOTE -- you lose about 5-10% accuracy if you push the exclusion window out to 20 years (i.e., by putting it to 10, you capture some current players who have known recent-ish success)

To capture footballing tradition without letting current squads predict themselves:

Pedigree(county, season t) = Σ over All-Ireland finals in years y ≤ t−10 of
weight(final) × 0.5^((t − y − 10)/5)


Only finals at least 10 years before the measured season count (recent-squad-bias filter).
Beyond that boundary, value halves every 5 years - recent tradition matters far more than ancient tradition.
A won final is worth 1.0; a lost final only 0.25.
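
A short sketch of the pedigree rule as described, for a single county. The function name, the `(year, won)` input format and the example finals are made up for illustration.

```python
def pedigree(finals, season, lookback=10, half_life=5.0,
             win_weight=1.0, loss_weight=0.25):
    """finals: iterable of (year, won) pairs for one county."""
    total = 0.0
    for year, won in finals:
        age = season - year
        if age < lookback:              # inside the exclusion window: ignored
            continue
        weight = win_weight if won else loss_weight
        total += weight * 0.5 ** ((age - lookback) / half_life)
    return total

# Hypothetical county: wins in 2015 and 2010, a lost final in 2005, a win in 2020.
finals = [(2015, True), (2010, True), (2005, False), (2020, True)]
print(pedigree(finals, season=2025))   # 1.0 + 0.5 + 0.0625 = 1.5625
```

The 2020 win contributes nothing because it falls inside the 10-year exclusion window for the 2025 season.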


Data sources (all fetched 2026-07-02)

Input | Source | Notes
NFL final standings, 2010-2025 | Wikipedia season pages (_National_Football_League_(Ireland)) | All four divisions; 2023 D4 table via finalwhistle.ie (Wikipedia render truncated), points cross-checked
Championship match logs, 2010-2025 | Wikipedia AI-SFC season pages (_All-Ireland_Senior_Football_Championship) | AI series + provincial finals; 2011-2017 bracket details confirmed via searches grounded in RTÉ/GAA/Irish Examiner match reports
All-Ireland finals, 1887-2025 | List of All-Ireland SFC finals | Pedigree input; 138 finals, replays counted once
Hurling semi-finalists, 2000-2025 | Championship record (compiled) | Drives C_hurling
County populations | List of Irish counties by population | Census 2022 (ROI) / Census 2021 (NI)
County areas & densities | List of Irish counties by area |
NI community background | Religion in Northern Ireland (NISRA county-level Census 2021 table) | "Religion or religion brought up in", Catholic share
Disposable income (PPS 2023) | CSO County Incomes & GDP 2024 + CSO/ONS all-island comparison | ROI: 8 published anchors, rest derived from the deviation index (≤0.9% off anchors). NI: banded between the three published anchors - approximation, flagged
All driving times (hub, commute, airport) | OSRM public router, live table/route calls | Principal-town and second-town gazetteer coordinates; snap distances <105 m; no time-of-day traffic (internally consistent)

Every scraped value is preserved as a flat CSV (I have that, obviously, but it's easy to recreate from the sources above).

The weights, and how each was arrived at

This matters for how much to trust each number, because there's some amount of over-fitting:

(a) Specified (fixed by the project brief, never tuned): all PPI tier weights; the 120-minute hub ceiling; the 30-minute commute baseline; the density floor.

(b) Decided (explicit modelling choices made during the build): 10-year pedigree lookback (chosen over 20 - the 20-year variant fits worse and was reported); 5-year half-life (chosen for maximal decay; note the leave-one-out sweep favoured 20-30, so this setting trades some out-of-sample robustness for a stronger recency gradient within tradition); exclusion of 2026; county-level spatial granularity.

(c) Calibrated (grid-searched - 2,304 configurations - with leave-one-out R², not raw R², as the selection criterion, to resist overfitting on 32 observations):

Constant | Original spec | v4 calibrated
Density exponent | 0.72 | 1.00
Hub coefficient | 0.48 | 0.48 (survived)
Commute-tax coefficient | 0.35 | 0.70
Hurling penalty α | 0.50 | 1.00
Pedigree runner-up weight | 0.50 | 0.25
log-transform of P_advanced | - | rejected

The configuration that maximised raw R² (0.829) was rejected because it cross-validated worse (LOOCV 0.713 vs 0.743). The regression coefficients themselves (β₀…β₄) are ordinary least squares - nothing hand-set.
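
To make the selection criterion concrete, here is a rough sketch of leave-one-out R² next to raw R² for an OLS fit. The post doesn't define LOOCV R², so this assumes the usual 1 − PRESS/SST form; the data is random and illustrative, and the function names are hypothetical.

```python
import numpy as np

def loocv_r2(X, y):
    """Leave-one-out R² for OLS with intercept: 1 - PRESS / SST (assumed definition)."""
    A = np.column_stack([np.ones(len(y)), X])
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i                  # drop observation i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ beta) ** 2             # out-of-sample squared error
    sst = ((y - y.mean()) ** 2).sum()
    return 1 - press / sst

def raw_r2(X, y):
    """In-sample R² for the same OLS fit."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1 - ((y - A @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()

# Illustrative random data only: 32 observations, 4 regressors.
rng = np.random.default_rng(1)
X = rng.normal(size=(32, 4))
y = X @ np.array([1.0, 0.5, 0.0, 2.0]) + rng.normal(size=32)
print(raw_r2(X, y) > loocv_r2(X, y))   # raw R² is the more optimistic of the two
```

Raw R² can only go up as constants are tuned to the sample, while the leave-one-out figure can go down, which is why a configuration with a higher raw R² can still lose on this criterion.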

Fitted coefficients: β₀ = 19.71, β₁ = 1.03×10⁻⁴ per effective person, β₂ = −0.00094 per PPS, β₃ = +0.0009 per minute (≈0), β₄ = 7.90 per pedigree unit.

Diagnostics: R² = 0.805 · adjusted R² = 0.777 · F(4,27) = 27.9, p = 3×10⁻⁹ · LOOCV R² = 0.743 · Σresiduals = 0 exactly · OI mean 0, sd 1.
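
As a sanity check on the diagnostics, adjusted R² and the F statistic can be recomputed from the reported R² using the standard textbook formulas, with n = 32 counties and k = 4 regressors. A small sketch:

```python
n, k, r2 = 32, 4, 0.805            # observations, regressors, reported R²

# Adjusted R² penalises the fit for the number of regressors.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Overall F statistic on (k, n - k - 1) = (4, 27) degrees of freedom.
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

print(round(adj_r2, 3), round(f_stat, 1))   # 0.776 27.9
```

The adjusted R² comes out at 0.776 rather than the reported 0.777, which is within what rounding R² to three decimals would explain; the F statistic matches.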

The v4 Overperformance leaderboard (2010-2025)

(don't know if this will render as a nice clean table or not; if you copy/paste into Excel or an AI client, it'll render it)

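Rank | County | Actual PPI | Predicted PPI | OI
1 | Mayo | 23.25 | 12.47 | +2.81
2 | Monaghan | 14.06 | 4.52 | +2.48
3 | Donegal | 19.23 | 14.44 | +1.25
4 | Roscommon | 11.09 | 6.52 | +1.19
5 | Galway | 18.45 | 14.46 | +1.04
6 | Armagh | 12.39 | 9.82 | +0.67
7 | Derry | 11.73 | 10.44 | +0.33
8 | Carlow | 3.15 | 1.96 | +0.31
9 | Dublin | 36.81 | 35.63 | +0.31
10 | Louth | 6.26 | 5.34 | +0.24
11 | Cavan | 7.12 | 6.22 | +0.23
12 | Tyrone | 20.19 | 19.92 | +0.07
13 | Westmeath | 4.62 | 4.47 | +0.04
14 | Clare | 5.53 | 5.39 | +0.04
15 | Down | 8.26 | 8.17 | +0.02
16 | Tipperary | 4.23 | 4.23 | +0.00
17 | Kildare | 8.02 | 8.74 | −0.19
18 | Waterford | 2.08 | 2.86 | −0.20
19 | Kerry | 30.67 | 31.58 | −0.24
20 | Kilkenny | 0.00 | 1.02 | −0.27
21 | Limerick | 3.19 | 4.32 | −0.29
22 | Leitrim | 3.15 | 4.43 | −0.33
23 | Fermanagh | 4.81 | 6.31 | −0.39
24 | Laois | 5.57 | 7.57 | −0.52
25 | Sligo | 3.52 | 5.92 | −0.62
26 | Longford | 3.24 | 5.74 | −0.65
27 | Offaly | 3.28 | 6.05 | −0.72
28 | Wicklow | 3.03 | 6.20 | −0.83
29 | Cork | 14.56 | 17.90 | −0.87
30 | Wexford | 3.82 | 8.33 | −1.17
31 | Antrim | 3.41 | 9.60 | −1.61
32 | Meath | 9.13 | 17.24 | −2.1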
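
Since OI = e/σₑ, the leaderboard can be checked for internal consistency by backing out the implied σₑ from a few rows. A quick sketch using three rows copied from the table; the choice of rows is arbitrary.

```python
# (actual PPI, predicted PPI, reported OI) copied from three leaderboard rows
rows = {
    "Mayo": (23.25, 12.47, 2.81),
    "Monaghan": (14.06, 4.52, 2.48),
    "Wexford": (3.82, 8.33, -1.17),
}

# Implied residual standard deviation: sigma = (actual - predicted) / OI
sigmas = {county: (a - p) / oi for county, (a, p, oi) in rows.items()}
for county, s in sigmas.items():
    print(county, round(s, 2))
```

All three rows imply a σₑ of roughly 3.8 PPI points, so the OI column looks consistent with the actual and predicted columns up to rounding.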

level (Louth) - Posts: 108 - 02/07/2026 22:27:38    2683792

Replying To level:  "This explains why high-population counties like Kilkenny, Antrim, and Waterford..."
Kilkenny is not a high-population county.

Cockney_Cat (UK) - Posts: 2923 - 03/07/2026 00:31:01    2683808

Definitely a post from Omahant USA top drawer.

Saynothing (Tyrone) - Posts: 2828 - 03/07/2026 10:09:47    2683838

Replying To level:  "Here you go... made a few tweaks but this will get you to 80% accuracy using only socio-economic data and football pedigree from more than 10 years prior to the observation window. [...]"
You took the words out of my mouth.

avonali (Dublin) - Posts: 2061 - 03/07/2026 10:53:39    2683857

Replying To Saynothing:  "Definitely a post from Omahant USA top drawer."
An essay on a format change is generally his trait

Gaa_lover (USA) - Posts: 4025 - 03/07/2026 12:44:49    2683905