Finding 0 — What we got wrong, and how
Honest log of the methodology errors in the first published version of this site (commit 424c76c, 20 May 2026). All claims listed here have been retracted, corrected, or reframed in the current version.
Why this page exists
We published the first version of this analysis with a fundamental methodology flaw and several confidently-asserted but incorrect findings. We pushed it live on Cloudflare Pages before stress-testing a single number against external sources. This page documents what we got wrong, what we replaced it with, and what we still don't know.
The single biggest error — "swing" was mislabelled level
The first version computed swing as pct_2026 − pct_2021 with NULLs filled with zero. That treats "didn't contest" as "got 0%", which produces meaningless numbers whenever a party fielded candidates in only one of the two years.
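A minimal sketch of the difference, in plain Python. The function names are hypothetical and not taken from `pipelines/`. In a dataframe library, the equivalent fix is to drop the `fill_null(0)` and let nulls propagate through the subtraction.

```python
from typing import Optional

def naive_swing(pct_2026: Optional[float], pct_2021: Optional[float]) -> float:
    """The v1 behaviour: a missing contest is silently read as a 0% share."""
    return (pct_2026 or 0.0) - (pct_2021 or 0.0)

def comparable_swing(pct_2026: Optional[float], pct_2021: Optional[float]) -> Optional[float]:
    """Swing is defined only when the party contested the AC in both years."""
    if pct_2026 is None or pct_2021 is None:
        return None  # no baseline (new entry) or no follow-up (withdrawal)
    return pct_2026 - pct_2021

# Rasipuram, BJP: no 2021 candidate, 29.5% in 2026.
print(naive_swing(29.5, None))       # 29.5, a level masquerading as a swing
print(comparable_swing(29.5, None))  # None
```

Returning `None` instead of a number forces every downstream median or mean to decide explicitly what to do with non-comparable rows.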
Concrete consequences
| Party | Old "median swing" | Reality |
|---|---|---|
| TVK | +33.7 pp | TVK didn't exist in 2021. The number is just the median 2026 vote share, not a swing. |
| BJP | +13.5 pp | BJP's 45 swing-rows = 25 brand-new entries + 12 withdrawals + only 8 actual same-AC comparisons. Mixing levels with changes. True swing (where comparable): −8.8 pp (BJP lost). |
| DMK | −15.3 pp | Roughly right by accident (DMK contested most ACs both years). Corrected: −15.0 pp. |
Retracted claim — "BJP gained 2× more in reserved seats"
The old reservation.md page asserted +28.5 pp BJP swing in SC/ST reserved seats vs +12.5 pp in General. This was the worst single claim on the site.
What was actually in the 7 "reserved BJP" rows
| AC | Constituency | 2021 BJP? | 2026 BJP? | "Swing" |
|---|---|---|---|---|
| 92 | Rasipuram (SC) | No (AIADMK seat) | 29.5% | +29.5 |
| 112 | Avanashi (SC) | No (AIADMK seat) | 29.7% | +29.7 |
| 178 | Gandarvakkottai (SC) | No (AIADMK seat) | 28.5% | +28.5 |
| 187 | Manamadurai (SC) | No (AIADMK seat) | 20.4% | +20.4 |
| 220 | Vasudevanallur (SC) | No (AIADMK seat) | 29.8% | +29.8 |
| 101 | Dharapuram (SC) | 45.7% (L. Murugan) | No (gave to AIADMK) | −45.7 |
| 151 | Tittakudi (SC) | 37.0% | No | −37.0 |
Zero of those 7 reserved seats had BJP contesting in both years. The +28.5 pp figure was the median of 5 new entries and 2 dropouts. It measured NDA seat-allocation reshuffling, not voter behaviour.
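Recomputing both statistics from the seven rows above makes the mix-up concrete. This is a plain-Python sketch in which `None` stands for "BJP did not contest".

```python
from statistics import mean, median

# (BJP % in 2021, BJP % in 2026) for the 7 reserved-seat rows; None = did not contest.
ROWS = {
    "Rasipuram": (None, 29.5),
    "Avanashi": (None, 29.7),
    "Gandarvakkottai": (None, 28.5),
    "Manamadurai": (None, 20.4),
    "Vasudevanallur": (None, 29.8),
    "Dharapuram": (45.7, None),
    "Tittakudi": (37.0, None),
}

# v1 logic: NULL filled with zero, so every row yields a number.
v1_swings = [(y26 or 0.0) - (y21 or 0.0) for y21, y26 in ROWS.values()]

# Corrected logic: only rows contested in both years count.
comparable = [y26 - y21 for y21, y26 in ROWS.values()
              if y21 is not None and y26 is not None]

print(median(v1_swings))          # 28.5, the retracted headline figure
print(round(mean(v1_swings), 1))  # 7.9
print(len(comparable))            # 0
```

The median lands on a new-entry level (Gandarvakkottai's 28.5%) because five of the seven values are 2026 levels. The corrected list is empty, so there is no reserved-seat BJP swing to report.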
What we missed: the actual Dalit-swing story is TVK winning 23 of 44 SC-reserved seats (52%), more than DMK + AIADMK combined. We credited BJP for what was TVK's breakthrough. See the corrected reservation page.
Reframed claim — "TVK +33.7 pp swing"
The headline survives as a number but the label was wrong. It's the median TVK vote share in 2026 (the only year they contested), not a swing.
The number itself was directionally right — TVK did win 108 seats with median ~34% share — but framing it as a swing implies year-over-year change, which is meaningless when no 2021 baseline exists.
Reframed claim — "DMK and AIADMK equally damaged"
The old framing said both Dravidian majors lost about the same. Magnitudes are close but the framing flattens what analysts say is asymmetric damage. Per Yashwant Deshmukh: "DMK fell more than TVK rose" — DMK was rejected by its core (anti-incumbency); AIADMK kept its OBC base but lost on the periphery. Different damage patterns. The old framing missed this.
Reframed claim — "TVK concentrated in Chennai metro"
Half-true. TVK's highest 2026 vote shares (50-59%) were in Chennai and the Thiruvallur suburbs. But TVK also won 33-50% in every major TN city (Coimbatore, Salem, Erode, Trichy, Madurai). The real axis is urban versus rural, statewide, with the Cauvery delta, Dharmapuri and Thanjavur as rural holdouts. The old framing was too narrowly Chennai-centred. See the updated geography page.
Data-quality bug — 2 SC seats mislabelled as ST
The kracekumar 2021 CSV labels Yercaud (AC 83) and Senthamangalam (AC 93) as ST. They are SC-reserved per the 2008 Delimitation Order. TN has 44 SC + 2 ST = 46 reserved; our 2021 source data has 42 SC + 4 ST = 46, which sums correctly but mis-categorises 2 seats.
Fixed via RESERVATION_OVERRIDES in pipelines/path_a_build.py, which reclassifies both seats as SC. The Order's two ST seats do not match the CSV's ST labels, so this treatment is provisional pending a canonical cross-check.
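A per-category check would have caught this, because the CSV's total was already right. The sketch below is hypothetical. The real `RESERVATION_OVERRIDES` in `pipelines/path_a_build.py` may be shaped differently, and `apply_overrides`, `reserved_counts` and `check_against_order` are illustrative names. The sketch only encodes the two overrides and the 44 SC + 2 ST totals stated above.

```python
from collections import Counter

# Hypothetical shape: AC number -> corrected reservation category.
RESERVATION_OVERRIDES: dict[int, str] = {
    83: "SC",  # Yercaud, labelled ST in the 2021 CSV
    93: "SC",  # Senthamangalam, labelled ST in the 2021 CSV
}
TN_SC_SEATS, TN_ST_SEATS = 44, 2

def apply_overrides(reservation_by_ac: dict[int, str]) -> dict[int, str]:
    """Return a copy with overridden ACs replaced; all other ACs are untouched."""
    return {ac: RESERVATION_OVERRIDES.get(ac, cat) for ac, cat in reservation_by_ac.items()}

def reserved_counts(reservation_by_ac: dict[int, str]) -> tuple[int, int]:
    """Count SC and ST seats (Counter returns 0 for absent categories)."""
    counts = Counter(reservation_by_ac.values())
    return counts["SC"], counts["ST"]

def check_against_order(reservation_by_ac: dict[int, str]) -> bool:
    # Compare each category separately: 42 + 4 and 44 + 2 both sum to 46.
    sc, st = reserved_counts(reservation_by_ac)
    return (sc, st) == (TN_SC_SEATS, TN_ST_SEATS)
```

Run `check_against_order` on the raw CSV mapping, where it should fail with 42 SC + 4 ST. Then run it again on the output of `apply_overrides`. A total-only assertion (46 reserved seats) would pass in both cases, which is how the bug slipped through.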
Methodology bias — religion correlation is structurally weak
We attach district-level Census 2011 religion data to each AC. Only 32 distinct religion vectors cover the 203 ACs that receive one, so on average about 6 ACs share an identical vector. Religion cannot account for any swing variation within such a cluster, which mechanically attenuates Pearson r.
The "all |r| < 0.2" finding is real but partly artefactual. External analysts agreed on the conclusion (Deshmukh, ISAS, Wikipedia), so we kept the finding but added an explicit caveat. Genuine sub-district religion data (SHRUG) would tighten the test.
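The attenuation has a hard ceiling. When the predictor is constant within each district, |r| across ACs can be no larger than η = sqrt(between-district sum of squares / total sum of squares) of the swing. The sketch below uses plain Python and toy numbers (three hypothetical districts, two ACs each, not TN data). District-mean swing rises perfectly linearly with the religion share, yet AC-level r is only about 0.46, because roughly 78% of the swing variance sits within districts.

```python
from collections import defaultdict
from math import sqrt

def pearson(xs: list[float], ys: list[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def eta(groups: list[str], ys: list[float]) -> float:
    """Upper bound on |r| for any predictor that is constant within each group."""
    my = sum(ys) / len(ys)
    by_group: dict[str, list[float]] = defaultdict(list)
    for g, y in zip(groups, ys):
        by_group[g].append(y)
    ss_between = sum(len(v) * (sum(v) / len(v) - my) ** 2 for v in by_group.values())
    ss_total = sum((y - my) ** 2 for y in ys)
    return sqrt(ss_between / ss_total)

# Toy data: each AC inherits its district's religion share.
district = ["A", "A", "B", "B", "C", "C"]
share = [10.0, 10.0, 20.0, 20.0, 30.0, 30.0]  # district-level predictor
swing = [-10.0, -2.0, -8.0, 4.0, -6.0, 10.0]  # AC-level outcome

print(round(pearson(share, swing), 3))  # 0.465
print(round(eta(district, swing), 3))   # 0.465
```

Here r reaches the ceiling exactly because the district means are perfectly linear in the share. With real data, r can only fall further below η, which is why sub-district data would be needed to tighten the test.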
Findings that completely missed the story
Topics every external analyst leads with, which v1 did not surface at all:
| Topic | What's true | Status |
|---|---|---|
| Voter turnout | 85.1% — record high for TN | mentioned in pages, not yet a dedicated page |
| Stalin losing Kolathur | Sitting CM defeated in his own seat by TVK newcomer | Now on Kolathur page |
| Demographic story | Youth + women shifted decisively; every analyst leads with this | Still missing — we don't have age/gender data |
| Welfare politics paradox | DMK welfare beneficiaries credited govt but still swung | Mentioned in retrospect; not modelled |
| Post-caste TVK coalition | 2 Brahmin + 20+ SC/ST + Muslim TVK MLAs (dtnext analysis) | Mentioned; not modelled |
How we caught this
Four parallel audit agents (one on the methodology code, one on TN SC/ST ground truth, one on external analyst commentary, one on name-classifier datasets), plus user pushback. The user spotted the SC error immediately ("the seats contested in SC are totally wrong build an agent team and then see how bad we behaved"), which triggered the full audit.
Process lessons (for next analysis)
- Stress-test every claim against external sources before publishing, not after.
- NULL handling needs to be explicit: `fill_null(0)` on a swing column is a silent bug.
- Small samples need flagging on the page itself, not just in method notes.
- An "entry_type" or "comparable_both_years" boolean column should accompany every cross-year metric.
- One reviewer (or one agent) reading the page critically before publish would have caught most of this.
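The sketch below covers two of the lessons above. It uses hypothetical names (`entry_type`, `summarise`, `MIN_COMPARABLE`) that are not from `pipelines/`, and the threshold of 30 is an arbitrary placeholder. Tagging each party-AC row before computing anything splits BJP's 45 v1 "swing" rows into 25 new entries, 12 withdrawals and 8 comparable rows. The small-sample flag then travels with the metric.

```python
from collections import Counter
from typing import Optional

MIN_COMPARABLE = 30  # hypothetical small-sample threshold; tune per analysis

def entry_type(pct_2021: Optional[float], pct_2026: Optional[float]) -> str:
    """Classify a party-AC row by which years the party contested."""
    if pct_2021 is not None and pct_2026 is not None:
        return "both_years"
    if pct_2026 is not None:
        return "new_entry"
    if pct_2021 is not None:
        return "withdrawal"
    return "absent"

def summarise(rows: list[tuple[Optional[float], Optional[float]]]) -> dict:
    types = Counter(entry_type(y21, y26) for y21, y26 in rows)
    n_comparable = types["both_years"]
    return {
        "types": dict(types),
        "n_comparable": n_comparable,
        "small_sample": n_comparable < MIN_COMPARABLE,
    }

# BJP's 45 v1 rows by contest pattern; the vote shares are placeholders.
bjp_rows = [(None, 20.0)] * 25 + [(30.0, None)] * 12 + [(30.0, 25.0)] * 8
summary = summarise(bjp_rows)
print(summary["types"])  # {'new_entry': 25, 'withdrawal': 12, 'both_years': 8}
print(summary["n_comparable"], summary["small_sample"])  # 8 True
```

Publishing `n_comparable` and `small_sample` next to every cross-year figure would have exposed the BJP and reserved-seat numbers as resting on 8 and 0 comparable rows respectively.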
What's still uncertain
- Religion correlation is still structurally weak by our methodology — needs sub-district data to fix.
- The 31 ACs in post-2011 new districts have no religion mix at all.
- We have no demographic data (age/gender/turnout-by-cohort) — the part every external analyst leads with.
- Booth-level patterns within ACs are entirely invisible (Path B not run).
If you spot another error, the data is all in S3 and the code is all in pipelines/. Open an issue, send a PR.