
Data sources

Every row in every parquet file on this site is traceable to a publicly accessible source. We respect each source's license and attribute the source per row where the data permits.

Primary sources used in this analysis (Path A)

1. 2026 results — ECI

  • URL: https://results.eci.gov.in/ResultAcGenMay2026/
  • What: HTML pages, two per AC: ConstituencywiseS22{ac}.htm (final tallies) and RoundwiseS22{ac}.htm (per-round detail).
  • Method: Scrape with a custom User-Agent (curl/8.4.0); ECI returns 403 for default user agents. See pipelines/results.py.
  • Granularity: AC × candidate × round.
  • Cost: Free, public domain.
  • In S3: s3://tnelection2026/results/raw/year=2026/ac=*/ (HTML) + curated/.../candidate_totals.parquet and round_votes.parquet.
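
The fetch step described above might look roughly like the sketch below. It uses only Python's standard library; the helper names page_urls, build_request and fetch are hypothetical and are not taken from pipelines/results.py. The file-name patterns and the curl/8.4.0 user agent come from the notes above. The sketch also assumes {ac} is the plain AC number without zero padding, which this page does not specify.

```python
import urllib.request

BASE = "https://results.eci.gov.in/ResultAcGenMay2026/"
USER_AGENT = "curl/8.4.0"  # per the notes above, default user agents get a 403


def page_urls(ac: int) -> dict:
    """Return the final-tally and round-wise page URLs for one AC.

    Assumption: {ac} is the plain AC number with no zero padding.
    """
    return {
        "final": f"{BASE}ConstituencywiseS22{ac}.htm",
        "rounds": f"{BASE}RoundwiseS22{ac}.htm",
    }


def build_request(url: str) -> urllib.request.Request:
    """Attach the custom User-Agent header to a GET request."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download one page; urllib raises HTTPError on a 403 or other error status."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Keeping URL construction and header handling separate from the network call makes the first two helpers easy to check offline before running a full scrape.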

2. 2021 results — kracekumar/tn_elections

  • URL: https://github.com/kracekumar/tn_elections/blob/master/2021_detailed_results.csv
  • What: Pre-extracted CSV of candidate-level results for all 234 ACs in the 2021 Tamil Nadu Assembly Election (TN AE 2021).
  • Method: Direct download.
  • Granularity: AC × candidate.
  • Cost: Free, open repo (no license stated, but the data is a scrape of public-domain ECI results).
  • In S3: s3://tnelection2026/historical/raw/year=2021/kracekumar_detailed_results.csv + parquet.

3. Religion mix — Census of India 2011

  • URL: https://censusindia.gov.in/nada/index.php/catalog/11392/download/14505/DDW33C-01%20MDDS.XLS
  • What: C-01 "Population by Religious Community" for Tamil Nadu (state code 33), sub-district granularity.
  • Method: Direct XLS download (requires verify_ssl=False because the Census site has a known certificate issue).
  • Granularity: Sub-district (tehsil); we aggregate to district for the join.
  • Cost: Free, public domain.
  • In S3: s3://tnelection2026/demographics/raw/census2011/DDW33C-01_TN_religion.xls + curated/district_religion.parquet.
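
The sub-district to district roll-up mentioned above can be sketched as follows. This is an illustration only, not the site's pipeline code: the function name and the row keys 'district', 'religion' and 'persons' are hypothetical, and the sketch assumes the XLS has already been parsed into one row per sub-district and religion, with any state-level, district-level or rural/urban subtotal rows filtered out so that nothing is counted twice.

```python
from collections import defaultdict


def district_religion_shares(rows):
    """Sum sub-district counts per district and convert them to shares.

    rows: iterable of dicts with hypothetical keys
          'district', 'religion', 'persons'.
    Returns {district: {religion: share_of_district_population}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        counts[row["district"]][row["religion"]] += int(row["persons"])

    shares = {}
    for district, by_religion in counts.items():
        total = sum(by_religion.values())
        if total == 0:
            continue  # skip districts with no recorded population
        shares[district] = {r: n / total for r, n in by_religion.items()}
    return shares
```

Summing raw counts first and dividing once at the district level avoids averaging sub-district percentages, which would weight small and large sub-districts equally.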

4. Reservation status — derived from ECI AC names

  • ECI tags reserved seats with (SC) or (ST) in the AC name string (e.g. "Ponneri (SC)").
  • Confidence: high. Official source.
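
A minimal sketch of how the tag could be pulled out of the AC name, assuming the tag always appears as a trailing "(SC)" or "(ST)" as in the example above. The function name and the "GEN" label for unreserved seats are hypothetical choices for illustration.

```python
import re

# Matches a trailing "(SC)" or "(ST)" tag, allowing trailing whitespace.
_RESERVED = re.compile(r"\((SC|ST)\)\s*$")


def reservation_status(ac_name: str) -> str:
    """Return 'SC', 'ST', or 'GEN' based on the suffix tag in the AC name."""
    match = _RESERVED.search(ac_name)
    return match.group(1) if match else "GEN"
```

Anchoring the pattern to the end of the string keeps a parenthesised fragment elsewhere in a name from being mistaken for a reservation tag.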

Other sources we evaluated but didn't use here

| Source | Why considered | Why not used here |
| --- | --- | --- |
| TCPD Lokdhaba (Ashoka) | AC-level results 1962-2021, free CSV | 2026 not yet ingested (typical 6-18 month lag); 2021 covered by kracekumar |
| SHRUG v2.1 (DevDataLab) | Sub-district demographics with built-in shrid→AC mapping | Auth-walled (registration required); we used coarser district fallback |
| Dataful.in TN 1967-2026 | Pre-joined historical CSV | Login-walled |
| MyNeta TN 2026 | Candidate assets / criminal cases | User deprioritized candidate-level data |
| Form 20 PDFs (TN CEO) | Booth-level vote tallies | Image PDFs, would require OCR |
| Electoral rolls (TN CEO) | Voter names per booth | CAPTCHA + image PDF, requires OCR |
| Susewind Nature 2025 | Booth-level results 2009-2019 (11 states incl. TN) | Pre-2026, valuable for future swing baselines |
| Lokniti CSDS post-poll | Survey-level caste-by-vote crosstabs | TN 2026 not yet released |
| SECC 2011 | Caste detail at sub-district | Caste names never publicly released |
| Grey-market voter list resellers | Booth-level + caste-tagged voter CSVs | DPDPA 2023 grey zone; ~Rs 50K per AC |

For the full Buy-vs-DIY matrix see pipelines/SOURCES_BUY_VS_DIY.md.

Cost summary

| Resource | Money spent |
| --- | --- |
| Hetzner Object Storage | covered by existing bucket subscription |
| 2captcha (used for ~10 booth PDFs during Path B prototyping) | ~$0.01 |
| Cloud OCR | $0 (not used for Path A) |
| Commercial voter data | $0 (didn't go grey-market) |
| Total Path A cost | $0 |

Licensing

  • ECI data: public domain (Government of India).
  • Census 2011: public domain.
  • kracekumar/tn_elections: no explicit license stated; the underlying data is public-domain ECI results. We treat it as MIT-equivalent.
  • This site's analysis + code: MIT (when committed). All derived parquets attribute their source per row.

Built from public data — ECI, Census 2011, kracekumar/tn_elections.