Schema reference — S3 layout
All structured data is in Hetzner Object Storage at s3://tnelection2026/. Total: ~2.2 GB across 9 dataset folders.
results/
ECI 2026 results, candidate-level and round-wise. Already in S3 for all 234 ACs (assembly constituencies).
results/curated/year=2026/ac={N}/candidate_totals.parquet
| Column | Type | Source |
|---|---|---|
| sn | int | ECI table row number |
| candidate | str | candidate full name |
| party | str | party name (ECI spelling) |
| evm_votes | int | EVM tally |
| postal_votes | int | postal/ETPBS tally |
| total_votes | int | evm_votes + postal_votes |
| pct_votes | float | percent of valid votes in AC |
| state | str | "TN" |
| ac_no | int | 1..234 |
| ac_name | str | e.g. "KOLATHUR" |
| year | int | 2026 |
| run_date | str | fetch date |
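
The path template and the `total_votes` / `pct_votes` definitions above can be sketched in Python. This is only an illustration: `candidate_totals_key` and `check_totals` are hypothetical helper names, the sample rows are made up, and the sketch assumes the `pct_votes` denominator is the sum of `total_votes` over every row of the AC, which the table does not spell out.

```python
from typing import Iterable

BUCKET = "tnelection2026"  # bucket name from the layout above


def candidate_totals_key(ac_no: int, year: int = 2026) -> str:
    """Build the S3 URI for one AC's candidate_totals.parquet (hypothetical helper)."""
    if not 1 <= ac_no <= 234:
        raise ValueError(f"ac_no must be in 1..234, got {ac_no}")
    return f"s3://{BUCKET}/results/curated/year={year}/ac={ac_no}/candidate_totals.parquet"


def check_totals(rows: Iterable[dict]) -> list[dict]:
    """Verify total_votes = evm_votes + postal_votes and recompute pct_votes.

    Assumption: the denominator is the sum of total_votes over all rows
    of the AC; the real pipeline may define "valid votes" differently.
    """
    rows = list(rows)
    for r in rows:
        if r["total_votes"] != r["evm_votes"] + r["postal_votes"]:
            raise ValueError(f"total mismatch for sn={r['sn']}")
    valid = sum(r["total_votes"] for r in rows)
    return [{**r, "pct_votes": round(100 * r["total_votes"] / valid, 2)} for r in rows]


# Made-up rows for illustration only
sample = [
    {"sn": 1, "evm_votes": 590, "postal_votes": 10, "total_votes": 600},
    {"sn": 2, "evm_votes": 395, "postal_votes": 5, "total_votes": 400},
]
checked = check_totals(sample)
```

A check like this is cheap to run per AC after each fetch, since a mismatch usually points to a scraping problem rather than a real discrepancy in the source.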
results/curated/year=2026/ac={N}/round_votes.parquet
| Column | Type |
|---|---|
| round_no | int |
| candidate | str |
| party | str |
| brought_forward | int (running total at start of round) |
| current_round | int (votes this round) |
| total | int (running total at end of round) |
| state, ac_no, ac_name, year, run_date | metadata |
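
The three vote columns imply two invariants: within a round, `brought_forward + current_round` should equal `total`, and each round's `brought_forward` should equal the previous round's `total`. A minimal sketch of that check for one candidate's rows follows; `check_rounds` is a hypothetical name and the sample rows are invented.

```python
def check_rounds(rows: list[dict]) -> list[str]:
    """Check running-total invariants for one candidate's rounds.

    rows: dicts with round_no, brought_forward, current_round, total.
    Returns a list of human-readable problems (empty if consistent).
    """
    problems = []
    prev_total = None
    for r in sorted(rows, key=lambda row: row["round_no"]):
        # Invariant 1: start-of-round total plus this round's votes = end total
        if r["brought_forward"] + r["current_round"] != r["total"]:
            problems.append(f"round {r['round_no']}: brought_forward + current_round != total")
        # Invariant 2: this round starts where the previous one ended
        if prev_total is not None and r["brought_forward"] != prev_total:
            problems.append(f"round {r['round_no']}: brought_forward != previous total")
        prev_total = r["total"]
    return problems


# Invented rows for illustration only
good = [
    {"round_no": 1, "brought_forward": 0, "current_round": 100, "total": 100},
    {"round_no": 2, "brought_forward": 100, "current_round": 50, "total": 150},
]
bad = [
    {"round_no": 1, "brought_forward": 0, "current_round": 100, "total": 100},
    {"round_no": 2, "brought_forward": 90, "current_round": 50, "total": 150},
]
good_problems = check_rounds(good)
bad_problems = check_rounds(bad)
```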
historical/
TN assembly election (AE) results for 2021, 2016 and 2011. Used as the swing baseline.
historical/curated/year=2021/kracekumar_detailed.parquet
4,232 rows — one per (AC, candidate) for the 2021 TN AE.
| Column | Type |
|---|---|
| sn | int (O.S.N. in source) |
| Candidate | str |
| Party | str |
| evm_votes, postal_votes, total_votes | int |
| pct_votes | float |
| ac_name | str |
| ac_no | int |
| district_name | str |
| constituency_type | str ("General" / "SC" / "ST") |
| position | int (rank, 1 = winner) |
| Gender | str ("M" / "F") |
demographics/
Census 2011 religion mix at district level.
demographics/curated/district_religion.parquet
32 rows — one per TN district as of Census 2011.
| Column | Type |
|---|---|
| district_code | str (Census 6-digit MDDS code) |
| district_name | str |
| Total_persons | int |
| Hindu_persons, Hindu_pct | int, float |
| Muslim_persons, Muslim_pct | int, float |
| Christian_persons, Christian_pct | int, float |
| Sikh_*, Buddhist_*, Jain_*, Other_*, NotStated_* | int, float |
demographics/curated/ac_demographics.parquet
234 rows — each AC joined to its district's religion mix.
| Column | Source |
|---|---|
| ac_no, ac_name, district_name, constituency_type | 2021 results |
| district_code, Hindu_pct, Muslim_pct, Christian_pct, Sikh_pct, ... | Census 2011 |
| dist_norm | normalised district name (used for the join) |
Gap
31 of 234 ACs have null religion fields. These ACs are in districts created after Census 2011 (Tirupathur, Mayiladuthurai, Ranipet, Chengalpattu, Kallakurichi, Tenkasi), so there is no 2011 district row for them to join to.
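
The join and the resulting gap can be sketched as a left join on a normalised district name. The normalisation rule here (lowercase, letters only) is an assumption, since the actual `dist_norm` rule is not documented in this file; `norm_district` and `join_religion` are hypothetical names, and the sample codes, AC numbers and percentages are invented.

```python
import re


def norm_district(name: str) -> str:
    # Assumed rule: lowercase, keep letters only. The real dist_norm rule may differ.
    return re.sub(r"[^a-z]", "", name.lower())


def join_religion(acs: list[dict], districts: list[dict]) -> list[dict]:
    """Left-join ACs to district rows on the normalised district name."""
    by_norm = {norm_district(d["district_name"]): d for d in districts}
    joined = []
    for ac in acs:
        key = norm_district(ac["district_name"])
        match = by_norm.get(key)  # None for districts absent from Census 2011
        joined.append({
            **ac,
            "dist_norm": key,
            "district_code": match["district_code"] if match else None,
            "Hindu_pct": match["Hindu_pct"] if match else None,
        })
    return joined


# Placeholder rows: codes, AC numbers and percentages are invented
districts = [{"district_code": "000001", "district_name": "Chennai", "Hindu_pct": 80.0}]
acs = [
    {"ac_no": 1, "ac_name": "EXAMPLE A", "district_name": "CHENNAI"},
    {"ac_no": 2, "ac_name": "EXAMPLE B", "district_name": "Tenkasi"},
]
joined = join_religion(acs, districts)
missing = [row["ac_no"] for row in joined if row["Hindu_pct"] is None]
```

Listing `missing` this way gives a quick count to compare against the 31 ACs noted above. Filling the gap would need a mapping from each new district to its 2011 parent district, which this sketch does not attempt.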
insights/
insights/curated/swing_2021_to_2026.parquet
3,393 rows — one per (AC × party) for parties contesting in 2021 or 2026.
| Column | Type |
|---|---|
| ac_no | int |
| ac_name, district_name, constituency_type | from 2021 |
| party_norm | str, normalised to |
| pct_2021 | float (null if party didn't contest in 2021, e.g. TVK) |
| pct_2026 | float (null if party didn't contest in 2026) |
| swing_pct | float = pct_2026 − pct_2021 (with nulls treated as 0) |
| votes_2021, votes_2026, total_2021, total_2026 | int |
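
The `swing_pct` rule, including the null handling, can be written as a small function. This is a sketch of the definition in the table, not the pipeline's actual code; `swing_pct` as a Python function name is an assumption, and `None` stands in for a parquet null.

```python
from typing import Optional


def swing_pct(pct_2021: Optional[float], pct_2026: Optional[float]) -> float:
    """pct_2026 - pct_2021 in percentage points, treating a null share as 0."""
    # `x if x is not None else 0.0` keeps a genuine 0.0 share distinct from null
    p21 = pct_2021 if pct_2021 is not None else 0.0
    p26 = pct_2026 if pct_2026 is not None else 0.0
    return p26 - p21


new_party = swing_pct(None, 30.0)   # party absent in 2021, e.g. TVK
dropped = swing_pct(12.5, None)     # party absent in 2026
both = swing_pct(40.0, 35.0)        # party contested both years
```

One consequence worth keeping in mind: for a party that did not contest in 2021, `swing_pct` equals `pct_2026`, so ranking such a party by swing is the same as ranking it by its 2026 vote share.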
Other folders
| Folder | Purpose | Status |
|---|---|---|
| form20/ | Booth-level PDFs + (partial) booth × candidate parquets | partial: 12 of 234 ACs |
| voters/ | Voter-roll PDFs + OCR'd voter lists | partial: 10 booths in Kolathur |
| candidates/ | MyNeta candidate data | deprioritized |
| caste/ | Reservation + Wikipedia prose per AC | 234 ACs (sparse) |
| religion/ | State-baseline religion per AC | 234 ACs (state-level only) |
| alliance/ | Party→alliance map per year | 4 years (2026, 2021, 2016, 2011) |
| geo/ | DataMeet shapefile + parquet WKT | 234 polygons |
Quick DuckDB query
```sql
INSTALL httpfs; LOAD httpfs;
SET s3_endpoint='hel1.your-objectstorage.com';
SET s3_access_key_id='...';
SET s3_secret_access_key='...';

-- Top 10 ACs by TVK swing
SELECT ac_no, ac_name, district_name, swing_pct
FROM read_parquet('s3://tnelection2026/insights/curated/swing_2021_to_2026.parquet')
WHERE party_norm = 'TVK'
ORDER BY swing_pct DESC
LIMIT 10;
```