This is my first post on the forum, and I am not sure how to use dataex, so I am posting a link to download the dataset instead:
HTML Code:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/3WZFK9
Starting from this raw data, I am trying to count the close races in each district by calculating the vote margin between candidates.

Code:
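* District-level votes per candidate and per contest (total() is the documented name of egen's sum())
egen candvote=total(vote), by(sab year month day ddez sen cand etype)
egen totalvote=total(vote), by(sab year month day sen ddez etype)
* Candidate's vote share in percent
gen voteper=(candvote/totalvote)*100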
Here "sab" is the state name, "ddez" the district number, "sen" whether the election is for the senate or the house of representatives, "cand" the candidate's name, and "etype" the type of election. I keep only general elections and drop all other results, such as primaries and first-round elections that lead to a runoff. Because the raw data record votes by county, I aggregate them to the district level in "candvote" and "totalvote". As a result, every county row in a multi-county district carries the same values of these two variables.
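To make the aggregation step easier to check, here is a minimal sketch on invented toy data. The county names, candidate names, and vote counts are made up, the variable "county" is hypothetical (I do not know what it is called in the real file), and only a subset of the grouping keys is used. It confirms that the shares sum to 100 within a district once each candidate is counted once.

```stata
* Toy data (invented): one district, two counties, two candidates
clear
input str2 sab int year byte ddez str10 county str5 cand long vote
"AK" 1986 1 "North" "Adams" 300
"AK" 1986 1 "South" "Adams" 250
"AK" 1986 1 "North" "Baker" 200
"AK" 1986 1 "South" "Baker" 250
end

* District-level totals, repeated on every county row
egen candvote  = total(vote), by(sab year ddez cand)
egen totalvote = total(vote), by(sab year ddez)
gen voteper = 100 * candvote / totalvote

* Mark one row per candidate, then add up the shares of those rows only
egen byte onerow = tag(sab year ddez cand)
egen check = total(voteper * onerow), by(sab year ddez)

* Stops with an error if any district's shares do not sum to 100
assert abs(check - 100) < 0.001
```

If the assert fails on the real data, the grouping keys probably do not identify a single contest (for example, a district that appears under two values of another key).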

Code:
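* Keep one row per candidate within each election and district
bysort sab year month day sen dno ddez etype cand: gen dup1 = cond(_N==1,0,_n)
drop if dup1>1
* Highest vote share among losers, lowest vote share among winners
bysort sab year month day etype sen dno ddez: egen vote3=max(voteper) if outcome=="l"
bysort sab year month day etype sen dno ddez: egen vote4=min(voteper) if outcome=="w"
* vote3 and vote4 are never both nonmissing on the same row, so rowmax() picks whichever exists
egen vote5=rowmax(vote3 vote4)
* Keep one row per distinct value of vote5 within each district
bysort sab year month day sen dno ddez etype vote5: gen dup5 = cond(_N==1,0,_n)
drop if dup5>1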
As mentioned, each candidate's vote share is repeated across the counties of a district, so I drop the duplicates and keep one row per candidate before calculating the district's vote margin. In a multi-member district I drop further rows so that only two remain per district: the winner with the smallest vote share and the loser with the largest vote share.

Code:
bysort sab year month day etype sen dno ddez: gen vvvv=abs(vote5[_n+1]-vote5) if outcome != outcome[_n+1]
bysort sab year month day etype sen dno ddez: replace vvvv=vote5 if ddez !=ddez[_n+1] & ddez !=ddez[_n-1]
This is the hardest part. The first line calculates the vote margin when two candidates compete: it takes the difference in vote share between adjacent rows of the same district. Because the row order can be loser-winner or winner-loser, I take the absolute value. If the race is uncontested, the second line copies the vote share from vote5 to vvvv without any calculation.
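One way to check this step is to run it on a tiny invented dataset where the answer is known in advance. The sketch below uses made-up values and only three of the grouping keys. It also flags the uncontested district with _N == 1, which is my substitution for the ddez comparison and assumes each district has already been reduced to at most two rows.

```stata
* Toy data (invented): district 1 is contested, district 2 is uncontested
clear
input str2 sab int year byte ddez str1 outcome float vote5
"AK" 1986 1 "w" 55
"AK" 1986 1 "l" 45
"AK" 1986 2 "w" 100
end

* Margin between the two rows of a contested district; under by:, [_n+1]
* past the last row of the group is missing, so only the first row gets a value
bysort sab year ddez: gen vvvv = abs(vote5[_n+1] - vote5) if outcome != outcome[_n+1]

* A district with a single row is uncontested: copy its vote share
bysort sab year ddez: replace vvvv = vote5 if _N == 1

list, sepby(sab year ddez)
```

In the contested district the margin lands on the first row only and the second row stays missing, so count(vvvv) later counts each district once.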

Code:
bysort sab year: egen number=count(vvvv) if vvvv>=0
bysort sab year: egen number2=count(vvvv) if vvvv<=10
bysort sab year: egen number3=count(vvvv) if vvvv<=10 & partyt=="d"
bysort sab year: egen number4=count(vvvv) if vvvv<=10 & partyt=="r"
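These commands count the "close elections" in each state and year. The variable number gives the number of contests in a state-year (e.g., the 1986 Alaska election or the 2014 Wyoming election), and number2 gives the number of close elections, which I define as a margin of 10 percentage points or less. The percentage of close elections would then be number2 divided by number. The variables number3 and number4 split the close elections by party (partyt indicates whether the candidate is a Democrat or a Republican).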
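For the percentage itself, a possible sketch on invented toy data is below. The names is_close, n_races, n_close, and pct_close are hypothetical, and the data assume one row per district with its margin already in vvvv. Building an indicator first means the counts are filled in on every row of the state-year rather than only on the rows that satisfy the if condition.

```stata
* Toy data (invented): one row per district with its margin in vvvv
clear
input str2 sab int year float vvvv
"AK" 1986 4
"AK" 1986 25
"AK" 1986 100
"WY" 2014 8
"WY" 2014 9
end

* Indicator for a close race (margin of 10 points or less); left missing
* when vvvv is missing, since Stata treats missing as larger than any number
gen byte is_close = vvvv <= 10 if !missing(vvvv)

* Contests and close contests per state-year, repeated on every row
bysort sab year: egen n_races = count(vvvv)
bysort sab year: egen n_close = total(is_close)

* Percentage of close contests in each state-year
gen pct_close = 100 * n_close / n_races

list, sepby(sab year)
```

The same pattern would extend to the party split by adding the partyt condition to the indicator.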

This is a long explanation, but I am not sure my logic is a sound way to measure how close elections are across U.S. state legislatures. Because I dropped duplicates that seemed irrelevant, the variables might not capture the total number of close elections as I intend. Do you think this code is fine as it is, or are there other ways to measure the closeness of elections? Do the second, third, and fourth blocks of code really measure the number of close elections at the district and state level?