Working with Stata 15 on macOS Sierra, I'm trying to use Oklahoma voting data (available by request at https://virs.okelections.us/) to create a dataset that contains one observation for every voter since 2000, with the variable voterid (which uniquely identifies each voter) and one variable for each election in Oklahoma since 2000.
Initially, the data is in the form of a long dataset, with only voterid and electiondate for each observation. There are thus 2 variables, and ~23 million observations. voterid repeats if the voter has voted in multiple elections. In order to reshape the data so that each electiondate is its own variable, I used the following code:
tab(electiondate), gen(election_)
Because there are so many election dates, I am left with approximately 240 election_* variables generated by the previous command. Each observation now contains the voterid variable and 240 indicator variables, exactly one of which has a value of 1 while the rest are 0. I then attempted to collapse this data so that there is just one observation per voterid, with a value of 1 for every election a person voted in and 0 for every election they didn't:
fcollapse (max) election_*, by(voterid)
This takes too long and my computer runs out of memory. Instead, I would like to drop each variable that has fewer than 20,000 values of 1, to make the collapsing process quicker. I recoded all the 0 values to .a in the hopes of using:
foreach var of election_*{
drop `var' if (count `var')<20000
}
However, this returns an "invalid syntax" error.
Is there a way to drop a variable if the number of non-missing values is below a certain threshold? Alternatively, is there a better way of accomplishing my final goal of collapsing the data so that I have one observation per voterid with different variables for most elections (even if I have to drop several smaller elections)?
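For reference, here is a minimal sketch of how the loop above might be written so that it parses. It assumes the 0 values have already been recoded to .a as described, so that the non-missing count of each election_* variable equals its number of 1 values. The 20,000 threshold is the one mentioned above. I have not run this on the full 23-million-observation file, so treat it as a starting point rather than a tested solution.

```stata
* Sketch only: drop every election_* indicator that has fewer than
* 20,000 non-missing values. Assumes 0 was already recoded to .a,
* so "non-missing" means "voted in that election".
foreach var of varlist election_* {
    * count stores the number of matching observations in r(N)
    quietly count if !missing(`var')
    if r(N) < 20000 {
        * drop with a variable name removes the variable itself
        drop `var'
    }
}
```

The differences from my attempt: foreach needs the varlist keyword to expand election_*, count is a command whose result is read from r(N) rather than a function, and drop accepts either a variable list or an if condition but not both, so the test has to go in a separate if command. It may also be cheaper to filter out small elections while the data is still long, before running tabulate, so that the unwanted indicator variables are never created.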