Dropping variables by number of non-missing values

Working with Stata 15 on macOS Sierra, I'm using Oklahoma voting data (available by request at https://virs.okelections.us/) to create a dataset with one observation for every voter since 2000. It should contain the variable voterid, which uniquely identifies each voter, plus one variable for each election held in Oklahoma since 2000.

Initially, the data are in long form, with only voterid and electiondate in each observation: 2 variables and ~23 million observations. voterid repeats if the voter has voted in multiple elections. To reshape the data so that each electiondate becomes its own variable, I used the following code:

tabulate electiondate, generate(election_)
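A minimal sketch of what that tabulate call produces, on a tiny hypothetical dataset. It assumes voterid is numeric and electiondate is a string in "YYYY-MM-DD" form; the real file may store either differently. tabulate, generate() creates one 0/1 indicator per distinct value, numbered in sort order, so every long-format row should have exactly one indicator equal to 1.

```stata
* Toy long data (hypothetical voter IDs and dates):
* one row per voter per election voted in
clear
input long voterid str10 electiondate
1 "2000-11-07"
1 "2004-11-02"
2 "2000-11-07"
3 "2004-11-02"
3 "2008-11-04"
end

* One 0/1 indicator per distinct electiondate: election_1, election_2, ...
* (numbered in sort order of electiondate)
tabulate electiondate, generate(election_)

* Each long row belongs to exactly one election, so the row total is 1
egen byte nvoted = rowtotal(election_*)

list, noobs
```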

Because there are so many election dates, tabulate leaves me with approximately 240 election_* indicator variables. Each observation now contains voterid plus these 240 variables, all but one of which equal 0; the remaining one equals 1. I then attempted to collapse the data so that there is just one observation per voterid, with a value of 1 for every election the person voted in and 0 for every election they didn't:

fcollapse (max) election_*, by(voterid)
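For reference, the same aggregation can be sketched with Stata's built-in collapse, which accepts the same syntax here (fcollapse comes from the user-written ftools package on SSC). The data below are hypothetical toy values. Note the scale involved: if each indicator is a 1-byte byte variable, which is what tabulate, generate() should produce, the wide intermediate dataset alone is about 23,000,000 × 240 ≈ 5.5 billion bytes (roughly 5.5 GB) before collapsing even starts.

```stata
* Toy long data (hypothetical voter IDs and dates)
clear
input long voterid str10 electiondate
1 "2000-11-07"
1 "2004-11-02"
2 "2000-11-07"
3 "2004-11-02"
3 "2008-11-04"
end

tabulate electiondate, generate(election_)
drop electiondate

* One row per voter: the max of a 0/1 indicator is 1 if the voter
* has any row for that election, and 0 otherwise
collapse (max) election_*, by(voterid)

list, noobs
```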

The fcollapse step takes too long, and my computer runs out of memory. Instead, I would like to drop each election_* variable that has fewer than 20,000 values equal to 1, to make collapsing quicker. I recoded all the 0 values to .a in the hope of using:

foreach var of election_*{
drop `var' if (count `var')<20000
}

However, this returns an "invalid syntax" error.
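A corrected version of that loop, as a sketch on hypothetical toy data. Three things in the attempt are not valid Stata: foreach needs "of varlist"; drop takes either a varlist or an if condition, not both; and count is a command, not a function, so its result has to be read from r(N) afterwards. The threshold is 2 here instead of 20000 so the toy data show a variable being dropped.

```stata
* Toy wide data before collapsing (hypothetical values)
clear
input voterid election_1 election_2 election_3
1 1 0 0
1 0 1 0
2 1 0 0
3 0 1 0
3 0 0 1
4 1 0 0
end

* Recode 0 to the extended missing value .a, as in the question
mvdecode election_*, mv(0=.a)

* 20000 in the question; 2 here for the toy data
local threshold 2

foreach var of varlist election_* {
    * count stores the number of matching observations in r(N)
    quietly count if !missing(`var')
    if r(N) < `threshold' drop `var'
}

describe, short
```

With the original 0/1 coding, `quietly count if `var' == 1` gives the same count, so the mvdecode step is not strictly needed.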

Is there a way to drop a variable if its number of non-missing values is below a certain threshold? Alternatively, is there a better way to reach my final goal: collapsing the data to one observation per voterid, with a separate variable for most elections (even if I have to drop several smaller elections)?
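One possible alternative, sketched here under stated assumptions rather than as a tested recipe for the full file: filter out small elections while the data are still long (two variables), so that tabulate never creates indicators for them and collapse processes fewer rows and columns. The toy data, the numeric voterid, the string electiondate, and the threshold of 2 (instead of 20000) are all hypothetical.

```stata
* Toy long data (hypothetical); voterid assumed numeric
clear
input long voterid str10 electiondate
1 "2000-11-07"
1 "2004-11-02"
2 "2000-11-07"
2 "2004-11-02"
3 "2000-11-07"
3 "2002-06-25"
3 "2002-06-25"
end

* 20000 in the question; 2 here for the toy data
local threshold 2

* Keep one row per voter per election so group sizes count voters
duplicates drop voterid electiondate, force

* Voters per election, computed on the long data
bysort electiondate: generate long nvoters = _N
drop if nvoters < `threshold'
drop nvoters

* Indicators are created only for elections that pass the filter
tabulate electiondate, generate(election_)
drop electiondate
collapse (max) election_*, by(voterid)

list, noobs
```

One design consequence to be aware of: a voter whose only recorded votes were in dropped elections has no rows left after the filter, so that voter does not appear in the collapsed result.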

BJ Data Tech Solution
