Hi,
I am working with election data and have two datasets (the relevant variables are explained below):
Dataset 1: Election data
I have the following variables:
'e_year' - election year
'l_year' - last election year
'year_b' - years since last election (ranges from 1 to 5)
Dataset 2: Variables like population, literacy, etc. I have created lags of these variables for the past 5 years using a foreach loop:
* Assumes the data are already sorted by state and, within state, by year
foreach v of varlist farea-ftotal {
    * Treat missing values as 0 before building the lags
    replace `v' = 0 if missing(`v')
    * lag`k'_`v' holds the value of `v' from k rows earlier within the same state
    forvalues k = 1/5 {
        by state: gen lag`k'_`v' = `v'[_n-`k']
    }
}
What I need: When I combine these datasets, I need only as many lags as the value of year_b, i.e. if the last election happened 3 years ago then 3 lags, if it happened a year ago then only one lag, and so on.
My current way of implementing this is very long-winded. At the moment, I combine the datasets using state and e_year. Then, if year_b is 1, I drop each unneeded lag for each variable. I tried dropping them using 'lag2_*' but (a) it doesn't work and (b) it is still not an efficient way of doing it, as I would need to repeat this for each lag.
I am certain a better way to implement this exists. I would appreciate it if someone could help me with it.
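To make the goal concrete, here is a rough, untested sketch of the kind of thing I have in mind. Instead of dropping variables, it sets a lag to missing whenever its lag number is larger than year_b. It assumes the files are already merged, that year_b is in the merged data, and that the lag variables are numeric and named lag1_* through lag5_*.

```stata
* Sketch only: assumes merged data with year_b and numeric lag1_* ... lag5_* variables
forvalues k = 1/5 {
    foreach v of varlist lag`k'_* {
        * Blank out lag k in rows where the last election was fewer than k years ago
        replace `v' = . if `k' > year_b
    }
}
```

With year_b equal to 3, for example, this would leave lag1 to lag3 as they are and set lag4 and lag5 to missing in that row. Rows where year_b itself is missing would be left untouched.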
I would like to make these changes after the lags have been created rather than beforehand, as dataset 2 also needs to be combined with another file.
Thank you
Anna