Dear Statalist,
I have a dataset with more than 850,000 observations on individuals and 132 dummy variables, named di1, di2, …, di132 (for dummy income), each indicating whether the individual had income in that month.
I want to establish their eligibility for a child leave benefit. They are eligible if they had income in at least 9 of the 24 months before the month of birth of their child; the months of income don’t have to be consecutive. Because the 24-month window is defined relative to the month of birth, it differs across observations. The month of birth is stored in a separate variable with values ranging from 1 to 132 (called b_ren in the data example below).
So for each observation, I need to identify the di variable corresponding to the month of birth, sum the 24 di variables preceding it into a new variable, and check whether that sum is >= 9. The first month of birth I am interested in (for further eligibility reasons) is 25, i.e. I am not interested in the first two years. So, for example, if the month of birth is 25, the new variable will be the sum of di1 to di24.
I have considered reshaping the dataset to long format, but I believe it is too large for that.
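To make the calculation concrete, here is a minimal sketch of how it could be done in wide format with a loop over the monthly dummies, so no reshape is involved. It rests on assumptions that should be checked against the real data: all of di1 to di132 exist (the data example below shows only a few of them), none of them is missing, and b_ren is an integer. The names n_inc24 and eligible are hypothetical and chosen only for illustration.

```stata
* Sketch only: assumes di1-di132 all exist, contain no missing values,
* and that b_ren is an integer month of birth.
* n_inc24 and eligible are hypothetical variable names.

* Start the count at 0 for births in months 25-132; missing otherwise
generate byte n_inc24 = 0 if inrange(b_ren, 25, 132)

forvalues m = 1/131 {
    * Month m counts if it is one of the 24 months before the birth month,
    * i.e. if b_ren - 24 <= m <= b_ren - 1
    quietly replace n_inc24 = n_inc24 + di`m' if inrange(b_ren - `m', 1, 24)
}

* Eligible if income was recorded in at least 9 of those 24 months
generate byte eligible = n_inc24 >= 9 if !missing(n_inc24)
```

Each pass of the loop touches only the observations whose window contains month m, so for a month of birth of 25 the count adds up di1 to di24. If any di variable can be missing, the running sum becomes missing for that observation, which would need separate handling.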
Any help would be much appreciated.
Zuzana
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 rcm float(b_ren di1 di25 di130 di131 di132)
"R002813464" 73 0 0 1 1 1
"R002813464" 34 0 0 1 1 1
"R002813466" 38 1 0 1 1 1
"R002813466" 59 1 0 1 1 1
"R002813467" 30 0 1 0 1 1
"R002813467" 92 0 1 0 1 1
end
[/CODE]