Dear Statalist,

I have a dataset with more than 850,000 observations on individuals and 132 dummy variables, named di1, di2, ..., di132 (di for dummy income), each indicating whether the individual had income in that month.
I want to establish their eligibility for a child leave benefit. They are eligible if they had income in at least 9 of the 24 months before the month of birth of their child; the months of income don’t have to be consecutive. The 24-month window therefore differs across observations. The month of birth is held in a separate variable with values ranging from 1 to 132 (called b_ren in the data example below).

So for each observation, I need to identify the di variable corresponding to the month of birth, sum the 24 di variables preceding it into a new variable, and check whether the sum is >= 9. The first month of birth I am interested in (for further eligibility reasons) is 25, i.e. I am not interested in births in the first two years. For example, if the month of birth is 25, the new variable will be the sum of di1 to di24.

I have considered reshaping the dataset, but I believe it is too large.
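
To make the rule concrete, the calculation could be sketched as a loop over the month dummies, with no reshape. This is only an untested sketch: inc24 and eligible are hypothetical variable names, and it assumes that di1 to di132 all exist in the full data and are never missing.

[CODE]
* Sketch only: inc24 and eligible are hypothetical names, not variables in the data.
* Assumes di1-di132 all exist and contain no missing values.

* Start the count at 0 for births in months 25-132; other observations stay missing.
generate byte inc24 = 0 if inrange(b_ren, 25, 132)

* Month m counts towards an observation when b_ren - 24 <= m <= b_ren - 1.
* The second inrange() keeps observations with b_ren missing or below 25 out of the update.
forvalues m = 1/131 {
    quietly replace inc24 = inc24 + di`m' ///
        if inrange(`m', b_ren - 24, b_ren - 1) & inrange(b_ren, 25, 132)
}

* Eligible when at least 9 of the 24 preceding months had income.
generate byte eligible = inc24 >= 9 if !missing(inc24)
[/CODE]

With b_ren equal to 25, the loop adds di1 to di24, which matches the example above. The loop makes 131 passes over the data, so it works in wide form and creates only two new variables.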

Any help would be much appreciated.
Zuzana

[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 rcm float(b_ren di1 di25 di130 di131 di132)
"R002813464" 73 0 0 1 1 1
"R002813464" 34 0 0 1 1 1
"R002813466" 38 1 0 1 1 1
"R002813466" 59 1 0 1 1 1
"R002813467" 30 0 1 0 1 1
"R002813467" 92 0 1 0 1 1