Dear Statalist,

I am working with a large dataset (about 30 million observations, 1979-2018) and would like to collapse it to make computation easier. Here is an example of the data.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year byte(month statefip employ partt educ indcat female) int race byte marst
1979 1 1 1 0 2 3 0 100 1
1979 1 1 1 0 2 1 0 100 1
1979 1 1 1 0 2 2 0 100 1
1979 1 1 1 0 2 2 0 100 6
1979 1 1 1 0 2 2 1 100 1
1979 1 1 1 0 1 3 0 200 6
1979 1 1 1 0 1 1 0 100 7
1979 1 1 1 1 1 3 0 100 6
1979 1 1 1 0 2 3 0 100 1
1979 1 1 1 1 2 3 1 100 1
1979 1 1 1 0 1 2 1 100 7
1979 1 1 1 1 3 3 0 100 6
1979 1 1 1 0 2 1 0 100 1
1979 1 1 1 0 2 2 0 100 6
1979 1 1 1 0 1 5 1 100 1
1979 1 1 1 1 3 5 1 100 1
1979 1 1 1 1 2 3 1 100 1
1979 1 1 1 0 3 3 0 100 1
1979 1 1 1 0 2 2 0 100 1
1979 1 1 1 0 2 3 0 100 1
1979 1 1 1 0 3 3 0 100 1
1979 1 1 1 0 2 2 1 100 1
1979 1 1 1 0 2 1 0 100 1
1979 1 1 1 0 3 5 0 100 1
1979 1 1 1 0 2 2 0 100 1
1979 1 1 1 0 1 4 0 100 1
1979 1 1 0 . 3 5 1 200 1
1979 1 1 1 0 3 2 0 100 1
1979 1 1 1 0 3 3 0 100 1
end
label values month month_lbl
label def month_lbl 1 "January", modify
label values statefip statefip_lbl
label def statefip_lbl 1 "Alabama", modify
label values employ elabel
label values partt ptlabel
label values educ education
label def education 1 "Less than High School", modify
label def education 2 "High School", modify
label def education 3 "At least some college", modify
label values indcat indlabel
label def indlabel 0 "NIU", modify
label def indlabel 1 "Agriculture, forestry, fishing, mining, construction", modify
label def indlabel 2 "Manufacturing", modify
label def indlabel 3 "Transportation, communication, utilities, wholesale, retail trade", modify
label def indlabel 4 "Finance, insurance, real estate, business, repair, personal services", modify
label def indlabel 5 "Entertainment and recreation, professional and related services, public administration, active duty military", modify
label values female femalelab
label values race race_lbl
label def race_lbl 100 "White", modify
label def race_lbl 200 "Black/Negro", modify
label def race_lbl 700 "Other (single) race, n.e.c.", modify
label values marst marst_lbl
label def marst_lbl 1 "Married, spouse present", modify
label def marst_lbl 2 "Married, spouse absent", modify
label def marst_lbl 6 "Never married/single", modify
label def marst_lbl 7 "Widowed or Divorced", modify
I then collapsed the data using the survey weights:
Code:
collapse empstat labforce employ partt [fw=wtfinlm], by(year month statefip agegroup indcat female educ race marst)
Unfortunately, regressions run on the collapsed dataset seem to produce overestimates. So I would like to try collapsing with weights that take into account the number of observations in each cell for my dependent variables (employ and partt). I tried something like this, without success:
Code:
collapse employ partt n=employ[n=employ partt], by(year month statefip agegroup indcat female educ race marst)
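To make the goal concrete, here is a minimal sketch of one way to keep the cell sizes alongside the cell means. It is an illustration under assumptions, not a tested solution: ncell_employ and ncell_partt are hypothetical variable names, the by() list is shortened to variables that appear in the example above, the regression specification is only a placeholder, and the choice of aweight over fweight is an assumption that affects the standard errors.

```stata
* Sketch only: collapse to cell means and record how many nonmissing
* observations lie behind each mean (counts kept separately, since
* partt has missing values where employ does not).
collapse (mean) employ partt                ///
         (count) ncell_employ = employ      ///
                 ncell_partt  = partt,      ///
    by(year month statefip indcat female educ race marst)

* Placeholder regression: weight each cell by its size so that
* large cells count for more than small ones.
regress employ i.female i.educ [aweight = ncell_employ]
```

This sketch leaves out the survey weights entirely; combining them with the cell counts is a separate question.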

Any help would be greatly appreciated.
Thanks!
IM