Because I am pooling eight years of survey data, I first redefined the sampling design so that the pooled data are treated as a single survey. In the original sampling frame, the 24 provinces were grouped into 19 provincial groups, and each primary sampling unit (PSU), usually a village, was classified as urban or rural. The sample was allocated proportionally among the 38 provincial groups (provG). In the first stage, PSUs were selected independently within each provincial group. In the second stage, one enumeration area (EA) was selected from each PSU by simple random sampling, and in the final stage 10 households were selected from the EA by systematic sampling.
Because the same PSU and provG codes recur in every survey year, I redefined the variable "clusters" from the "year" and "PSU" variables, and "strata" from the "year" and "provG" (provincial group) variables, as shown below:
egen clusters=group(year PSU), label
egen strata=group(year provG), label
svyset clusters [pweight=hhweight], strata(strata) vce(linearized) singleunit(centered) || EA || hhid
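As a sanity check on this declaration (a sketch only, not part of the steps above), svydescribe can be run after svyset to list the number of sampling units per stratum. The single option restricts the listing to strata with only one sampling unit, which is the case that singleunit(centered) is meant to handle.

```stata
* Summarize the declared design: strata and first-stage unit counts
svydescribe

* List only the strata that contain a single sampling unit
svydescribe, single
```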
As you probably know, a command must be prefixed with "svy:" for the estimation to account for the survey design declared with svyset. However, I am trying to develop a predictive model using lasso regression, and "lasso" is not supported by the "svy:" prefix. I have three questions:
1. Would it be a problem if I developed the model without the "svy:" prefix?
2. Would specifying the household weights as importance weights (iweight) solve the problem?
3. Is there any other solution?
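For concreteness, the two specifications in questions 1 and 2 could be sketched roughly as follows. The outcome y and the predictors x1-x50 are placeholder names, not variables from my data, and selection(cv) and rseed() are only illustrative choices; hhweight is the household weight used in the svyset declaration above.

```stata
* Question 1: lasso without the svy: prefix
* (no weights, strata, or clusters enter the fit)
lasso linear y x1-x50, selection(cv) rseed(12345)

* Question 2: the same model with the household weights
* supplied as importance weights
lasso linear y x1-x50 [iweight=hhweight], selection(cv) rseed(12345)
```

Neither call uses the strata or clusters variables, so the weights are the only part of the design that reaches the second specification.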
Thank you very much in advance.
Best regards,
Haruyo