Hi:
This is not a Stata-related question, so please forgive me if this is not allowed.
I am working with the NHIS database, which has a complex survey design with clustering, stratification, and oversampling of certain subpopulations. The PSUs are mainly counties or groups of contiguous counties, which are then stratified by MSA status. For confidentiality reasons, however, the NHIS provides only pseudo-stratum and pseudo-PSU codes. For the survey period that I am interested in, there were 304 strata and 482 PSUs. However, there are 300 pseudo-strata, each containing 2 pseudo-PSUs, so 600 pseudo-PSUs in total. My confusion stems from the fact that the manual says the pseudo-PSUs were constructed by collapsing the original PSUs into bigger clusters, so that it would be more difficult to identify any given cluster. If that is the case, how come there are more pseudo-PSUs than original PSUs?
I am trying to include some measure of area-specific fixed effects in my panel regression, and I was thinking of using the pseudo-PSUs as a proxy for geographic area. The paper mentioned above says that "a given geographic area within a given NHIS sample PSU should have the same set of Pseudo-Stratum and Pseudo-PSU codes assignments if it is present in more than one NHIS annual microdata file." Doesn't that imply that the original PSUs are broken down into pseudo-PSUs, which would explain why there are more pseudo-PSUs than original ones? Then why does the manual say that the PSUs are merged or collapsed?
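To make concrete what I mean by using the pseudo-PSUs as an area proxy, here is a minimal sketch in Python (just to show the logic, not Stata code). It assumes the pseudo-PSU code is only unique within a pseudo-stratum, so the area identifier has to be built from the (pseudo-stratum, pseudo-PSU) pair. The field names `stratum` and `psu`, the helper `assign_area_ids`, and the toy records are all made up for illustration; they are not the actual NHIS variable names or data.

```python
# Toy records, invented for illustration only: each row carries a
# pseudo-stratum code, a pseudo-PSU code within that stratum, and a survey year.
records = [
    {"stratum": 1, "psu": 1, "year": 2000},
    {"stratum": 1, "psu": 2, "year": 2000},
    {"stratum": 2, "psu": 1, "year": 2000},
    {"stratum": 1, "psu": 1, "year": 2001},
    {"stratum": 2, "psu": 2, "year": 2001},
]


def assign_area_ids(rows):
    """Map each distinct (stratum, psu) pair to one integer area id.

    Hypothetical helper: the id is assigned in sorted order of the pairs,
    so the same pair gets the same id in every survey year.
    """
    pair_to_id = {}
    for pair in sorted({(r["stratum"], r["psu"]) for r in rows}):
        pair_to_id[pair] = len(pair_to_id) + 1
    return [dict(r, area_id=pair_to_id[(r["stratum"], r["psu"])]) for r in rows]


labeled = assign_area_ids(records)

# Number of distinct area ids that would enter the regression as fixed effects.
n_areas = len({r["area_id"] for r in labeled})

# Upper bound on distinct areas if every pseudo-stratum has exactly
# two pseudo-PSUs, as described above: 300 * 2 = 600.
max_areas = 300 * 2
```

In Stata I believe the same grouping could be done with something like `egen area_id = group(stratum psu)`, again with placeholder variable names. Whether these ids really correspond to stable geographic areas is exactly what I am unsure about.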
I have attached a link to their manual: http://www.asasrms.org/Proceedings/y2007/Files/JSM2007-000353.pdf
I would be really grateful if any kind soul could help me out!