I have microdata on individuals, where I have assigned those individuals to geographical locations on a probabilistic basis. In other words, in some cases I know with 100% certainty that individual i is in location z. But in other cases, I might know that there is an 80% likelihood she is in location z, and 20% that she is in location x.
I have thus generated n copies of each individual i, where n is the number of locations in which i might be located. Each copy of i has a weight variable a, reflecting the likelihood of being in that particular location (i.e. a = 0.8, 0.2, or 1).
What I am trying to do now is generate some new variables that contain location-level summary statistics on certain economic variables like wages and rents. In other words, for each individual i I know their annual wages and their rents, and I am trying to build location-specific mean and median wages and rents.
Here is a snippet of my data to make things concrete. In terms of variables, serial is the household identifier; pernum is the person identifier; czone is the location identifier; afact is the probability of being in czone; rent is monthly rent; and wage is annual wages.
Code:
serial  pernum  czone  afact     rent  wage
10      1       11600  1         35    1200
11      3       21600  .4866168  10    1370
11      3       26001  .0246062  10    1370
11      3       26002  .1607224  10    1370
11      3       26003  .0144012  10    1370
11      3       26004  .0839739  10    1370
11      3       26701  .2296794  10    1370
What I am struggling with is how to incorporate the probability weight. A person who has only a 20% chance of being in a location and another who has an 80% chance should not contribute equally to the mean or median wages of that location.
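To make the arithmetic I have in mind concrete, here is a minimal sketch in Python rather than Stata. It assumes afact can be used directly as the weight on each copy of a person, which is exactly the part I am unsure about. The function names and the two-person example are hypothetical, and the median rule used (the smallest value at which cumulative weight reaches half the total) is only one of several possible conventions.

```python
from collections import defaultdict


def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_median(values, weights):
    """Smallest value at which cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def summarize_by_czone(rows):
    """Group rows of (czone, afact, rent, wage) and summarize each location."""
    groups = defaultdict(list)
    for czone, afact, rent, wage in rows:
        groups[czone].append((afact, rent, wage))
    summary = {}
    for czone, recs in groups.items():
        w = [r[0] for r in recs]
        rents = [r[1] for r in recs]
        wages = [r[2] for r in recs]
        summary[czone] = {
            "mean_rent": weighted_mean(rents, w),
            "median_rent": weighted_median(rents, w),
            "mean_wage": weighted_mean(wages, w),
            "median_wage": weighted_median(wages, w),
        }
    return summary


# Hypothetical example (not from my data): two people who might be in the
# same location, one with an 80% chance and one with a 20% chance.
example = [
    ("z", 0.8, 30, 1000),
    ("z", 0.2, 50, 2000),
]
result = summarize_by_czone(example)
print(result["z"])
```

In this hypothetical case the person with the 80% chance pulls the mean wage toward 1000 rather than toward the unweighted midpoint of 1500, which is the behaviour I want the location-level statistics to have.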
I started with the collapse command, but realized it cannot handle weights for means or medians. Plus I'm uncertain how the weights I have fit into the standard Stata weight categories.
What is the right way to do what I want?
Thanks in advance for helping me think this through.
Tom