Generating new variables containing summary statistics with 'importance' weights?

Dear Statalist,

I have microdata on individuals, where I have assigned those individuals to geographical locations on a probabilistic basis. In other words, in some cases I know with 100% certainty that individual i is in location z. But in other cases, I might know that there is an 80% likelihood she is in location z, and 20% that she is in location x.

I have thus generated n copies of each individual i, where n is the number of locations in which i might be located. Each copy of i has a weight variable (afact in the data below) reflecting the likelihood of being in that particular location (i.e. 0.8, 0.2 or 1).

What I am trying to do now is generate some new variables that contain location-level summary statistics on certain economic variables like wages and rents. In other words, for each individual i I know their annual wages and their rents, and I am trying to build location-specific mean and median wages and rents.

Here is a snippet of my data to make things concrete. In terms of variables, serial is the household identifier; pernum is the person identifier; czone is the location identifier; afact is the probability of being in czone; rent is monthly rent; and wage is self-explanatory.

Code:

serial    pernum    czone    afact       rent    wage
10        1         11600    1           35      1200
11        3         21600    .4866168    10      1370
11        3         26001    .0246062    10      1370
11        3         26002    .1607224    10      1370
11        3         26003    .0144012    10      1370
11        3         26004    .0839739    10      1370
11        3         26701    .2296794    10      1370

So in this case, person 1 in household 10 has a 100 percent chance of being in czone 11600, whereas person 3 in household 11 could be in any of 6 different locations. I'm basically ignoring the household level for the moment - it just helps uniquely identify individuals.

What I am struggling with is how to incorporate the probability weight. A person who has only a 20% chance of being in a location and another who has an 80% chance should not contribute equally to the mean or median wages of that location.
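For concreteness, the arithmetic described above can be sketched as follows. This is only an illustration in Python rather than Stata, not a recommended implementation. The function names (weighted_mean, weighted_median, location_stats) and the two example rows are made up for the sketch. It also assumes one particular definition of a weighted median: the smallest value at which the cumulative weight reaches half of the total weight. Other definitions exist and may give different answers.

```python
from collections import defaultdict


def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight.

    This definition is an assumption made for the sketch.
    """
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]


def location_stats(rows):
    """rows: iterable of (czone, afact, rent, wage) tuples.

    Returns a dict mapping czone to its weighted mean and median wage and rent.
    """
    by_czone = defaultdict(list)
    for czone, afact, rent, wage in rows:
        by_czone[czone].append((afact, rent, wage))

    stats = {}
    for czone, records in by_czone.items():
        weights = [r[0] for r in records]
        rents = [r[1] for r in records]
        wages = [r[2] for r in records]
        stats[czone] = {
            "mean_wage": weighted_mean(wages, weights),
            "median_wage": weighted_median(wages, weights),
            "mean_rent": weighted_mean(rents, weights),
            "median_rent": weighted_median(rents, weights),
        }
    return stats


# Hypothetical example (not from the real data): two people who might be in
# the same location, one with an 80% chance and one with a 20% chance.
example_rows = [
    (99999, 0.8, 10, 1000),
    (99999, 0.2, 30, 2000),
]
example_stats = location_stats(example_rows)
```

In this made-up example the weighted mean wage is 0.8 * 1000 + 0.2 * 2000 = 1200, whereas an unweighted mean would give 1500, so the person with the 80% chance counts four times as much as the person with the 20% chance.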

I started with the collapse command, but realized it cannot handle weights for means or medians. Plus I'm uncertain how the weights I have fit into the standard Stata weight categories.

What is the right way to do what I want?

Thanks in advance for helping me think this through.

Tom
