In the stylized example below, the wages_bkt and retirement_bkt are the bracketed versions of wages and retirement, respectively. A bracket code of 1 corresponds to a $0 to $1,000 bracket range and 2 to over $1,000. I've already calculated bracket codes for the nonmissing responses.
For the observations missing values, I need to take a random draw from the valid values that fall within the bracket range specified for that observation and income type, e.g. for the household with $1,500 in wages and missing retirement income, I need to randomly draw a retirment income value from the valid retirement income reports that fall in bracket 2.
In my scratch code below, I:
- Rank the nonmissing values within each bracket range (I eventually plan to generate a random value and calculate within bracket rank by that number)
- Generate variables with the total number of valid responses from which we can draw for wages and retirement
- Draw a random number between 1 and the (bracket-level) number of valid reports, for observations with missing reports
Does anyone have any ideas how I might pull the valid values from the observations identified by the *_pull_from and *_bkt variables into the observations with invalid values in the example below? Or does anyone have an altogether different solution? I'm replicating an existing imputation procedure, so I'm constrained on the methodology. Thank you!
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int(wages retirement wages_bkt retirement_bkt) 500 1100 1 2 900 300 1 1 3000 600 2 1 2600 800 2 1 4600 1100 2 2 700 400 1 1 400 1100 1 2 1200 1700 2 2 1500 . 2 2 . . 2 1 . 1200 1 2 end bysort wages_bkt: egen w_rank = rank(wages) if !missing(wages), unique bysort retirement_bkt: egen r_rank = rank(retirement) if !missing(retirement), unique bysort wages_bkt: egen w_num_valid = max(w_rank) bysort retirement_bkt: egen r_num_valid = max(r_rank) set seed 123 gen w_pull_from = floor(runiform()*w_num_valid) + 1 if missing(wages) gen r_pull_from = floor(runiform()*r_num_valid) + 1 if missing(retirement) *Results in: clear input int(wages retirement wages_bkt retirement_bkt w_rank r_rank w_num_valid r_num_valid w_pull_from r_pull_from) 3000 600 2 1 4 3 5 4 . . 2600 800 2 1 3 4 5 4 . . . . 2 1 . . 5 4 2 4 900 300 1 1 4 1 4 4 . . 700 400 1 1 3 2 4 4 . . 1200 1700 2 2 1 5 5 5 . . 1500 . 2 2 2 . 5 5 . 4 400 1100 1 2 1 1 4 5 . . 500 1100 1 2 2 3 4 5 . . . 1200 1 2 . 4 4 5 3 . 4600 1100 2 2 5 2 5 5 . . end
0 Response to Replacing missing values with draws from a subset of nonmissing values
Post a Comment