Hello,

I am working with quarterly program participation data for a panel of individuals. I have individual (uniqueid) and quarter (qdate) identifiers. I also have quarterly indicators for program participation (program) and program case IDs (caseid).

Program case IDs are missing for the quarters in which the person is not on the program. I would like to interpolate the caseid variable for these missing quarters, so that I have a household identifier for every quarter. If every individual had only a single case ID, I could easily accomplish this with stripolate using the groupwise option. The challenge is that case IDs change over time for some individuals. I am seeking a way to interpolate these case IDs that can handle the fact that they are not constant within uniqueid. The new interpolated variable should look something like the newcaseid variable in the data below, where missing case IDs are filled in with the nearest non-missing value:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long uniqueid float(qdate program) str9(caseid newcaseid)
16000006 224 1 "224455"    "224455"   
16000006 225 0 ""          "224455"   
16000006 226 1 "224455"    "224455"   
16000006 227 1 "224455"    "224455"   
16000006 228 1 "113736445" "113736445"
16000006 229 1 "113736445" "113736445"
16000006 230 1 "113736445" "113736445"
16000006 231 1 "113736445" "113736445"
16000006 232 1 "113736445" "113736445"
16000006 233 1 "113736445" "113736445"
16000006 234 1 "113736445" "113736445"
16000006 235 1 "113736445" "113736445"
16000006 236 0 ""          "113736445"
16000006 237 0 ""          "113736445"
16000006 238 0 ""          "113736445"
16000006 239 0 ""          "113736445"
16000011 224 0 ""          "118575161"
16000011 225 0 ""          "118575161"
16000011 226 0 ""          "118575161"
16000011 227 0 ""          "118575161"
16000011 228 0 ""          "118575161"
16000011 229 0 ""          "118575161"
16000011 230 0 ""          "118575161"
16000011 231 0 ""          "118575161"
16000011 232 0 ""          "118575161"
16000011 233 0 ""          "118575161"
16000011 234 0 ""          "118575161"
16000011 235 0 ""          "118575161"
16000011 236 1 "118575161" "118575161"
16000011 237 1 "118575161" "118575161"
16000011 238 0 ""          "118575161"
16000011 239 0 ""          "118575161"
16000016 224 1 "1048706"   "1048706"  
16000016 225 1 "1048706"   "1048706"  
16000016 226 1 "1048706"   "1048706"  
16000016 227 1 "1048706"   "1048706"  
16000016 228 1 "115221908" "115221908"
16000016 229 1 "115221908" "115221908"
16000016 230 1 "115221908" "115221908"
16000016 231 1 "115221908" "115221908"
16000016 232 1 "115221908" "115221908"
16000016 233 1 "115221908" "115221908"
16000016 234 1 "115221908" "115221908"
16000016 235 1 "115221908" "115221908"
16000016 236 1 "115221908" "115221908"
16000016 237 1 "115221908" "115221908"
16000016 238 1 "115221908" "115221908"
16000016 239 1 "115221908" "115221908"
end
format %tq qdate
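
To show how common the changing case IDs are, here is a rough sketch of how I would count distinct non-missing case IDs per person. The variable names tag_id and n_caseids are just ones I made up for this check:

```stata
* Tag one observation per distinct (uniqueid, caseid) pair;
* observations with missing caseid are tagged 0 by default
egen byte tag_id = tag(uniqueid caseid)
* Number of distinct non-missing case IDs for each person
egen n_caseids = total(tag_id), by(uniqueid)
* List the people whose case ID changes at some point
list uniqueid caseid if tag_id & n_caseids > 1, sepby(uniqueid)
drop tag_id
```

Anyone with n_caseids greater than 1 is a case where groupwise would not help.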
I think one way around this is to combine the forward and backward options of stripolate. I have tried the following approach:

First, I interpolate using the forward option:

Code:
bysort uniqueid: stripolate caseid qdate, gen(forward_id) forward
This new variable looks pretty good but still leaves some case IDs missing. I believe these need to be filled using the backward option, so I create a second variable using the backward option and use it to fill any remaining missing case IDs.
Code:
bysort uniqueid: stripolate caseid qdate, gen(backward_id) backward
gen newcaseid_test = forward_id
replace newcaseid_test = backward_id if missing(forward_id)
This new variable (newcaseid_test) matches the variable I was looking for (newcaseid) but I can't help but feel uneasy about the fact that I had to use forward and backward options to create two separate variables that I then combined. I am wondering if this is an appropriate use of stripolate or if I am unwittingly introducing problems. I assume there is a good reason why stripolate separates the forward and backward options.
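
As a sanity check, I also put together a by-hand version using only built-in commands. This is only a sketch, and it assumes I have understood the forward and backward options correctly (forward carries the previous known value forward, backward carries the next known value backward). The names byhand_id and negq are made up for this check:

```stata
* By-hand version of the same fill, using only built-in commands
gen byhand_id = caseid
* Forward: carry the last known case ID forward within person
bysort uniqueid (qdate): replace byhand_id = byhand_id[_n-1] if missing(byhand_id)
* Backward: reverse time within person and carry again to fill leading gaps
gen negq = -qdate
bysort uniqueid (negq): replace byhand_id = byhand_id[_n-1] if missing(byhand_id)
drop negq
sort uniqueid qdate
* Both assertions should pass silently if the two versions agree
* and no quarter is left without a case ID
assert byhand_id == newcaseid_test
assert !missing(newcaseid_test)
```

If the first assert fails, then I have presumably misunderstood what one of the two options does.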

One issue I can foresee is that this approach might create different values for my new case ID variable depending on which option (forward or backward) I use first (provided there are two different values just before and after a missing value). I am not sure this is particularly problematic for me because either approach will provide me with a household identifier for each quarter.
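
To see whether the order matters in my data, I could build the variable the other way round and compare the two. Again this is only a sketch, and newcaseid_alt is a made-up name:

```stata
* Same combination, but starting from the backward fill
gen newcaseid_alt = backward_id
replace newcaseid_alt = forward_id if missing(backward_id)
* Count and inspect quarters where the two orders disagree
count if newcaseid_alt != newcaseid_test
list uniqueid qdate caseid forward_id backward_id if newcaseid_alt != newcaseid_test, sepby(uniqueid)
```

Any quarters listed would be gaps that sit between two different case IDs, where the choice of order decides which household the person is assigned to.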

Are there other issues, perhaps more fatal, that I am missing?

Thank you