Dear Statalist,

I just want to check whether this approach is OK, or whether there is a more elegant way to tidy up a messy dataset. I am using Stata/IC 15.

Each person has a questionnaire they can update whenever they want. I am trying to find the questionnaire update closest to a target date, which is different for each patient.

I first reshaped wide, so there is one line per patient, and found there were up to 17 updates of the questionnaire per person, with some people having only 1 update.

I gen a dummy interval (interval) = 100,000 days and then a variable (closest_update) for the closest questionnaire update = -9.

I then calculate the interval between each questionnaire update and the target date.

Then, for updates 0/17, I replace interval with the observed interval if it is less than the current value, and record that update number in closest_update.

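* start with a large dummy interval and a placeholder for the closest update
gen interval = 100000
gen closest_update = -9

forvalues i = 0/17 {
    * days between the target date and update i (missing if there is no such update)
    gen interval`i' = abs(target_date - date_of_update`i') if !missing(date_of_update`i')
    * a missing interval counts as larger than any number in Stata, so it never passes this test
    replace closest_update = `i' if interval`i' < interval
    replace interval = interval`i' if closest_update == `i'
}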
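
A possible alternative to the loop above is to stay in long format (one line per patient per update) and let bysort pick the closest update. This is only a sketch on toy data: patient_id, update_no and gap are assumed names, and the dates are made-up day numbers rather than anything from the real dataset.

```stata
* toy long-format data: one row per patient per questionnaire update
clear
input patient_id update_no date_of_update target_date
1 0 100 130
1 1 125 130
1 2 160 130
2 0 200 190
end

* absolute gap in days between each update and the patient's target date
gen gap = abs(target_date - date_of_update) if !missing(date_of_update)

* within patient, sort by gap (missing gaps sort last) and keep the first row;
* ties on gap are broken in favour of the lower update number
bysort patient_id (gap update_no): keep if _n == 1

list patient_id update_no date_of_update gap
```

This does not need to know the maximum number of updates in advance, which is the main reason it might be preferable to the wide loop.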

Also, is there a way to adapt this to find the closest pair of updates from 2 questionnaires that are updated at different times, rather than the update closest to a fixed date per patient? Would I need to generate the interval between each update of questionnaire a and each update of questionnaire b and then sort them?
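
One way the "every update of a against every update of b" idea might be sketched is with joinby, which forms all pairwise combinations within patient. Everything below is hypothetical: the toy data, the variable names (patient_id, update_a, date_a, update_b, date_b, gap) and the assumption that each questionnaire is held in long format with one row per update. It is meant to be run as a do-file, since it uses a tempfile.

```stata
* toy data for questionnaire b, saved to a temporary file
clear
input patient_id update_b date_b
1 0 100
1 1 150
2 0 300
end
tempfile qb
save `qb'

* toy data for questionnaire a
clear
input patient_id update_a date_a
1 0 90
1 1 145
2 0 280
2 1 310
end

* form every a-b pair within each patient
joinby patient_id using `qb'

* gap in days for each pair, then keep the closest pair per patient
gen gap = abs(date_a - date_b)
bysort patient_id (gap update_a update_b): keep if _n == 1

list patient_id update_a update_b date_a date_b gap
```

By default joinby drops patients who appear in only one of the two datasets, so anyone with no updates of one questionnaire would disappear from the result.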

Hope this makes sense.

Kassim