My data is indexed by COVID-19 case IDs in my country. The variables I am interested in are when the case's earliest test (eRT) was sampled versus when it was logged to a database of interest (the difference being 'delay'). Each case has info on the patient's name and demographics, as well as test type, which lab ordered and/or processed it, what city the lab is in, etc.
I am interested in verifying if there are significant differences between cities and labs in terms of delays, as defined. This is what I am running after cleaning up my data a bit.
Code:
cap gen eRT_createdAt_cont=date(eRT_createdAt,"YMD") cap gen eRT_sampleCollectedDate_cont=date(eRT_sampleCollectedDate,"MDY") cap gen delay = eRT_createdAt_cont - eRT_sampleCollectedDate_cont replace delay = . if delay < 0 | delay > 5 encode eRT_processedByEntity_city, gen(city) encode eRT_processedByEntity_name, gen(lab) keep if eRT_testType == "Molecular" by lab, sort: drop if _N < 500 fvset base 365 lab asdoc reg delay i.lab, label
I am having a bit of trouble making sure this is the way to go. The above regression give very significant results, as does the one with cities without the need to drop observations. If this is indeed the way to go, I am having trouble envisioning how to generate this variable in Stata, create two columns of unique sampleCollectedDate, Lab pairs, and merge the variable with that table.
I am also open to exploring delays being, instead, the mean for a week; daily data may be noisy, I figure.
0 Response to "Reshaping" data to panel data
Post a Comment