Hi all,
I have a database management problem that feels like it should be solvable, but I am just not getting it, and am hoping one of you might be able to help. I am using Stata 16.1 on Windows 10.
My problem is a special case of dealing with duplicate data in a database that is a chronological list of hospitalizations over a period of several years. The variables of interest are Admit date/time, discharge date/time, patient ID (string), admitting service (string), and discharge diagnosis (string).
Unfortunately, each row does not always represent a single hospitalization - there are times when a single hospitalization is actually represented by two rows (a so-called “conversion”): for the first part (#1), Service always takes on “Obs,” and for the second part (#2), Service always takes on “Medical” (however, not all instances of “Obs” or “Medical” represent conversions). The patient ID will be the same for #1 and #2 in a conversion. The database also contains repeat hospitalizations for the same patient over time; these appear as new rows later on with the same patient ID. The only way to differentiate conversions from repeat hospitalizations is that for the former, the discharge date/time of #1 is always within 5 minutes of the admission date/time of #2; readmissions never are.
Here is an example. The first row is a single hospitalization, the second and third rows are a converted hospitalization of a different patient, and the fourth row is a repeat hospitalization of the original patient:
Admit date/time Discharge date/time ID Service Diagnosis
01/01/2017 16:48:49 01/03/2017 15:35:45 001 Medical J11
02/01/2017 00:01:01 02/01/2017 12:45:45 002 Obs A10
02/01/2017 12:45:50 02/07/2017 15:54:32 002 Medical K45
03/10/2017 13:00:00 03/17/2017 18:45:44 001 Obs C23
I need to generate a new observation to represent each converted hospitalization, and it needs to contain elements of both observations that describe it:
Admit date/time from #1
Discharge date/time from #2
Service from #2
Diagnosis from #2
Finally, every time this is performed, I need to delete the two observations that created it, in order to avoid double-counting. Alternatively, if coming up with the code to delete observations is not straightforward, I could use indicator variables to avoid duplication.
So, in sum, what I would like to do is convert the above database to this database below:
Admit date/time Discharge date/time ID Service Diagnosis
01/01/2017 16:48:49 01/03/2017 15:35:45 001 Medical J11
02/01/2017 00:01:01 02/07/2017 15:54:32 002 Medical K45
03/10/2017 13:00:00 03/17/2017 18:45:44 001 Obs C23
Any help would be greatly appreciated – thanks to all for your insight.
Rahul
Related Posts with Help needed creating & deleting observations
Double Selection in one equationIf double selection in one equation exists, which command do I have to use? I studied Heckman twost…
One sided t-test between two regression coefficients (H0: b1 >= b2)Hi, How do I perform an one sided t-test between two regression coefficients? Of course, it should …
Generating a table that shows multiple ANOVA resultsDear Statalist, I want to generate a table that neatly summarizes a set of ANOVA results, just like…
Choice of covariates in estimating pscore and balancing propertyI am trying to find out the treatment effect using propensity score matching. I have a query on the …
Constraint BiProbitI am struggling with putting a constraint in my biprobit model. I am not even sure if the constraint…
Subscribe to:
Post Comments (Atom)
0 Response to Help needed creating & deleting observations
Post a Comment