I'm working on a population of cancer patients and for that I have two datasets that I would like to combine.
However, I can't find the right approach. I've used merge before, but that doesn't quite solve my problem (unless there is something I'm missing).
Dataset 1 contains the variables shown below. I only have one observation per patient. They are grouped according to cancer type and they have the TNM classification used for staging the cancer at diagnosis.
id | cancer_gr | tnm_t | tnm_n | tnm_m |
1 | 2 | T2 | N1 | Mx |
2 | 5 | T4 | N0 | Mx |
3 | 8 | T1 | N0 | M0 |
4 | 11 | T3 | N1 | Mx |
In another dataset I have the cancer groups and the TNM classifications and stages for each constellation of the three TNM variables that will provide me with a stage. In this example it is for cancer group 2 (which e.g. could be breast cancer).
cancer_gr | tnm_t_stage | tnm_n_stage | tnm_m_stage | stage |
2 | T1 | N0 | Mx | 1 |
2 | T2 | N1 | Mx | 2 |
2 | T4 | N2 | M1 | 3 |
2 | T4 | N2 | M2 | 4 |
1: Merge dataset 2 into dataset 1, so that e.g. id 1 has (in this example) four observations. This way I can keep the row where the tnm_x matches the tnm_x_stage.How can this be done? Do I need to look further into the merge command or is there another way to do this efficiently?
2: Use a command that will only merge together the rows that are similar. This would give me one more variable, stage, in dataset 1. E.g. id 1 would be a stage 2 according to the tables.
(I work on a closed server and I'm not allowed to send anything home so dataex was not an option)
Thank you very much for the help.
0 Response to Merge based on several variables matching
Post a Comment