Hi Statalist,
I am new here, but I have learned a lot from you so far, so thank you all.
I am using Stata/IC 15.1, and the problem I am facing is as follows:
There are 17 variables (e.g. n_1_0 to n_3_2). Each one holds either a numeric code ranging from 1001 to 99999 or a missing value (.).
The coding I want is:
Cases: 1044 recorded in any of the variables
Controls: missing (.) in all of the variables
Other diseases: any other code, i.e. excluding 1044, 99999 and .
The data look like this:
id  n_1_0  n_1_2  n_1_3  n_1_4  n_1_5
1   -      -      -      -      -
2   1022   1075   -      -      -
3   -      -      99999  -      -
4   -      1044   -      1044   -
5   1044   -      -      -      1006
6   -      -      1044   -      -
7   1010   -      -      1044   -
etc.
I have coded the cases just fine. The code is:
gen status = .
replace status = 1 if n_1_0==1044 | n_1_2==1044 | n_1_3==1044 | n_1_4==1044 | n_1_5==1044 // 1044 recorded at any time, hence | (OR)
which gave 4,123 hits.
Similarly for controls:
replace status = 2 if n_1_0==. & n_1_2==. & n_1_3==. & n_1_4==. & n_1_5==. // must be missing in all variables to be a control, hence & (AND)
which gave 457,300.
The problem arises whenever I try to code for other diseases. The code I used is:
replace status = 3 if n_1_0 >= 1001 & n_1_0 < 99999 & n_1_0 != 1044
repeated for each of the other variables.
After these commands the number of cases drops to 3,745. I think the issue comes from observations like id 4, where 1044 occurs twice, and ids 5 and 7, where 1044 appears alongside a different code (1044 comes before 1006 in id 5, and 1010 comes before 1044 in id 7).
I hope someone can help me with this problem and suggest the best way to solve it, as I am going to work with much larger data sets like this.
Thanks!
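A drop in the case count is what one would expect when each replace status = 3 runs after the case coding: an observation such as id 5 or id 7 has 1044 in one variable and a different code in another, so the later command overwrites status 1 with 3 (id 4 is not affected, because 1044 itself is excluded from the condition). One possible way around this, as a minimal sketch, is to build row-wise indicators first and assign status once, with cases taking priority. The sketch assumes that 1044 anywhere makes the observation a case, that rows holding only 99999 should stay missing, and that the code variables sit next to each other in the dataset so a range such as n_1_0-n_1_5 selects them. The names any1044, nmiss, other and statuslbl are illustrative only.

```stata
* Example rows from the post, with "-" entered as missing (.)
clear
input id n_1_0 n_1_2 n_1_3 n_1_4 n_1_5
1 .    .    .     .    .
2 1022 1075 .     .    .
3 .    .    99999 .    .
4 .    1044 .     1044 .
5 1044 .    .     .    1006
6 .    .    1044  .    .
7 1010 .    .     1044 .
end

* Expand the variable range once; for the full data this would be
* something like n_1_0-n_3_2 (assumes the variables are contiguous)
unab codes : n_1_0-n_1_5
local k : word count `codes'

* 1 if any of the variables equals 1044, 0 otherwise
egen byte any1044 = anymatch(`codes'), values(1044)

* number of missing values across the variables
egen byte nmiss = rowmiss(`codes')

* 1 if any variable holds a code in 1001-99998 other than 1044
gen byte other = 0
foreach v of local codes {
    replace other = 1 if inrange(`v', 1001, 99998) & `v' != 1044
}

* Assign each group once; cases are never overwritten
gen byte status = .
replace status = 1 if any1044 == 1
replace status = 2 if nmiss == `k'
replace status = 3 if other == 1 & any1044 == 0

label define statuslbl 1 "case" 2 "control" 3 "other disease"
label values status statuslbl

* Checks against the example rows
assert status == 1 if inlist(id, 4, 5, 6, 7)
assert status == 2 if id == 1
assert status == 3 if id == 2
assert missing(status) if id == 3
list id status
```

Because the conditions for the three groups are mutually exclusive here, the order of the replace commands no longer matters. If an observation with 1044 plus another code should instead count as "other disease", only the last condition would need to change.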