Thank you all in advance for taking the time to read this post and help me.
I am doing a research project on risk factors for stroke in a subset of patients. To do so, I gathered retrospective data from a cohort of patients and organised it into "assessments". The first assessment is the point at which each patient started medical follow-up, and different tests and explorations were carried out at each assessment. The first problem is that not all patients have the same number of assessments, nor the same number of explorations per assessment, so there is a lot of longitudinal missing data in some areas.
Once the data was collected, we organised the next steps in this way:
1. Descriptive analysis of the cohort to see important variables that might be considered as risk factors
2. Set up panel data from the longitudinal evolution of patients with the important variables and run univariate logistic regressions to pick variables for a multivariate logistic regression
3. Generate a multivariate regression model with the variables that were significant in the previous step.
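Step 2 could be sketched as a loop over the candidate variables. This is only a minimal sketch: it assumes the variable names from the -dataex- listing further down in this post, treats the graded variables (wml_, LAecho_) as factor variables, and uses capture noisily so that one model failing to converge does not stop the loop.

```stata
* Declare the panel: id is the patient, time is the assessment number
xtset id time

* One random-effects logistic model per candidate variable (step 2)
* c. marks continuous variables, i. marks categorical ones
foreach v in c.meanageass_ c.gfr_ i.sex i.wml_ i.LAecho_ i.autoinmunitynum i.N215S {
    display as text _newline "Candidate: `v'"
    capture noisily xtlogit stroketotallong_ `v', or nolog
}
```

The variables that look relevant in these runs would then go together into the single model of step 3.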
In the descriptive analysis we picked out some variables that changed across assessments and others that did not. In the end, I assembled the data into a panel with the variable id identifying each individual, time identifying the assessment number (1, 2, ...) and the outcome variable stroketotal (0/1). The other variables are: sex for gender; renal function (gfr); mean age at the assessment (a mean because some tests within an assessment were done at different times, so I averaged the ages for each assessment); presence of an autoimmune disease (autoinmunity 0/1); presence of a specific mutation (N215S 0/1); presence of white matter lesions on MRI (wml 0/1/2/3); and degree of enlargement of the left atrium (LAecho 0/1/2/3).
The panel has 414 patients. About 40 of them had a stroke before the first assessment, and about 40 more had had a stroke by the end of the study period. I limited the panel to 7 assessments and therefore ended up with around 2,900 data rows (414 x 7 = 2,898).
It looks like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int id byte time float(stroketotallong_ meanageass_) int gfr_ float(wml_ LAecho_ autoinmunitynum N215S) byte sex
 1 1 0 34  97 1 . 0 0 1
 1 2 0 35   . . . 0 0 1
 1 3 0 36   . . . 0 0 1
 1 4 0 37   . . . 0 0 1
 1 5 0 38   . . . 0 0 1
 1 6 0 39   . . . 0 0 1
 1 7 0  .   . . . 0 0 1
 2 1 0 27  98 0 1 0 0 1
 2 2 0 28  96 0 . 0 0 1
 2 3 0 29  98 . . 0 0 1
 2 4 0 30 104 . . 0 0 1
 2 5 0 31 103 . . 0 0 1
 2 6 0 32 105 . . 0 0 1
 2 7 0 33 102 . . 0 0 1
 4 1 0 40 112 0 0 0 0 1
 4 2 0 41 102 . 1 0 0 1
 4 3 0 42 103 . . 0 0 1
 4 4 0 43   . . . 0 0 1
 4 5 0 44   . . . 0 0 1
 4 6 0 45   . . . 0 0 1
 4 7 0  .   . . . 0 0 1
 5 1 0 41  83 1 . 0 0 1
 5 2 1  .   . . . 0 0 1
 5 3 1  .   . . . 0 0 1
 5 4 1  .   . . . 0 0 1
 5 5 1  .   . . . 0 0 1
 5 6 1  .   . . . 0 0 1
 5 7 1  .   . . . 0 0 1
 6 1 0 32  94 0 2 0 1 0
 6 2 0 34  93 0 . 0 1 0
 6 3 0 35 103 . . 0 1 0
 6 4 0 33  79 . . 0 1 0
 6 5 0 36  63 . . 0 1 0
 6 6 0 37  48 . . 0 1 0
 6 7 0 41  46 . . 0 1 0
 7 1 0 61 106 . 0 0 1 1
 7 2 0 58  98 . . 0 1 1
 7 3 0 59 100 . . 0 1 1
 7 4 0 62 102 . . 0 1 1
 7 5 0 63 111 . . 0 1 1
 7 6 0 64  93 . . 0 1 1
 7 7 0 67  99 . . 0 1 1
 8 1 0 20 115 0 1 0 0 0
 8 2 0 21 110 . . 0 0 0
 8 3 0  .   . . . 0 0 0
 8 4 0  .   . . . 0 0 0
 8 5 0  .   . . . 0 0 0
 8 6 0  .   . . . 0 0 0
 8 7 0  .   . . . 0 0 0
 9 1 0 40  92 1 0 0 0 1
 9 2 0 42  80 1 . 0 0 1
 9 3 0 44  84 . . 0 0 1
 9 4 0 45   . . . 0 0 1
 9 5 0 46   . . . 0 0 1
 9 6 0 47   . . . 0 0 1
 9 7 0  .   . . . 0 0 1
10 1 0 21 117 0 0 0 0 1
10 2 0 23 121 0 0 0 0 1
10 3 0 25 115 . . 0 0 1
10 4 0 26 129 . . 0 0 1
10 5 0 27 121 . . 0 0 1
10 6 0 28 106 . . 0 0 1
10 7 0 29  98 . . 0 0 1
11 1 0 25 131 0 0 0 0 1
11 2 0 26 120 0 1 0 0 1
11 3 0 28 132 . . 0 0 1
11 4 0 29 124 . . 0 0 1
11 5 0 30  97 . . 0 0 1
11 6 0 31 125 . . 0 0 1
11 7 0 32 106 . . 0 0 1
13 1 0 38 121 0 . 0 1 1
13 2 0 39 124 . . 0 1 1
13 3 0 41 106 . . 0 1 1
13 4 0 42 105 . . 0 1 1
13 5 0 43 114 . . 0 1 1
13 6 0 44 108 . . 0 1 1
13 7 0  .   . . . 0 1 1
14 1 1 42  95 0 0 0 0 1
14 2 1 43  89 0 0 0 0 1
14 3 1 44 107 . 0 0 0 1
14 4 1 45 105 . . 0 0 1
14 5 1 41 109 . . 0 0 1
14 6 1 46 103 . . 0 0 1
14 7 1 47 109 . . 0 0 1
15 1 0 35 108 0 0 0 0 1
15 2 0 36 108 0 0 0 0 1
15 3 0 37 102 . 0 0 0 1
15 4 0 38 113 . 0 0 0 1
15 5 0 39 105 . . 0 0 1
15 6 0 40 104 . . 0 0 1
15 7 0 41 108 . . 0 0 1
16 1 0 66 110 0 0 0 1 0
16 2 0 67  97 0 0 0 1 0
16 3 0 69 103 . . 0 1 0
16 4 0 70  93 . . 0 1 0
16 5 0 71   . . . 0 1 0
16 6 0 72   . . . 0 1 0
16 7 0  .   . . . 0 1 0
17 1 1 60  76 1 0 0 0 1
17 2 1 61  77 1 0 0 0 1
end
label values autoinmunitynum Zero
label values N215S Zero
label def Zero 0 "No", modify
label def Zero 1 "Yes", modify
label values sex Sex
label def Sex 0 "Male", modify
label def Sex 1 "Female", modify
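With the example data loaded, the panel structure and the amount of missing data per variable can be inspected before fitting anything. This is a minimal sketch using the variable names from the listing in this post; the choice of which variables to summarise is only an illustration.

```stata
* Declare the panel: id is the patient, time is the assessment number
xtset id time

* Pattern of participation across the 7 assessments
xtdescribe

* Number of missing values in the covariates that vary between assessments
misstable summarize meanageass_ gfr_ wml_ LAecho_

* Between-patient and within-patient variation of the outcome
xtsum stroketotallong_
```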
For example, here is an xtlogit model of GFR and stroke that also includes age and gender, followed by a model of N215S and stroke with the same covariates:
Code:
xtset id
xtlogit stroketotallong_ i.sex c.meanageass_ c.gfr_, or

Random-effects logistic regression              Number of obs     =      1,838
Group variable: id                              Number of groups  =        390

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          1
                                                              avg =        4.7
                                                              max =          7

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(3)      =      38.56
Log likelihood  = -297.21479                    Prob > chi2       =     0.0000

----------------------------------------------------------------------------------
stroketotallong_ | Odds Ratio  Std. Err.      z     P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
             sex |
         Female  |   .3959993   .4896565    -0.75   0.454     .0350895     4.46902
     meanageass_ |   1.416943   .0838234     5.89   0.000     1.261819    1.591137
            gfr_ |   .9731659   .0225645    -1.17   0.241     .9299302    1.018412
           _cons |   5.24e-13   1.87e-12    -7.90   0.000     4.72e-16    5.81e-10
-----------------+----------------------------------------------------------------
        /lnsig2u |   5.151202   .1560547                      4.845341    5.457064
-----------------+----------------------------------------------------------------
         sigma_u |   13.13921   1.025218                      11.27593     15.3104
             rho |      .9813   .0028637                       .974778    .9861595
----------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
Note: _cons estimates baseline odds (conditional on zero random effects).
LR test of rho=0: chibar2(01) = 877.78                 Prob >= chibar2 = 0.000

xtlogit stroketotallong_ i.sex c.meanageass_ i.N215S, or nolog

Random-effects logistic regression              Number of obs     =      2,256
Group variable: id                              Number of groups  =        409

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          1
                                                              avg =        5.5
                                                              max =          7

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(3)      =     715.32
Log likelihood  = -282.26214                    Prob > chi2       =     0.0000

----------------------------------------------------------------------------------
stroketotallong_ | Odds Ratio  Std. Err.      z     P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
             sex |
         Female  |   .0002462   .0002721    -7.52   0.000     .0000282    .0021489
     meanageass_ |   2.382608   .0859723    24.06   0.000     2.219926    2.557212
                 |
           N215S |
            Yes  |   4.65e-11   8.67e-11   -12.76   0.000     1.20e-12    1.80e-09
           _cons |   8.09e-24   1.68e-23   -25.64   0.000     1.39e-25    4.71e-22
-----------------+----------------------------------------------------------------
        /lnsig2u |   6.104662   .1448829                      5.820697    6.388627
-----------------+----------------------------------------------------------------
         sigma_u |   21.16462   1.533196                      18.36319    24.39342
             rho |   .9927091   .0010486                       .990338    .9945016
----------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
Note: _cons estimates baseline odds (conditional on zero random effects).
LR test of rho=0: chibar2(01) = 1205.64                Prob >= chibar2 = 0.000
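As a check on how to read the last rows of these outputs: in a random-effects logit, rho is the share of the total latent variance that comes from the patient-level random effect, with the residual variance fixed at pi^2/3. Plugging in the reported sigma_u values reproduces the reported rho in both models:

```latex
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}, \qquad \frac{\pi^2}{3} \approx 3.290

\text{GFR model: } \sigma_u^2 = 13.13921^2 \approx 172.64, \quad
\rho \approx \frac{172.64}{172.64 + 3.29} \approx 0.9813

\text{N215S model: } \sigma_u^2 = 21.16462^2 \approx 447.94, \quad
\rho \approx \frac{447.94}{447.94 + 3.29} \approx 0.9927
```

So in both models almost all of the modelled variance is attributed to differences between patients.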
If I calculate the OR without a panel data analysis:
Code:
tab stroketotallong_ N215S

stroketota |         N215S
    llong_ |        No        Yes |     Total
-----------+----------------------+----------
         0 |     1,586        875 |     2,461
         1 |       367         70 |       437
-----------+----------------------+----------
     Total |     1,953        945 |     2,898

logit stroketotallong_ i.sex c.meanageass_ i.N215S, or nolog

Logistic regression                             Number of obs     =      2,256
                                                LR chi2(3)        =     180.27
                                                Prob > chi2       =     0.0000
Log likelihood = -885.08092                     Pseudo R2         =     0.0924

----------------------------------------------------------------------------------
stroketotallong_ | Odds Ratio  Std. Err.      z     P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
             sex |
         Female  |   .6185883   .0776087    -3.83   0.000     .4837367    .7910324
     meanageass_ |   1.043193   .0042768    10.31   0.000     1.034845    1.051609
                 |
           N215S |
            Yes  |   .2324415   .0380368    -8.92   0.000     .1686641    .3203353
           _cons |   .0425143   .0095265   -14.09   0.000     .0274031    .0659583
----------------------------------------------------------------------------------
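For reference, the crude (unadjusted) odds ratio implied by the two-way table above can be worked out by hand. The table counts assessment rows rather than patients (414 patients x 7 assessments = 2,898 rows), so this is only a descriptive figure:

```latex
\mathrm{OR}_{\text{crude}}
  = \frac{70 / 875}{367 / 1586}
  = \frac{70 \times 1586}{367 \times 875}
  = \frac{111020}{321125}
  \approx 0.346
```

This is of the same order as the pooled logit estimate for N215S (.2324415) and many orders of magnitude away from the xtlogit estimate (4.65e-11).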
Thank you all very much for your help.
David.