Multiple linear regression / Interpretation of log-transformed variables

Hi.
I am working with a dataset on food preservation methods (such as "frozen" and "canned", expressed as tertiles of consumption in grams/day) and their effect on several continuous outcomes (leukocytes, CRP, ...). I am having difficulty choosing an appropriate model.

1. If the dependent variables are kept continuous, should I fit a separate multiple regression for each combination of food preservation method and dependent variable?
For example:

Code:

 regress leukocyte i.cannedtertile

+ other explanatory variables

Code:

 regress crp i.cannedtertile

+ other explanatory variables

Code:

 regress crp i.frozentertile

+ other explanatory variables
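
A minimal sketch of how the one-regression-per-pair setup in question 1 could be written as a loop. It uses only the variable names from the commands above; other explanatory variables would be added after the tertile term, and the sketch does not settle whether separate regressions are the appropriate model.

Code:

 foreach y of varlist leukocyte crp {
     foreach x of varlist cannedtertile frozentertile {
         * one model per outcome and preservation method
         regress `y' i.`x'
     }
 }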

2. Most of the dependent variables are not normally distributed. For example, the continuous variable "leukocytes" (measured in 10^3/mm3) is not normally distributed, so I have log-transformed it.

Code:

gen logleukocyte = log(leukocyte)

a) I have read that the coefficient should be interpreted as follows: exponentiate it, subtract 1, and multiply by 100 (https://kenbenoit.net/assets/courses...logmodels2.pdf and https://stats.idre.ucla.edu/other/mu...g-transformed/).

Code:

regress logleukocyte b1.cannedtertil

Code:

  
-------------------------------------------------------------------------------
 logleukocyte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
 cannedtertil |
           2  |  -.0365994   .0171835    -2.13   0.033    -.0702951   -.0029037
           3  |  -.0152055   .0171048    -0.89   0.374    -.0487469    .0183359
        _cons |   1.784896   .0121318   147.13   0.000     1.761107    1.808686
-------------------------------------------------------------------------------

So the first coefficient could be interpreted as:
- Coefficient: -0.0365994
- Exponentiate: 0.9641
- Subtract 1: -0.0359
- Multiply by 100: -3.59
So: "compared to the lowest tertile, those in the second canned food consumption tertile have 3.59 10^3/mm3 fewer leukocytes"
--> Is this correct?
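
The arithmetic above can be checked directly in Stata. A minimal sketch, using only the coefficient and confidence limits from the regression output: both linked references describe the exponentiated result as a percent change (a ratio of geometric means), not as a difference in 10^3/mm3, so the unit in the sentence above deserves a second look. The same back-transformation can be applied to the confidence limits.

Code:

 * percent change implied by the tertile-2 coefficient: about -3.59
 display 100*(exp(-.0365994) - 1)
 * same transformation applied to the 95% limits: about -6.79 and -0.29
 display 100*(exp(-.0702951) - 1)
 display 100*(exp(-.0029037) - 1)
 * after the regress command above, lincom reports the exponentiated
 * estimate and its confidence interval in one step
 lincom 2.cannedtertil, eform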

b) However, how should the confidence interval be interpreted?
I have read in this post (https://www.stata.com/stata-news/news34-2/spotlight/) that it is preferable to fit either a linear regression on the log-transformed outcome or a Poisson regression, and then use the "margins" command so that the confidence interval is also on the original scale. The post explains: "It is tempting to simply exponentiate the predictions to convert them back to wages, but the reverse transformation results in a biased prediction (see references Abrevaya [2002]; Cameron and Trivedi [2010]; Duan [1983]; Wooldridge [2010])."
c) If the above is correct, is it correct to use it:

Code:

gsem logleukocyte <-  b1.cannedtertil
-------------------------------------------------------------------------------
              |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
logleukocyte  |
 cannedtertil |
           2  |  -.0365994   .0171659    -2.13   0.033     -.070244   -.0029548
           3  |  -.0152055   .0170873    -0.89   0.374     -.048696     .018285
        _cons |   1.784896   .0121194   147.28   0.000     1.761143     1.80865
--------------+----------------------------------------------------------------
 var(e.leulog)|   .0716775   .0020492                      .0677717    .0758085
-------------------------------------------------------------------------------

Code:

margins, expression(exp(predict(eta))*(exp((_b[/var(e.logleukocyte)])/2)))

Code:

margins, expression(exp(predict(eta))*(exp((_b[/var(e.logleukocyte)])/2))) at(cannedtertile=(1(1)3))
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   6.176397   .0751213    82.22   0.000     6.029162    6.323632
          2  |   5.954431   .0726437    81.97   0.000     5.812052     6.09681
------------------------------------------------------------------------------

...like this?

--> Also, how should the results (6.176397 and 5.954431) be interpreted? They are not the same as the value obtained above (3.59 10^3/mm3).
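
The two margins can be related to the coefficient from a) with simple arithmetic. A minimal sketch, using only the numbers printed above: the margins appear to be predicted mean leukocyte counts (in 10^3/mm3) for tertiles 1 and 2, so it is their ratio, not their difference, that should match the exponentiated coefficient.

Code:

 * ratio of tertile 2 to tertile 1: about 0.964, i.e. exp(-.0365994)
 display 5.954431/6.176397
 * absolute difference in 10^3/mm3: about -0.222
 display 5.954431 - 6.176397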

d) If not, would you recommend the Poisson model + margins (second option explained here: https://www.stata.com/stata-news/news34-2/spotlight/)? I have tried it too and it gives similar results (values around 5 and 6), which I don't know how to interpret.
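
A minimal sketch of what the Poisson alternative might look like, assuming the untransformed outcome leukocyte and the same tertile variable. The vce(robust) option is an assumption here, chosen because the outcome is continuous rather than a count. margins then reports predicted means in the original units (10^3/mm3), and the irr option displays exponentiated coefficients, which read as ratios of means between tertiles.

Code:

 poisson leukocyte b1.cannedtertile, vce(robust) irr
 margins cannedtertile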

3. If I need to report a p-value, should I use the one from the multiple linear regression on the log-transformed variable?

I would really appreciate your help.

Thank you in advance.

BJ Data Tech Solution