Regressions with 'long'/panel data: misleading test statistics?

Greetings,

I'm running Stata 15.1 on macOS and am currently working with Pew panel data. I believe my question is very basic. I'd like to measure the relationship between a continuous independent variable and an ordinal dependent variable (note: there are other variables whose relationships I'm interested in, but I will use the current case as an example). The independent variable (x) was measured in the April 2020 wave of the survey, and the dependent variable was measured in the October 2020 wave. Because my dataset also includes variables measured in other waves, I opted to reshape the data to 'wide' format. However, I noticed that the model test statistics are larger when the regression is run on the data in 'long' format than in 'wide' format:

Long Format

Code:

. ologit AF_GOOD4  mhindex_meanZ, or

Iteration 0:   log likelihood = -27851.383  
Iteration 1:   log likelihood =  -27270.49  
Iteration 2:   log likelihood = -27268.927  
Iteration 3:   log likelihood = -27268.927  

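Ordered logistic regression                       Number of obs     =     23,538
                                                  LR chi2(1)        =    1164.91
                                                  Prob > chi2       =     0.0000
Log likelihood = -27268.927                       Pseudo R2         =     0.0209

--------------------------------------------------------------------------------
     AF_GOOD4 | Odds Ratio   Std. Err.        z   P>|z|     [95% Conf. Interval]
--------------+-----------------------------------------------------------------
mhindex_meanZ |   1.547024     .019962    33.82   0.000      1.50839    1.586648
--------------+-----------------------------------------------------------------
        /cut1 |  -.0222594    .0132795                     -.0482869     .003768
        /cut2 |   .9936055     .014874                      .9644529    1.022758
        /cut3 |   2.788631    .0274565                      2.734817    2.842444
--------------------------------------------------------------------------------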

Wide Format

Code:

. ologit AF_GOOD4  mhindex6466_meanZ, or

Iteration 0:   log likelihood = -9283.7942     
Iteration 1:   log likelihood = -9090.1634     
Iteration 2:   log likelihood = -9089.6425     
Iteration 3:   log likelihood = -9089.6424     

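Ordered logistic regression                           Number of obs     =      7,846
                                                      LR chi2(1)        =     388.30
                                                      Prob > chi2       =     0.0000
Log likelihood = -9089.6424                           Pseudo R2         =     0.0209

------------------------------------------------------------------------------------
         AF_GOOD4 | Odds Ratio   Std. Err.        z   P>|z|     [95% Conf. Interval]
------------------+-----------------------------------------------------------------
mhindex6466_meanZ |   1.547043    .0345765    19.52   0.000     1.480737    1.616317
------------------+-----------------------------------------------------------------
            /cut1 |  -.0222594    .0230008                     -.0673403    .0228214
            /cut2 |   .9936055    .0257626                      .9431117    1.044099
            /cut3 |   2.788631     .047556                      2.695422    2.881839
------------------------------------------------------------------------------------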
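
A quick arithmetic check on the two outputs suggests that the long file holds each respondent's observation three times. That the factor of three corresponds to three survey waves is an inference from the ratios, not something stated in the output:

```latex
\begin{align*}
\frac{N_{\text{long}}}{N_{\text{wide}}} &= \frac{23{,}538}{7{,}846} = 3, \\
\frac{\mathrm{LR}\,\chi^2_{\text{long}}}{\mathrm{LR}\,\chi^2_{\text{wide}}} &= \frac{1164.91}{388.30} \approx 3.00, \\
\frac{\mathrm{SE}_{\text{wide}}}{\mathrm{SE}_{\text{long}}} &= \frac{0.0345765}{0.019962} \approx 1.732 \approx \sqrt{3}.
\end{align*}
```

The odds ratio, cut points, and pseudo R2 are essentially identical across the two runs, which is the pattern one would expect if the same 7,846 observations were simply repeated rather than supplemented with new information.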

Of course, this is not surprising given that 'long' format includes multiple rows (one per wave) for each respondent. But my question is whether the inflated test statistics can be trusted. As more control variables are added, it's possible that variables that remain significant in 'long' format are no longer significant in 'wide' format. I'm thus not sure how to approach this issue. Am I better off sticking to wide format? Is there a way to obtain 'adjusted' test statistics in long format? Or perhaps I'm perceiving a problem that really isn't a problem?
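
To make the 'adjusted test statistics' question concrete, here is a minimal sketch of two things one might try in the long file. It assumes the long data carry a respondent identifier; the name respid is hypothetical and does not come from the output above, and onerow is just an illustrative flag variable.

```stata
* Sketch only. Assumption: the long file has a respondent identifier;
* "respid" is a hypothetical name, not a variable shown in the post.

* How many rows share the same respondent, outcome, and predictor values?
duplicates report respid AF_GOOD4 mhindex_meanZ

* Option 1: keep all rows, but let the standard errors allow for
* dependence among rows from the same respondent.
ologit AF_GOOD4 mhindex_meanZ, or vce(cluster respid)

* Option 2: flag one row per respondent and fit the model on those rows
* only, which mirrors the wide-format regression.
egen byte onerow = tag(respid)
ologit AF_GOOD4 mhindex_meanZ if onerow, or
```

With clustered standard errors Stata reports a Wald chi2 in place of the LR chi2, so the header will not match the wide output line for line. Option 2 only reproduces the wide model if the outcome and predictor are constant across a respondent's rows, which the identical coefficients suggest but do not prove.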

Any input you can provide will be much appreciated. Thank you!

BJ Data Tech Solution
