I'm running Stata 15.1 on macOS and am working with Pew panel data. I believe my question is very basic. I'd like to measure the relationship between a continuous independent variable and an ordinal dependent variable (note: there are other variables whose relationships I'm interested in, but I will use the current case as an example). The independent variable (x) was measured in the April 2020 wave of the survey, and the dependent variable was measured in the October 2020 wave. Because my dataset also contains variables measured in other waves, I opted to reshape the data to 'wide' format. However, I noticed that the model test statistics are larger when I run the regression on the data in 'long' format than in 'wide' format:
Long Format
Code:
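. ologit AF_GOOD4 mhindex_meanZ, or

Iteration 0:   log likelihood = -27851.383
Iteration 1:   log likelihood =  -27270.49
Iteration 2:   log likelihood = -27268.927
Iteration 3:   log likelihood = -27268.927

Ordered logistic regression                     Number of obs     =     23,538
                                                LR chi2(1)        =    1164.91
                                                Prob > chi2       =     0.0000
Log likelihood = -27268.927                     Pseudo R2         =     0.0209

--------------------------------------------------------------------------------
     AF_GOOD4 | Odds Ratio   Std. Err.        z   P>|z|     [95% Conf. Interval]
--------------+-----------------------------------------------------------------
mhindex_meanZ |   1.547024     .019962    33.82   0.000      1.50839    1.586648
--------------+-----------------------------------------------------------------
        /cut1 |  -.0222594    .0132795                     -.0482869     .003768
        /cut2 |   .9936055     .014874                      .9644529    1.022758
        /cut3 |   2.788631    .0274565                      2.734817    2.842444
--------------------------------------------------------------------------------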
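Wide Format
Code:
. ologit AF_GOOD4 mhindex6466_meanZ, or

Iteration 0:   log likelihood = -9283.7942
Iteration 1:   log likelihood = -9090.1634
Iteration 2:   log likelihood = -9089.6425
Iteration 3:   log likelihood = -9089.6424

Ordered logistic regression                     Number of obs     =      7,846
                                                LR chi2(1)        =     388.30
                                                Prob > chi2       =     0.0000
Log likelihood = -9089.6424                     Pseudo R2         =     0.0209

------------------------------------------------------------------------------------
         AF_GOOD4 | Odds Ratio   Std. Err.        z   P>|z|     [95% Conf. Interval]
------------------+-----------------------------------------------------------------
mhindex6466_meanZ |   1.547043    .0345765    19.52   0.000     1.480737    1.616317
------------------+-----------------------------------------------------------------
            /cut1 |  -.0222594    .0230008                     -.0673403    .0228214
            /cut2 |   .9936055    .0257626                      .9431117    1.044099
            /cut3 |   2.788631     .047556                      2.695422    2.881839
------------------------------------------------------------------------------------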
Of course, this is not surprising given that 'long' format contains multiple rows (one per wave) for each respondent, so the same respondent is counted more than once. But my question is whether the inflated test statistics can be trusted. As more control variables are added, it's possible that variables that remain significant in 'long' format are no longer significant in 'wide' format. I'm thus not sure how to approach this issue. Am I better off sticking to wide format? Is there a way to obtain 'adjusted' test statistics in long format? Or perhaps I'm perceiving a problem that really isn't a problem?
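For concreteness, here is a minimal sketch of the kind of check and adjustment I have in mind. It assumes the long file has a respondent identifier, which I call respid here; that name is hypothetical, as are the helper variables n_rows, y_varies and x_varies. The first part counts rows per respondent and checks whether the two variables are simply copied across those rows; the last command refits the same model with standard errors clustered on respondent. I am not certain this is the appropriate adjustment, so please treat it as a guess rather than a settled approach.

```stata
* Sketch only: respid is a HYPOTHETICAL name for the respondent identifier.
* Substitute whatever variable identifies panel members in the long file.

* How many rows does each respondent contribute to the long file?
bysort respid: generate n_rows = _N
tabulate n_rows

* Do the outcome and the predictor vary within respondent, or are they
* the same value repeated on every row? (Missing values sort last, so a
* respondent with some missing rows is flagged as varying.)
bysort respid (AF_GOOD4): generate byte y_varies = AF_GOOD4[1] != AF_GOOD4[_N]
bysort respid (mhindex_meanZ): generate byte x_varies = mhindex_meanZ[1] != mhindex_meanZ[_N]
tabulate y_varies x_varies

* Same model as the long-format run, with standard errors clustered on respondent
ologit AF_GOOD4 mhindex_meanZ, or vce(cluster respid)

* Arithmetic on the two outputs: ratio of N, scaled LR chi2, ratio of std. errors
display 23538/7846
display 388.30*3
display .0345765/.019962
```

What prompts the check is that the long run has exactly three times as many observations as the wide run (23,538 = 3 x 7,846), its LR chi2 is almost exactly three times as large (1164.91 against 388.30), and the wide standard error is about 1.73 times the long one, while the odds ratio, cutpoints and pseudo R2 are essentially unchanged. Note that with vce(cluster) Stata reports a Wald chi2 instead of the LR chi2, so the header statistic would not be directly comparable to the ones above.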
Any input you can provide will be much appreciated. Thank you!