Simulating data for logistic regression with categorical variables

Dear Statalists,

I would like to test the effect of sample size on the standard errors of interaction effects. The model results from the survey data show a pattern of interaction effects, but these effects do not reach statistical significance. I would like to find out how large the sample would need to be for them to reach statistical significance.

The idea is to generate a new data set with the same distributions, correlation matrix, and regression coefficients as the real data, but with a larger sample size at which the interaction effects of interest reach statistical significance.

I may need to consider complex survey design as well.

I have considered the following options.

1. The command corr2data would have been ideal if it worked well with logistic regression and categorical variables.
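
To make the limitation concrete, here is a minimal sketch with a made-up 2x2 correlation matrix (the matrix C and the variable names z1, z2, and d1 are placeholders, not values from the survey data). corr2data reproduces the requested correlations exactly, but the variables it creates are continuous, so a categorical variable has to be derived by cutting them, and that changes the correlation.

```stata
clear
set seed 12345
* Hypothetical correlation matrix (placeholder values)
matrix C = (1, .3 \ .3, 1)
* corr2data creates variables whose sample correlation matches C
corr2data z1 z2, n(1000) corr(C)
correlate z1 z2
* Dichotomizing a continuous variable does not preserve the correlation
generate byte d1 = z1 > 0
correlate d1 z2
```

The second correlate call shows how far the correlation moves once z1 is turned into a binary indicator.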

2. Following Buis's discussion (M.L. Buis (2007), "Stata tip 48: Discrete uses for uniform()"), I was able to simulate a data set for logistic regression with specified distributions, but failed to replicate the regression coefficients. The regression coefficients in the simulated data set only approximate those specified. I cannot reproduce the correlation matrix either.

The approach is similar to the one in this post: https://www.stata.com/statalist/arch.../msg00018.html
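
A stripped-down sketch of this approach, written as a do-file, may help show what is going on. It rests on several assumptions: the category shares for x1 and x2 and the intercept of -1 are placeholders, x3 and x4 are left out, the survey design (weights, strata, PSUs) is ignored, and the program name simlogit is hypothetical. The coefficients for x1, x2, and their interaction are taken from the output reported further down. The program simulates one data set of size n and fits the model; simulate then repeats it to show both the average estimate and the share of replications in which the 1.x1#4.x2 interaction is significant at the 5% level.

```stata
clear all
set seed 2019

* Hypothetical helper: simulate one data set of size n and fit the model
program define simlogit, rclass
    syntax , n(integer)
    drop _all
    set obs `n'
    * x1: binary, assumed 30% in category 1 (placeholder share)
    generate byte x1 = runiform() < .3
    * x2: categories 1, 3, 4, 7 with assumed shares .4/.2/.2/.2
    generate double u = runiform()
    generate byte x2 = cond(u < .4, 1, cond(u < .6, 3, cond(u < .8, 4, 7)))
    * Linear predictor from the reported coefficients (intercept assumed)
    generate double xb = -1 + .3586926*x1                          ///
        + .0491903*(x2==3) + .1623764*(x2==4) + .0937721*(x2==7)   ///
        + .0348744*x1*(x2==3) + .3241205*x1*(x2==4)                ///
        - .0562443*x1*(x2==7)
    * Draw the binary outcome from the implied probabilities
    generate byte y = runiform() < invlogit(xb)
    logit y i.x1##i.x2
    * Estimate and two-sided p-value for the 1.x1#4.x2 interaction
    return scalar b = _b[1.x1#4.x2]
    return scalar p = 2*normal(-abs(_b[1.x1#4.x2]/_se[1.x1#4.x2]))
end

* Repeat the simulation at several sample sizes
foreach n of numlist 7355 15000 30000 {
    quietly simulate b=r(b) p=r(p), reps(200) nodots: simlogit, n(`n')
    generate byte sig = p < .05
    quietly summarize b
    local mb = r(mean)
    quietly summarize sig
    display "n = `n'   mean b = " %6.4f `mb' "   share significant = " %5.3f r(mean)
}
```

In any single simulated data set the estimates differ from the specified values through sampling error, so they can only be expected to match on average over many replications. Because the sketch ignores the survey design, the sample sizes it suggests are likely to be too small for the real design.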

3. A very vague idea is to use probit regression. I might simulate a data set using corr2data and transform the outcome variable using the probit link function. It is a long shot, and I have not been able to figure out how to do it yet.
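
One way this idea might be sketched is through the latent-variable form of the probit model: form a latent variable as the linear predictor plus a standard normal error, and set the outcome to 1 when the latent variable is positive. Everything in the sketch below is an assumption made for illustration: the correlation matrix C, the coefficients (-.5, .3, .2), and the variable names z1, z2, and ystar. It also uses continuous covariates only, so it does not solve the problem of categorical predictors.

```stata
clear
set seed 4321
* Hypothetical correlation matrix for two continuous covariates
matrix C = (1, .4 \ .4, 1)
corr2data z1 z2, n(5000) corr(C)
* Latent variable: linear predictor plus a standard normal error
generate double ystar = -.5 + .3*z1 + .2*z2 + rnormal()
* Observed binary outcome: 1 when the latent variable is positive
generate byte y = ystar > 0
probit y z1 z2
```

The covariates keep the exact correlation requested from corr2data, while the probit estimates again match the specified coefficients only up to sampling error.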

Any suggestion is appreciated.

The logistic regression results I wish to simulate:

Code:

. svyset psu  [pw=xw], strata(strata) singleunit(scaled)

      pweight: xw
          VCE: linearized
  Single unit: scaled
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

. svy: logit y i.x1##i.x2 i.x3 c.x4
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =     1,630                  Number of obs     =      7,355
Number of PSUs     =     3,232                  Population size   = 7,976.7239
                                                Design df         =      1,602
                                                F(  13,   1590)   =      12.36
                                                Prob > F          =     0.0000

------------------------------------------------------------------------------
             |             Linearized
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        1.x1 |   .3586926   .1260393     2.85   0.004     .1114734    .6059119
             |
          x2 |
          3  |   .0491903   .1609868     0.31   0.760    -.2665767    .3649573
          4  |   .1623764   .1383986     1.17   0.241     -.109085    .4338377
          7  |   .0937721   .1003796     0.93   0.350    -.1031171    .2906613
             |
       x1#x2 |
        1 3  |   .0348744   .2815246     0.12   0.901    -.5173208    .5870697
        1 4  |   .3241205   .2281414     1.42   0.156    -.1233666    .7716076
        1 7  |  -.0562443   .2055027    -0.27   0.784    -.4593267    .3468381
             |
          x3 |
          1  |    .109137   .1153704     0.95   0.344    -.1171559    .3354298
          2  |   .4191621   .1126151     3.72   0.000     .1982738    .6400505
          3  |   .4574391   .1259686     3.63   0.000     .2103585    .7045197
          4  |   .8478119   .1286161     6.59   0.000     .5955382    1.100085
          5  |   1.051244   .1458429     7.21   0.000     .7651811    1.337307
             |
          x4 |   .1644617   .0746806     2.20   0.028     .0179798    .3109435
       _cons |    -1.4128   .6050882    -2.33   0.020    -2.599648   -.2259526
------------------------------------------------------------------------------
Note: Variance scaled to handle strata with a single sampling unit.
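
As a rough first check before any simulation, one could assume that the point estimates and the design stay as they are while each standard error shrinks in proportion to 1/sqrt(n). Under that assumption, the sample size at which |t| reaches about 1.96 is n x (1.96 x SE / |b|)^2. The sketch below applies this to the three interaction terms in the output above; the scalar names are placeholders, and the result is an approximation rather than a power calculation.

```stata
* Rough check: sample size at which |t| would reach 1.96 if the point
* estimate stayed fixed and the SE shrank in proportion to 1/sqrt(n)
scalar n0    = 7355     // number of observations in the output above
scalar tcrit = 1.96     // approximate 5% critical value (design df = 1,602)
* Coefficients and standard errors copied from the x1#x2 rows
scalar need3 = ceil(n0 * (tcrit * .2815246 / abs( .0348744))^2)
scalar need4 = ceil(n0 * (tcrit * .2281414 / abs( .3241205))^2)
scalar need7 = ceil(n0 * (tcrit * .2055027 / abs(-.0562443))^2)
display "1.x1#3.x2: n of about " %12.0fc need3
display "1.x1#4.x2: n of about " %12.0fc need4
display "1.x1#7.x2: n of about " %12.0fc need7
```

The terms with estimates close to zero require far larger samples than the 1.x1#4.x2 term, which is why the simulation is mainly of interest for that interaction.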

I use Stata 15, Windows 64bit.

Many thanks.
Min

BJ Data Tech Solution
