Friday, March 31, 2023

Separating a character and a number part of a variable

I have a string variable containing both characters and numbers. How can I separate the character part of the variable from the numeric part? The number does not appear in the same position in every observation, so I can't use substr(). Is there a general way to identify the numeric part and separate it out?
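A minimal sketch of one general approach, using regular expressions to strip out whichever class of characters you don't want (the variable name sv is hypothetical):

Code:
* a minimal sketch; sv is a hypothetical string variable such as "abc12def"
gen numpart  = ustrregexra(sv, "[^0-9]", "")   // keep only the digits
gen charpart = ustrregexra(sv, "[0-9]", "")    // keep only the non-digits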

specifying stored coefficients in test

I am running two multilevel models and storing the coefficients. I then want to test whether the product of two coefficients, one from each model, is zero, as a simple way to get at indirect effects/moderation. But I'm not sure how to refer to the stored coefficients. Here's my syntax:

mixed lifesatis airsatis wtrsatis presenv LNhhincomeC female ageC tertedu eparttime unemployed nolabor marrdom divsepwid hlthprob LNcintenseC pctprotectC LNgdppcC meaneduC incgini19C wealthgini19C || WP5: airsatis wtrsatis presenv LNhhincomeC female ageC tertedu eparttime unemployed nolabor marrdom divsepwid hlthprob, reml

estimates store full_model

mixed airsatis LNhhincomeC female ageC tertedu eparttime unemployed nolabor marrdom divsepwid hlthprob LNcintenseC pctprotectC LNgdppcC meaneduC incgini19C wealthgini19C || WP5: LNhhincomeC female ageC tertedu eparttime unemployed nolabor marrdom divsepwid hlthprob, reml

estimates store airsat_model

Then when I try to test

test (([full_model]airsatis)*([airsat_model]LNcintenseC))=0
equation [full_model] not found
r(303);

Help appreciated!
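For what it's worth, test can only see the coefficients of the estimates currently in memory, and suest (the usual cross-model trick) does not, as far as I know, support mixed. A minimal sketch of one workaround under those constraints: wrap both fits in an rclass program and bootstrap the product (y1, x, z, and id below are placeholders for the real varlists and the WP5 level):

Code:
* a minimal sketch with placeholder variables; substitute the full model specifications
capture program drop indirect
program define indirect, rclass
    mixed y1 x z || id: , reml        // first model (placeholder varlist)
    scalar b1 = _b[x]
    mixed x z || id: , reml           // second model (placeholder varlist)
    return scalar prod = b1 * _b[z]   // product of the two coefficients
end
bootstrap prod=r(prod), reps(200) cluster(id): indirect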

How to use an if condition to restrict a test to only two groups when a categorical variable has 3 groups

Hi Family,
I have one categorical variable with 3 groups (blue, yellow, green). I want to check whether the variance differs between just two of these groups (blue and green). How do I use an if condition to restrict the command to only 2 of the 3 groups?

Given that:
the value labels are blue=1, yellow=2, and green=3
wait = weight of individuals

Below is the command I used, but Stata still generates an error.


sdtest wait , by( grps ) if grps==1 & grps==3

option if not allowed
r(198);

I need your help.

Thank you.
Amidu
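A minimal sketch of the corrected syntax: the if qualifier goes before the comma, and the two groups are combined with "or" (no observation can satisfy grps==1 & grps==3 at the same time):

Code:
* a minimal sketch: keep only the two groups, then compare their variances
sdtest wait if grps==1 | grps==3, by(grps)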

Thursday, March 30, 2023

Make a list table in Stata

Dear Stata users,

Thank you in advance. I want to make a table that lists all items in cells, grouped by a categorical variable. Let me illustrate with a data example and show you what I want to get.

You can use this data example:
Code:
sysuse auto
replace foreign=3 in 1/15
label define origin 3 Whatever, modify
sample 30
keep make foreign
tabulate foreign
Or you can use the following data directly:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str18 make byte foreign
"VW Diesel"        1
"Olds Cutlass"     0
"Cad. Eldorado"    3
"Audi 5000"        1
"Olds Cutl Supr"   0
"Datsun 210"       1
"Olds Omega"       0
"Fiat Strada"      1
"AMC Spirit"       3
"Toyota Corolla"   1
"Chev. Malibu"     0
"Pont. Firebird"   0
"Mazda GLC"        1
"Merc. Monarch"    0
"Honda Accord"     1
"Pont. Grand Prix" 0
"Olds 98"          0
"Volvo 260"        1
"Buick Opel"       3
"Toyota Celica"    1
"Subaru"           1
"Plym. Champ"      0
end
label values foreign origin
label def origin 0 "Domestic", modify
label def origin 1 "Foreign", modify
label def origin 3 "Whatever", modify
And what I want to get is a table like this:
[attached image: the desired list table, with all makes listed in one cell per group]
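A minimal sketch of one way to build such a list without user-written commands: concatenate the makes within each group, keep one row per group, and list it (whether this matches the intended layout is my assumption):

Code:
* a minimal sketch: collect all makes of a group into one cell (run on a copy)
bysort foreign (make): gen makes = make[1]
by foreign: replace makes = makes[_n-1] + ", " + make if _n > 1
by foreign: keep if _n == _N
list foreign makes, noobs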

OLS vs. Ordered Logistic Regression

Hello,

I have five employee-level variables as below, and I was wondering whether I could use OLS with fixed effects or ordered logistic regression...

Z1 = employee wage

Z2 = employee tenure

Z3 = employee age

Y = employee perception of fairness (ordinal survey item with 5 = very fair... 1 = very unfair)

X = employee job satisfaction (ordinal survey item with 5 = very satisfied...1 = very unsatisfied).

My understanding is that with the two ordinal independent and dependent variables, I perhaps need to use an ordered logistic model. However, would using OLS be incorrect? Furthermore, would it be acceptable to incorporate firm-level fixed effects in the ordered logistic model?

Does Stata allow fixed effects with an ordered logistic model?

Thank you!
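For what it's worth, a minimal sketch of one common approximation: put firm dummies straight into an ordered logit (all names hypothetical). With many small firms this dummy-variable approach can suffer from the incidental-parameters problem, so treat it as a sketch rather than a recommendation:

Code:
* a minimal sketch; Y, X, Z1-Z3, and firm_id are hypothetical names
ologit Y X Z1 Z2 Z3 i.firm_id, vce(cluster firm_id)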

How to perform a data reshaping operation

How can I perform the following data operations?

(1) Place the data scattered in multiple rows on one row

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str2(a b c d)
"a"  ""   ""   ""  
""   "b"  ""   ""  
""   ""   "c"  ""  
""   ""   ""   "d" 
"fd" ""   ""   ""  
""   "fe" ""   ""  
""   ""   "fe" ""  
""   ""   ""   "ef"
"de" ""   ""   ""  
""   "ss" ""   ""  
""   ""   ""   ""  
""   ""   ""   "fw"
"cc" ""   ""   ""  
""   "fd" ""   ""  
""   ""   "ef" ""  
""   ""   ""   "ge"
end
My goal is to transform the data into the following:

[attached image: the desired one-row-per-record layout]


(2) How can I store the observation numbers at which variable A takes a specific value, or store the values of another variable B corresponding to a specific value of variable A, for subsequent operations?

The data are the same as above.

[attached image: the data with an obs column; the relevant values of obs shown in red]

I want to store the observation numbers at which variable A is not empty, that is, the values of the corresponding variable obs (the values shown in red in the figure), in a temporary element (e.g., a local macro) for subsequent data operations.

Looking forward to your reply and your help. Thank you very much!
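A minimal sketch of both operations, under two assumptions of mine: each block of four consecutive rows holds one record spread across a-d, and the observation numbers are wanted in a local macro:

Code:
* (1) a minimal sketch: collapse each block of 4 rows into one row (run on a copy)
gen combined = a + b + c + d           // at most one of a-d is non-empty per row
gen recno = ceil(_n/4)                 // record number: rows 1-4 -> 1, 5-8 -> 2, ...
gen colno = mod(_n - 1, 4) + 1         // position within the record
drop a b c d
reshape wide combined, i(recno) j(colno)

* (2) a minimal sketch, on the original data: store the obs numbers where a is non-empty
gen obs = _n
levelsof obs if a != "", local(rows)
display "`rows'"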

Clarifying interpretation of marginal effects after reghdfe

Hello, I am looking for clarification on post-estimation margins command after reghdfe. I am running a pretty standard Diff-in-Diff (effect of a state-level insurance policy change on cancer screening rates in younger vs older individuals) using reghdfe, and want to confirm that I am using/interpreting the margins command correctly.

Code:
reghdfe screen i.age##i.post i.RACE i.EDUC , absorb(state_num year month) vce(cluster state_num)
(MWFE estimator converged in 4 iterations)
note: 1bn.post is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)

HDFE Linear regression                            Number of obs   =  2,056,819
Absorbing 3 HDFE groups                           F(  10,     14) =     776.20
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.0049
                                                  Adj R-squared   =     0.0049
                                                  Within R-sq.    =     0.0042
Number of clusters (state_num) =         15       Root MSE        =     0.2780

                                     (Std. Err. adjusted for 15 clusters in state_num)
--------------------------------------------------------------------------------------
                     |               Robust
              screen |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
                 age |
              older  |  -.0260963   .0022391   -11.65   0.000    -.0308987   -.0212939
              1.post |          0  (omitted)
                     |
            age#post |
            older#1  |   .0032939     .00066     4.99   0.000     .0018784    .0047094
                     |
                RACE |
          Black, NH  |   .0078715   .0007183    10.96   0.000     .0063309    .0094121
           Hispanic  |   .0222411    .001537    14.47   0.000     .0189446    .0255376
              Asian  |    .043212     .00239    18.08   0.000     .0380859    .0483381
                     |
                     |
                EDUC |
                 HS  |  -.0067054   .0008325    -8.05   0.000    -.0084909     -.00492
       SOME COLLEGE  |  -.0154122   .0009276   -16.61   0.000    -.0174018   -.0134226
  BACHELOR/GRADUATE  |  -.0364558   .0016695   -21.84   0.000    -.0400365   -.0328751
                     |
               _cons |   .0947413   .0005795   163.48   0.000     .0934983    .0959843
--------------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
   state_num |        15          15           0    *|
        year |         9           1           8     |
       month |        12           1          11     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
When I run the margins command (albeit with the noestimcheck option, which I think is kosher for calculating marginal effects), it omits one category of individual. This is because it is collinear with the fixed effects, yes?

Code:
. margins age, dydx(post) noestimcheck

Conditional marginal effects                    Number of obs     =  2,056,819
Model VCE    : Robust
--------------------------------------------------------------------------------
               |            Delta-method
               |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
       1.post  |
           age |
        older  |          0  (omitted)
      younger  |   .0032939     .00066     4.99   0.000     .0020004    .0045875
--------------------------------------------------------------------------------
When I do it a different way, it tells me that the predicted change for the older group is 0. This "0" predicted change is garbage, again because of collinearity with the fixed effects, yes?

Code:
. margins age#post, noestimcheck

Predictive margins                              Number of obs     =  2,056,819
Model VCE    : Robust

Expression   : Linear prediction, predict()
--------------------------------------------------------------------------------------
                     |            Delta-method
                     |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
            age#post |
            older#0  |   .0945361    .000978    96.66   0.000     .0926192     .096453
            older#1  |   .0945361    .000978    96.66   0.000     .0926192     .096453
          younger#0  |   .0684398   .0012712    53.84   0.000     .0659483    .0709314
          younger#1  |   .0717337   .0017257    41.57   0.000     .0683514    .0751161
--------------------------------------------------------------------------------------
So, do I understand correctly that there is no way to get the marginal effect of pre vs. post for the older group? Thanks

Performing Regressions on Panel and Time Series Data

Good afternoon, first-time poster here. I'm an undergraduate student trying to run some regressions on data I have been compiling. I thought this method of compiling would make it simple to regress, but it's proving more complicated than I thought.

I am trying to use daily market data to calculate ANNUAL betas by firm and then calculate the error term for each firm_year combination. Ultimately, I'm trying to use daily stock return data (regressed on the Fama-French 3 factors) to calculate the annual systemic (error term) and systematic (beta) risk for each firm.

In order to do this, I have ~252 daily data points over 11 years (2011 - 2021), for 100 firms (this totals 277,200 data points). In the end, I hope to have one data point for every firm & year combination for each of the 100 firms (100 firms * 11 years = 1100 data points).

I have been able to run the regression using "sort" and "by" to perform one regression per year-firm combination and obtain the appropriate annual beta (displayed below). However, calculating the error term for each of these points is proving much more complicated. It seems that "predict y_hat" and "predict residuals" don't break the results into the individual firm_year subsets.

Finally, if anybody has any advice on exporting the results into an Excel or CSV file containing Firm, Year, Systemic Risk, and Systematic Risk, that would be greatly appreciated (currently I plan to combine the data points manually).

Below I have listed an example of my variables, as well as the code I'm currently using:



[attached image: an example of the variables]

[attached image: the code currently in use]

At this point it should give me the squared difference, (predicted - actual)^2. If I'm not mistaken, I would then need to sum that variable within each year_firm to get the error term (annual systemic risk) for that firm.

I hope I was clear enough in explaining my goals and what I've been doing so far. I really appreciate your time and any support you may have to offer.

Best!
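A minimal sketch of one way to end up with one row per firm-year holding both quantities, using statsby to collect the market beta and the regression's root MSE; all variable names here (ret, mktrf, smb, hml, firm, year) are placeholders:

Code:
* a minimal sketch; variable names are hypothetical (run on a copy: statsby clears)
statsby beta=_b[mktrf] rmse=e(rmse), by(firm year) clear: ///
    regress ret mktrf smb hml
export delimited using "firm_year_risk.csv", replace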

Automatically run a do file when opened

Hello all,

Is there a feature that will have Stata automatically run designated do-files when they are opened?

Thanks!

Deguza

Wednesday, March 29, 2023

Model fitness

Dear all,
Good morning!
How can I measure the model fit of teffects ipwra (inverse-probability-weighted regression adjustment for impact analysis)?
Could you suggest a command to carry out a log-likelihood test of the model, or other fit measures?

Thank you in advance for taking time out of your busy schedule.

Entity fixed effects model - repeated time values within panel

Hello, I want to run a regression in which the explained variable is the bond coupon and the explanatory variables include the size of the issuance, tenor, etc. I want to know whether one specific characteristic has a significant impact on the coupon rate; this characteristic will be identified with a dummy variable. My database is not complete yet, so this variable is not in it yet.

I thought the best way to run this regression was an entity fixed-effects model, controlling both for issuer id and month of issuance, something like Y_it = α_i + βX_it + u_i + e_it. However, as you can see below, I have more than one observation per time period for each issuer (one issuer issued more than one bond in the same month). So when I tried 'xtset id month', I got the error message 'repeated time values within panel'. I know this is probably more a lack of knowledge regarding which model is most appropriate, but is there any advice you could give me?

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 ticker double(coupon issuedate size) byte tenor float(mnth id logsize)
"ACCES"    15.5 21625.874675925923   33998957  5 21625  1  17.34184
"ADB"       4.4 18366.874675925923  367149939  2 18366  2  19.72128
"ADB"      7.18 18366.874675925923  195806943  3 18366  2  19.09264
"ADB"        .5 18533.874675925923   56103825  7 18533  2 17.842714
"ADB"        .5 18533.874675925923   30614167  7 18533  2 17.236973
"ADB"      4.35 18533.874675925923  104088416  4 18533  2  18.46075
"ADB"      7.02 18533.874675925923   15312514  4 18533  2 16.544182
"ADB"      3.14 19133.874675925923   14366736  4 19133  2 16.480427
"ADB"      2.81 19387.916342592594   28223921  4 19387  2  17.15568
"ADB"      3.41 19584.874675925923   30717117  4 19584  2  17.24033
"ADB"     2.125 20165.874675925923  5.000e+08 10 20165  2  20.03012
"ADB"      6.71 20221.874675925923    2338831  3 20221  2 14.665162
"ADB"         1 20681.874675925923  8.000e+08  3 20681  2  20.50012
"ADB"      1.75 20681.874675925923  5.000e+08 10 20681  2  20.03012
"ADB"     2.625 20907.874675925923   50000000 10 20907  2 17.727533
"ADB"         6 20962.874675925923   83070277  4 20962  2 18.235197
"ADB"      1.57 20963.874675925923   10200000  5 20963  2 16.137897
"ADB"         2 20963.874675925923   40564903  5 20963  2 17.518414
"ADB"      9.55 20963.874675925923   49547852  4 20963  2 17.718449
"ADB"     1.875 21040.874675925923  7.500e+08  5 21040  2 20.435583
"ADB"     2.375 21040.874675925923  5.000e+08 10 21040  2  20.03012
"ADB"      3.18 21137.916342592594   16889951 10 21137  2  16.64223
"ADB"      1.47 21209.916342592594   23864070 10 21209  2 16.987885
"ADB"      1.95 21264.874675925923   12903558  2 21264  2 16.373014
"ADB"      2.25 21303.874675925923   12903558  2 21303  2 16.373014
"ADB"       .37 21360.874675925923  334096983  5 21360  2  19.62694
"ADB"       .35 21380.874675925923  620880201  7 21380  2  20.24665
"ADB"     3.125 21452.874675925923  7.500e+08 10 21452  2 20.435583
"ADB"      2.45 21565.916342592594  749913861  5 21565  2  20.43547
"ADB"      .295 21737.874675925923  467735777  7 21737  2 19.963413
"ADB"       1.6 21809.874675925923  148631576 11 21809  2  18.81698
"ADB"       1.5 21825.874675925923   98006546  3 21825  2 18.400545
"ADB"      .625 21839.874675925923  296450889  7 21839  2 19.507393
"ADB"         0 21845.874675925923  776100251 10 21845  2  20.46979
"ADB"        .8 22117.874675925923   50000000 10 22117  2 17.727533
"ADB"       .29 22200.874675925923   47728140  8 22200  2 17.681032
"ADB"     10.12 22236.874675925923    8336391  2 22236  2  15.93614
"ADB"      10.1 22236.874675925923   21945745  2 22236  2 16.904083
"ADB"      .253 22300.874675925923  143184421  7 22300  2 18.779644
"ADB"      .603 22312.874675925923   36218011 30 22312  2 17.405067
"ADB"         6 22315.874675925923   75011720  5 22315  2 18.133156
"ADB"       .75 22320.874675925923  941477743  5 22320  2  20.66296
"ADB"      .715 22333.874675925923   41392013 30 22333  2 17.538599
"ADB"       .92 22334.874675925923   51740016 40 22334  2 17.761742
"ADB"     .8325 22342.874675925923   36218011 30 22342  2 17.405067
"ADB"       4.7 22350.874675925923   53843392  3 22350  2  17.80159
"ADB"      2.15 22532.874675925923  133606703 10 22532  2 18.710411
"ADB"       1.8 22532.874675925923  140524399 15 22532  2 18.760891
"ADB"        .6 22627.874675925923   38367096  2 22627  2 17.462711
"ADB"      14.5 22936.874675925923    7337761  2 22936  2 15.808544
"ADNA"    4.375 22530.874675925923  7.500e+08  3 22530  3 20.435583
"ADNA"    4.375 22530.874675925923  7.500e+08  3 22530  3 20.435583
"ADNARE"  4.625 21836.874675925923  3.625e+08 20 21836  4 19.708534
"ADNARE"  4.625 21836.874675925923  3.625e+08 20 21836  4 19.708534
"ADNGN"    6.25 21709.874675925923  5.000e+08  6 21709  5  20.03012
"ADNGN"    6.25 21709.874675925923  5.000e+08  6 21709  5  20.03012
"AESOC"  4.7133 21655.874675925923  120223159 10 21655  6  18.60486
"AESOC"  4.7133 21655.874675925923   33550867 10 21655  6 17.328573
"AFDB"     4.52 18349.874675925923   61384460  4 18349  7 17.932667
"AFDB"     4.68 18408.874675925923   56611583  4 18408  7 17.851725
"AFDB"     4.35 18408.874675925923   26820632  4 18408  7 17.104683
"AFDB"       .5 18470.874675925923    7135164  7 18470  7 15.780546
"AFDB"     4.42 18499.874675925923   11462412  3 18499  7 16.254583
"AFDB"       .5 18533.874675925923    7514248 10 18533  7 15.832312
"AFDB"       .5 18562.916342592594    6154319  6 18562  7 15.632665
"AFDB"     3.71 18562.916342592594   28560046  4 18562  7 17.167519
"AFDB"      4.8 18562.916342592594   45594931  4 18562  7 17.635307
"AFDB"      .75 19648.874675925923  5.000e+08  3 19648  7  20.03012
"AFDB"     1.75 19793.874675925923   89266585  5 19793  7 18.307138
"AFDB"    1.375 20438.916342592594  5.000e+08  3 20438  7  20.03012
"AFDB"     .375 20788.916342592594  114729422  6 20788  7 18.558086
"AFDB"      3.5 20802.916342592594   94583730 15 20802  7 18.364996
"AFDB"        3 21523.916342592594  5.000e+08  3 21523  7  20.03012
"AFDB"     .375 21649.874675925923  190912562  5 21649  7 19.067326
"AFDB"      .25 22391.874675925923   95456281  5 22391  7  18.37418
"AFDB"     .823 22685.874675925923  143184421  5 22685  7 18.779644
"AFFNN"   1.205 22187.874675925923  158881474  5 22187  8 18.883669
"AGBKC"    4.15 20380.916342592594   85178875  2 20380  9 18.260263
"AGBKC"    2.75 20380.916342592594  5.000e+08  5 20380  9  20.03012
"AGBKC"   2.125 20380.916342592594  4.000e+08  3 20380  9 19.806974
"AGBKC"     2.4 22937.874675925923 2071851821  3 22937  9  21.45171
"AGBKC"     2.4 22937.874675925923 2129471890  3 22937  9  21.47914
"AGBKC"     2.8 22937.874675925923  690617273  5 22937  9 20.353096
"AGBKC"     2.8 22937.874675925923  709823963  5 22937  9  20.38053
"AGBKCK"   2.25 22704.874675925923  3.000e+08  5 22704 10  19.51929
"AGBKCK"      2 22704.874675925923  6.000e+08  3 22704 10  20.21244
"AGBKFH"   3.68 21702.874675925923  483691533  3 21702 11  19.99696
"AGDBC"    3.79 20808.916342592594 1612305112  3 20808 12  21.20093
"AGDBC"    4.48 21138.916342592594 1612305112  2 21138 12  21.20093
"AGDBC"    3.18 21858.874675925923  348063375  3 21858 12 19.667894
"AGDBC"       2 22545.874675925923 1419647927  2 22545 12 21.073675
"AGSPA"      .5 22572.874675925923  103480033  5 22572 13  18.45489
"AGUA"     8.65 20997.874675925923  206675622 10 20997 14  19.14666
"AGUAS"     1.8 21257.874675925923   57039000  7 21257 15 17.859245
"AGUAS"     2.5 21622.874675925923   76052000 25 21622 15 18.146929
"AHWLIA"   3.99 21739.874675925923   85178875  3 21739 16 18.260263
"AHWLIA"    3.8 22822.874675925923   88018171  3 22822 16 18.293055
"AIAUT"    1.75 22656.874675925923  1.000e+09  5 22656 17 20.723267
"AIAUT"    1.75 22656.874675925923  1.000e+09  5 22656 17 20.723267
"AJBAN"     4.6 21502.916342592594   41437036  3 21502 18 17.539686
end
format %tdnn/dd/CCYY issuedate
Thanks a lot!
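A minimal sketch of one way around the error: declare only the panel dimension, since xtreg, fe does not need a time variable, and put the issuance month in as dummies:

Code:
* a minimal sketch: no time variable needed when periods repeat within a panel
xtset id
xtreg coupon logsize tenor i.mnth, fe vce(cluster id)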

Plot survival curve of the interaction in Stratified Cox Model

Dear all,
I ran a Cox model with strata(year):
stcox X1 X2 c.X1#i.X2, strata(year) vce(robust)

X1 is a continuous variable and X2 is a categorical variable (= 1 or 0). Now I want to plot the survival and hazard curves to show the effect of the interaction. What commands can I use? stcurve is not allowed after strata().
Thank you.

Best
Josh

estout test results

I am trying to add test results to a table produced by the estout command, but I am unable to put them there.


Code:
reg y x1 x2 x3 x4
eststo m1

xtset id time

xtreg y x1 x2 x3 x4, be
eststo m2

xtreg y x1 x2 x3 x4, fe
eststo m3

xtreg y x1 x2 x3 x4
eststo m4

hausman m3 m4

hausman m3 m4, sigmamore

estout m1 m2 m3 m4, cells(b(star fmt(%9.4f)) se(par) t(par) p(par)) stats(N r2 r2_b r2_w r2_o, fmt(%9.4f %9.0g) labels(N))
However, I want to see the poolability test result (xtreg reports it automatically) and also the Hausman test result at the bottom of the table where N, r2, etc. appear. Can someone please help me with that?
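If I read estadd correctly, scalars can be attached to an already-stored estimation set with its ": name" syntax and then listed in stats(); a minimal sketch, assuming the models are stored as above:

Code:
* a minimal sketch: attach the Hausman results to the FE model, then report them
hausman m3 m4, sigmamore
estadd scalar hausman_chi2 = r(chi2) : m3
estadd scalar hausman_p    = r(p)    : m3
estout m1 m2 m3 m4, cells(b(star fmt(%9.4f)) se(par)) ///
    stats(N r2 hausman_chi2 hausman_p, fmt(%9.4f))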

Loop to fill missing values

Hello,

I am using a panel dataset that has a unique firm id called 'firm_id'. However, firm_id has some missing values. The missing values all belong to distinct firms; among them no firm occurs twice. I want to fill in those missing values by counting on from the current highest firm_id
(max_firm_id = the max of firm_id).

I wrote this loop:

Code:
global i = max_firm_id + 1

foreach x of varlist firm_id {
replace `x' = $i if (`x' == .)
global i = $i + 1
}

However, when I run this, all the missing values are replaced by max_firm_id + 1. What I want is for the first missing value to get max_firm_id + 1, the second missing value to be one higher, the third one higher again, and so on.
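The loop body runs replace once across all observations, which is why every missing value receives the same number. A minimal sketch of a loop-free alternative that numbers the missing rows sequentially with a running sum:

Code:
* a minimal sketch: number missing firm_ids consecutively from the current max
summarize firm_id, meanonly
local max = r(max)
gen long seq = sum(missing(firm_id))              // running count of missing rows
replace firm_id = `max' + seq if missing(firm_id)
drop seq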

test "dose-response" with categorical variable

Hi! I am performing Cox regressions with a categorical independent variable. In the survey, we have a Likert scale with 7 responses from "never", "once", ... to "daily". We coded these as 1) never, 2) once, 3) reoccurring. I want to know whether (3) reoccurring exposure has a stronger association with the outcome than (2) "once". My supervisor suggested treating the independent variable as continuous (putting it in Stata without the i.) and claims that a statistically significant p-value would prove there is a trend. I find this hard to believe, it has been questioned by a reviewer, and I can't find it described anywhere. I suggested instead comparing the two coefficients with a Wald test of the hypothesis that they are not the same. I would be grateful for a) an explanation that supports my supervisor's suggestion (possibly literature I can reference), b) support for the Wald test, or c) an alternative test. Just to say it in advance: including the variable with finer categories is off the table due to low power.
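A minimal sketch of both testing strategies after the Cox model (variable names are hypothetical): a Wald test that the two exposure coefficients are equal, and, closer to the supervisor's idea, a polynomial contrast for a linear trend across the ordered categories:

Code:
* a minimal sketch; exposure coded 1=never, 2=once, 3=reoccurring (covariates are placeholders)
stcox i.exposure age sex
test 2.exposure = 3.exposure      // Wald test: "once" vs "reoccurring"
contrast p.exposure               // orthogonal polynomial (trend) contrasts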

Tuesday, March 28, 2023

Using nlcom after postestimation margins

Dear forum,

Thanks in advance for any guidance you may give. I would like to estimate the ratio (95% CI) of two predicted probabilities that were computed using postestimation margins after logistic regression. I thought I could use nlcom, but I'm coming unstuck.

Code:
logit dm i.sex 
margins sex, post
PHP Code:
Logistic regression                             Number of obs =  4,830
                                                LR chi2(1)    =   7.04
                                                Prob > chi2   = 0.0080
Log likelihood = -2596.4629                     Pseudo R2     = 0.0014

------------------------------------------------------------------------------
          dm | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         sex |
       Male  |  -.1912397   .0725017    -2.64   0.008    -.3333405    -.049139
       _cons |   -1.14623   .0420583   -27.25   0.000    -1.228663   -1.063797
------------------------------------------------------------------------------
Code:
margins sex, post
PHP Code:
Expression: Pr(dm), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         sex |
     Female  |      0.146      0.012    12.62   0.000        0.124       0.169
       Male  |      0.117      0.014     8.17   0.000        0.089       0.146
------------------------------------------------------------------------------

Code:
logit dm i.lclass

PHP Code:
Logistic regression                             Number of obs =  3,338
                                                LR chi2(4)    =  89.35
                                                Prob > chi2   = 0.0000
Log likelihood = -1431.0752                     Pseudo R2     = 0.0303

------------------------------------------------------------------------------
          dm | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      lclass |
          2  |   .1144973   .1324727     0.86   0.387    -.1451444     .374139
          3  |   1.012298    .181102     5.59   0.000     .6573448    1.367252
          4  |  -.6119494   .1525491    -4.01   0.000    -.9109402   -.3129587
          5  |   .3836895   .1778637     2.16   0.031     .0350831    .7322959
             |
       _cons |  -1.662886   .1079938   -15.40   0.000     -1.87455   -1.451222
------------------------------------------------------------------------------
Code:
. margins lclass, post
PHP Code:
Expression: Pr(dm), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      lclass |
          1  |      0.078      0.023     3.47   0.001        0.034       0.123
          2  |      0.044      0.010     4.46   0.000        0.025       0.064
          3  |      0.213      0.052     4.08   0.000        0.111       0.316
          4  |      0.051      0.039     1.30   0.196       -0.026       0.128
          5  |      0.296      0.104     2.84   0.005        0.091       0.501
------------------------------------------------------------------------------
Code:
nlcom _b[Male.sex]/_b[1.lclass]
PHP Code:
[Male.sex] not found


Thanks,
Itai
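A minimal sketch of how the ratio works when both probabilities come from the same model, so that a single margins ..., post leaves both in e(b); the posted coefficients carry factor-variable names such as 1.sex and 0bn.sex, which matrix list e(b) will confirm (combining margins from two separate models is a different problem and would need the two equations fit jointly):

Code:
* a minimal sketch: ratio of two posted margins from one model
logit dm i.sex
margins sex, post
matrix list e(b)                    // check the posted coefficient names
nlcom _b[1.sex] / _b[0bn.sex]       // Male/Female ratio (names assume this coding)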

Generating tables using the eststo and esttab commands in Stata and exporting the .tex file into LaTeX

Hi, I am currently trying to generate a table in Stata to export to LaTeX using the following code:

esttab using OLS_Tables.tex, replace keep(Leverage logTotalAssets Tangibility RevenueGrowth FirmAge _cons) order(Leverage logTotalAssets Tangibility RevenueGrowth FirmAge _cons) r2 b(%9.3f) se(%9.3f)label star(* 0.10 ** 0.05 *** 0.01) nonotes addnotes("$^{\ast} \text{ } p<0.10,\text{ } ^{\ast \ast}p<0.05, \text{ } ^{\ast \ast}p<0.01$" "Robust standard errors in parentheses") compress

It generates the following table, seen below:
[attached image: the generated table]

How would I edit my code further to replicate the format of the following table, including a title, a row below the equation number giving the type of regression (OLS) above the dependent variable (ROA), and a separate section showing Year FE?
[attached image: the target table format]


Thanks!
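A minimal sketch of esttab options that map onto those three requests: title() for the title, mgroups()/mtitles() for the estimator and dependent-variable header rows, and indicate() for a Year FE row (the pattern() and the year-dummy stub are assumptions about the stored models):

Code:
* a minimal sketch; mgroups pattern and the year-dummy stub are assumptions
esttab using OLS_Tables.tex, replace title("Regression Results")     ///
    mgroups("OLS", pattern(1 0)) mtitles("ROA" "ROA")                ///
    indicate("Year FE = *.Year") r2 b(%9.3f) se(%9.3f) label compress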

Callaway and Sant'Anna Diff in Diff

Hi:

I am trying to estimate the effect of a policy using the dynamic DID model of Callaway and Sant'Anna via the CSDID package. I have individual-level cross-sectional data. The policy I am investigating affects individuals living in urban areas (I have an indicator for urban) who were born after 1990 (the sample contains individuals born between 1975 and 2000). I have tried the following code:
Code:
csdid years_schooling, ivar(urban_rural) time(YOB) gvar(first_treat) method(ipw)
where first_treat=1 if living in an urban area and born after 1990. However, I have realized that this code only works with a proper panel data setup, with one observation per ID per period. Does anybody know how I can modify the above code for individual-level data with many observations per urban/rural id per year of birth?
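If I understand csdid correctly, omitting ivar() makes it treat the data as repeated cross-sections, and gvar() should hold the period (here, the birth cohort) at which a group is first treated, 0 for never-treated. A minimal sketch under that reading (the cohort value 1991 is my assumption about the policy timing):

Code:
* a minimal sketch: urban cohorts born after 1990 treated from cohort 1991
gen first_treat = cond(urban_rural == 1, 1991, 0)
csdid years_schooling, time(YOB) gvar(first_treat) method(ipw)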

Difference-in-Differences (DiD) Methods with Jeff Wooldridge -- livestream workshop

Hi everyone

Instats presents Difference-in-Differences (DiD) Methods with Panel Data by Jeff Wooldridge, running April 4-5 (starting 11am EST). This workshop covers recent developments in difference-in-differences (DiD) estimation for causal inference with longitudinal panel data using Stata, covering the foundational theory for DiD approaches along with flexible regression-based methods and staggered interventions. More recent methods based on rolling controls, long differencing, and more will also be covered. You can participate in the live sessions or asynchronously through the Instats platform, where all Zoom sessions will be recorded and available for 30 days. Jeff will also be available for Q&A during the workshop and afterwards for 30 days through the Instats platform.

Please consider coming along, and feel free to tell your friends and colleagues about this unique opportunity to get taught DiD methods by one of the world's best econometric educators.

The workshop fees are ~$400 for researchers and $250 for PhD and other students, and for our institutional partners these come down to ~$275 and $185 USD. It's free to become an Instats partner if you'd like to receive similar discounts on any Instats workshop -- to inquire contact info@instats.org

Happy modeling!


Michael Zyphur
Director
Institute for Statistical and Data Science
instats.org

Drawing the yearly production of 16 regions in one graph

Hello Stata users;

I have this data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str15 year int(year2012 year2013 year2014 year2015 year2016 year2017 year2018 year2019 year2020)
"NABEUL"           2500  2500  2500  2500  2500  2500  2500  2200  2200
"DAR CHAABANE"     2500  2500  2500  2500  2500  2500  2800  2200  2200
"BENI KHIAR"       2250  2250  2250  2250  2240  2240  2500  2500  2500
"KORBA"            8730  8730  8730  7730  7200  7200  7500  7500  7500
"MENZEL TEMIME"    9400  9400  9400  9400 14900 15500 15650 16500 19000
"EL MIDA"         14900 13900 13900 14900  8200  8500  8650  9200  9200
"KELIBIA"          9500  9500  9500  9500  9200 10000 10000 10000 10000
"HAMMAM GUEZAZE"   6500  6500  6500  6500  5500  6000  6500  6500  6500
"EL HAOUARIA"      9550  9550  9550  9550  8590  8590  8590  9500 12000
"TAKELSA"          3800  3800  3800  3800  3800  3800  3900  3900  3900
"SOLIMAN"          5050  5050  5050  5050  4050  4850  4850  4850  4850
"MENZEL BOUZELFA"  6950  6950  6950  6950  6950  6950  6950  6950  6950
"BENI KHALLED"     4600  4600  4600  4600  4600  4600  4800  4800  4800
"GROMBALIA"        5950  5950  5950  5950  5950  5950  5950  5000  5000
"BOU ARGOUB"       2280  2280  2280  2280  2280  2280  2320  2000  2000
"HAMMAMET"         1540  1540  1540  1540  1540  1540  1540  1400  1400
end
What I want to do with this data is draw a graph that shows, for each region (NABEUL, DAR CHAABANE, ...), the evolution of yearly oil production.
I thought about a graph with 16 lines, but I guess that would be crowded; at the same time, I would love to have all 16 regions in one graph, just for comparison. I also thought about drawing a bar graph, but I don't think that's the best solution.

Any ideas and code suggestions please?

Thanks!
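A minimal sketch of one approach: rename the region variable (currently called year), reshape to long, and overlay one line per region with xtline:

Code:
* a minimal sketch: reshape the wide years to long, then overlay the 16 lines
rename year region
reshape long year, i(region) j(yr)
rename year production
encode region, gen(reg)
xtset reg yr
xtline production, overlay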

Long data formatting and Clogit model for discrete choice experiments with multiple choice questions

I have survey data from a discrete choice experiment where each respondent was asked to complete 8 discrete choice questions (choose one), with each question offering 4 unlabeled alternatives, one of which was a "pick none" option. There were 3 different survey versions and the 8 choice questions differed across versions, meaning there are 24 distinct choice situations across the 3 surveys.

I know choice data requires long formatting, but I want to make sure I am setting it up to correctly incorporate that each respondent completed 8 choice questions and that there were 3 different sets of choice questions a respondent might have seen.

This is an example of how my data are currently formatted. I have the survey version denoted, but I label choice sets as 1-8 on all survey versions and alternatives 1-4 within all choice sets. Should the choice sets and alternatives be numbered differently? For example, choice sets numbered 1-24 (1-8 on survey 1, 9-16 on survey 2, and 17-24 on survey 3)? What about the alternatives: is 1-4 okay, or do they need to be numbered differently to indicate that the alternatives being chosen between differ across choice questions?
SurveyVersion  ID  ChoiceSet  Alternative  Choice  Price  Outages
            1   1          1            1       0     20        0
            1   1          1            2       1     50        1
            1   1          1            3       0     75        0
            1   1          1            4       0      0        0
After correctly formatting the data, how can I make sure the clogit model correctly treats this as one respondent making 8 different choices? If I group by ID, it says there are multiple positive outcomes within a group, since there are 8 ones in the Choice column.
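A minimal sketch addressing both points: clogit groups are choice situations rather than respondents, so give every respondent-question pair its own group id; and if the choice sets need distinct numbers across survey versions, derive a 1-24 set id from the version:

Code:
* a minimal sketch; variable names follow the example data
gen cs24 = (SurveyVersion - 1)*8 + ChoiceSet    // choice sets numbered 1-24
egen caseid = group(ID cs24)                    // one group per respondent-question
clogit Choice Price Outages, group(caseid)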



Monday, March 27, 2023

Exporting psmatch2 results on the matched sample: a solution with esttab

I am writing my thesis right now and have had a lot of trouble trying to export the results of the user-written command psmatch2. eststo and esttab generally only report the results from the unmatched sample instead of the average treatment effect on the treated (ATT) and its standard error. I have seen at least 5 different forum posts, all unsolved, with this issue. After much experimentation, I have found a solution.

Let's examine the problem. Run the psmatch2 command with eststo, and then run "return list" to see the scalars stored as a result of psmatch2:

Code:
eststo: psmatch2 dependent_var independent_var1 independent_var2, out(outcome_var) 
return list
You will see that the ATT is stored in r(att), and the standard error of the ATT is stored in r(seatt).

However, eststo does not automatically store r(att) and r(seatt) in the active estimation results. Therefore, running esttab afterward simply gives you the results from the unmatched sample.

The solution here is to use estadd to store r(att) and r(seatt), and then run esttab. Estadd will store r(att) and r(seatt) as estimation results, which esttab can then access.

Here is my solution, in which I also calculate the t statistic and two-tailed p-value. Obviously, substitute the dependent, independent, and outcome variables for your own variables.

Code:
eststo clear
eststo: psmatch2 dependent_var independent_var1 independent_var2, out(outcome_var) 
estadd scalar r(att)
estadd scalar r(seatt)
estadd scalar t_stat = (r(att)/r(seatt))
estadd scalar p_value = (2 * ttail(e(df_r), abs(e(t_stat))))
esttab, scalars(att seatt t_stat p_value) drop(_treated _cons) obslast
Under the esttab options, I drop _treated and _cons because these are the results on the unmatched sample. I use obslast so that the number of observations will appear at the bottom of the table instead of the top. Of course, you can modify the code to specify if you want to export the results to HTML, pdf, or whatever other format is possible with esttab.

This is still an incomplete solution as I cannot find a way to automatically star the p-values, but I believe it is helpful nonetheless.

Creating the filename as a column in multiple dta files before appending

Hi,

I am trying to include the name of the file as a variable before appending. Please see the code below, which works well.

Ideally the variable should be created after importing the delimited file, but before saving it as a dta file, for all files to be appended.

Thanks


Code:
clear
cd "C:\Users\aaaa"

local files : dir "C:\Users\aaaa" files "*.txt"
foreach file in `files' {
    clear
    import delimited "`file'"
    save "`file'.dta", replace
}

local files : dir "C:\Users\aaaa" files "*.dta"

foreach file in `files' {
    append using "`file'"
}
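A minimal sketch of the requested change: generate the file-name variable right after import delimited and before save (the variable name source_file is mine):

Code:
* a minimal sketch: record the source file name in each imported dataset
local files : dir "C:\Users\aaaa" files "*.txt"
foreach file in `files' {
    clear
    import delimited "`file'"
    gen source_file = "`file'"      // hypothetical variable name
    save "`file'.dta", replace
}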

converting "None"(python) to "" (Stata).

Hi,

While manipulating a list in Python I ended up with the following tuple, which I need to send to Stata's default frame:

('47313362', None), where None means missing in Python parlance.

Code:
. clear all
. gen a=""
. gen b=""
. set obs 1
Number of observations (_N) was 0, now 1.

. python
----------------------------------------------- python (type end to exit) ----------------------------------------------------------------------------------------------------------------------------------------
>>> from sfi import Data
>>> Data.store(['a','b'],None,('47313362', None))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Stata17\ado\base\py\sfi.py", line 1432, in store
    return _stp._st_storedata(vars, obss, val, svar)
TypeError: failed to store the string value in the current Stata dataset
r(7102);
>>> end
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

. list

     +--------------+
     |        a   b |
     |--------------|
  1. | 47313362     |
     +--------------+
So the sfi library is not converting None (Python) to "" (Stata).

Does anyone know how to handle this?

thanks
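A minimal sketch of a workaround on the Python side: map None to an empty string before calling Data.store, keeping the same call shape as above:

Code:
python
from sfi import Data
row = ('47313362', None)
clean = ["" if v is None else v for v in row]   # None -> "" for string variables
Data.store(['a', 'b'], None, clean)             # same call shape as the failing one
end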

Interpretation of a Regression Output

Dear experts,

I would like to ask for your help interpreting a regression output: comparing the statistical significance and the economic importance of the various regressors. Is my key explanatory variable important? Which variables matter most? I also need to carefully interpret at least one parameter estimate in words (using the coefficient, a marginal effect, or an elasticity; elasticity is easiest as long as the variable is continuous).

Here is my regression:

use "C:\Users\hartm\Downloads\cosub_place_panel_pr oper ty2_9018 (6).dta
> "

. regress incherfindahl pctbachelors pctnevermarr pct18to64 unemprate

Source | SS df MS Number of obs = 65,143
-------------+---------------------------------- F(4, 65138) = 811.53
Model | 23.8730515 4 5.96826289 Prob > F = 0.0000
Residual | 479.046789 65,138 .007354337 R-squared = 0.0475
-------------+---------------------------------- Adj R-squared = 0.0474
Total | 502.91984 65,142 .007720362 Root MSE = .08576

------------------------------------------------------------------------------
incherfind~l | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
pctbachelors | -.1945729 .0048227 -40.34 0.000 -.2040255 -.1851204
pctnevermarr | -.003355 .005335 -0.63 0.529 -.0138116 .0071016
pct18to64 | .1157761 .0071806 16.12 0.000 .1017021 .1298501
unemprate | -.3474631 .008519 -40.79 0.000 -.3641603 -.330766
_cons | .0709203 .0042465 16.70 0.000 .0625972 .0792435
------------------------------------------------------------------------------
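Since the post mentions elasticities, a minimal sketch of one way to get one after regress: margins with the eyex() option, which reports the average elasticity of the outcome with respect to the chosen regressor:

Code:
* a minimal sketch: average elasticity of incherfindahl w.r.t. pctbachelors
regress incherfindahl pctbachelors pctnevermarr pct18to64 unemprate
margins, eyex(pctbachelors)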

Sunday, March 26, 2023

Count unique appearance of values across multiple variables in wide format

Hello!

It has been a while since I have used Stata and I need a bit of help.

I have a wide dataset with variables activ1-activ27. Across these variables, the values I am interested in are 3001-3008 (these values represent specific categories of activities a participant has engaged in).

I would like to count:
1) the unique appearance of 3001-3008 across activ1-active27 for each idpriv
2) the total count of any value within the range 3001-3008 across activ1-activ27 for each idpriv

I have tried a number of loops without success. While I would love to try -egenmore-, I cannot add packages because I am working on a disconnected server.
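A minimal sketch using only built-in commands, assuming the data are one row per idpriv so the counts are row-wise:

Code:
* a minimal sketch: row-wise counts, no user-written packages needed
gen total_3000s  = 0      // total appearances of any value 3001-3008
gen unique_3000s = 0      // number of distinct values 3001-3008 that appear
forvalues v = 3001/3008 {
    gen hits = 0
    foreach x of varlist activ1-activ27 {
        replace hits = hits + (`x' == `v')
    }
    replace total_3000s  = total_3000s + hits
    replace unique_3000s = unique_3000s + (hits > 0)
    drop hits
}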

Generating IDs

Dear all,

Suppose I have the following dataset

Code:
clear
input var1 var2
1 1
1 2
1 3
1 4
2 1
2 2
2 3
2 4
3 1
3 2
3 3
3 4
4 1
4 2
4 3
4 4
end
I'd like to create IDs based on var1 and var2. In particular, I'd like to get the same ID when the values of var1 and var2 are swapped. For example, var1=3 and var2=4 should have the same ID as var1=4 and var2=3.

I have tried
Code:
 egen id = group(var1 var2)
but it doesn't give me exactly what I'm looking for.

Could anyone please help me with this?

Thank you very much for your help,

Vinh
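A minimal sketch: sort each pair into (low, high) order before grouping, so swapped pairs collapse to the same key:

Code:
* a minimal sketch: order the pair, then group on the ordered values
gen lo = min(var1, var2)
gen hi = max(var1, var2)
egen id = group(lo hi)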

Adjacency matrix

Hello everyone:
How can I construct a (0,1) spatial adjacency matrix for China's provinces using Stata code?

Bysort and egen code not carrying over correct data

Hello,

I have a long dataset containing college enrollment data by term, with multiple rows per student ID. If a student enrolled for 10 terms in college, there will be 10 date rows for that ID. The date variables include high school graduation date, enrollment begin (college), enrollment end (college), and graduation date (college). Each date is in a separate column; my issue occurs specifically in graduation_date.

The graduated variable is marked N or Y according to whether a student graduated college. I created a variable (degree) coding N = 0 and Y = 1. Then I ran

Code:
by randomid, sort: egen college_grad = max(degree)
to create college_grad and carry that 1 over to every observation for the student ID.

I now want to carry over graduation_date to each observation where college_grad is 1. I did this with

Code:
bys randomid: egen degree_date=max(graduation_date)
graduation_date carries over to the degree_date variable, but degree_date does not always match graduation_date. The graduation date will be June 08, 2020, but under degree_date it will be noted as May 20, 2022. It happens in multiple instances; I noticed the first one in row 221, which is beyond what dataex shows. Can someone help troubleshoot my coding?


I also formatted the dates using

Code:
 foreach var of varlist enrollment_begin enrollment_end graduation_date high_school_grad_date degree_date {
replace `var' = date(string(`var' ,"%08.0f"),"YMD")
format %td `var'
}

Essentially, what I am trying to do is convert the different dates to terms in order to measure the length of time between each date, over time.


Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input double randomid long(high_school_grad_date enrollment_begin enrollment_end) str1 graduated long graduation_date float(degree college_grad degree_date)
2103300231 22804 22886 22997 "N"     . 0 0     .
2103300237 22804     .     . "N"     . 0 0     .
2103300243 22804     .     . "N"     . 0 0     .
2103300255 22069 22886 22997 "N"     . 0 0     .
2103300255 22069 22312 22408 "N"     . 0 0     .
2103300255 22069 22522 22633 "N"     . 0 0     .
2103300255 22069 22432 22497 "N"     . 0 0     .
2103300255 22069 22432 22505 "N"     . 0 0     .
2103300255 22069 22662 22773 "N"     . 0 0     .
2103300255 22069 22158 22262 "N"     . 0 0     .
2103300264 22069 22508 22631 "N"     . 0 0     .
2103300264 22069 22144 22267 "N"     . 0 0     .
2103300264 22069 22872 22995 "N"     . 0 0     .
2103300264 22069 22872 22996 "N"     . 0 0     .
2103300264 22069 22663 22789 "N"     . 0 0     .
2103300264 22069 22872 22995 "N"     . 0 0     .
2103300264 22069 22872 22996 "N"     . 0 0     .
2103300267 21705 22817 22836 "N"     . 0 0     .
2103300267 21705 22067 22074 "N"     . 0 0     .
2103300267 21705 22515 22630 "N"     . 0 0     .
2103300267 21705 21780 21903 "N"     . 0 0     .
2103300267 21705 22879 22996 "N"     . 0 0     .
2103300267 21705 22144 22210 "N"     . 0 0     .
2103300267 21705 21927 22056 "N"     . 0 0     .
2103300267 21705 22690 22806 "N"     . 0 0     .
2103300282 21341     .     . "N"     . 0 0     .
2103300291 20977     .     . "N"     . 0 0     .
2103300321 22804     .     . "N"     . 0 0     .
2103300336 21340 22676 22792 "N"     . 0 0     .
2103300336 21340 22883 22995 "N"     . 0 0     .
2103300426 22440     .     . "N"     . 0 0     .
2103300477 21341 21409 21532 "N"     . 0 0     .
2103300477 21341 21927 22056 "N"     . 0 0     .
2103300477 21341 21563 21692 "N"     . 0 0     .
2103300477 21341 22872 22995 "N"     . 0 0     .
2103300477 21341 22299 22425 "N"     . 0 0     .
2103300477 21341 22144 22267 "N"     . 0 0     .
2103300477 21341 21780 21903 "N"     . 0 0     .
2103300477 21341 22879 22995 "N"     . 0 0     .
2103300477 21341 22872 22996 "N"     . 0 0     .
2103300477 21341 22515 22631 "N"     . 0 0     .
2103300477 21341 22797 22873 "N"     . 0 0     .
2103300477 21341 22067 22141 "N"     . 0 0     .
2103300477 21341 22669 22785 "N"     . 0 0     .
2103300477 21341 22872 22996 "N"     . 0 0     .
2103300477 21341 22872 22995 "N"     . 0 0     .
2103300495 20977 21409 21431 "N"     . 0 0     .
2103300495 20977 21200 21328 "N"     . 0 0     .
2103300552 21705 22515 22626 "N"     . 0 0     .
2103300552 21705 21927 22056 "N"     . 0 0     .
2103300552 21705 22879 22990 "N"     . 0 0     .
2103300552 21705 22319 22437 "N"     . 0 0     .
2103300552 21705 21780 21903 "N"     . 0 0     .
2103300552 21705 22151 22262 "N"     . 0 0     .
2103300552 21705 22683 22801 "N"     . 0 0     .
2103300552 21705 22144 22267 "N"     . 0 0     .
2103300552 21705 22067 22141 "N"     . 0 0     .
2103300552 21705 22879 22994 "N"     . 0 0     .
2103300561 22069 22144 22267 "N"     . 0 0     .
2103300561 22069 22879 22995 "N"     . 0 0     .
2103300561 22069 22648 22778 "N"     . 0 0     .
2103300561 22069 22284 22407 "N"     . 0 0     .
2103300561 22069 22781 22868 "N"     . 0 0     .
2103300561 22069 22515 22631 "N"     . 0 0     .
2103300585 22440 22522 22632 "N"     . 0 0     .
2103300585 22440 22669 22784 "N"     . 0 0     .
2103300585 22440 22796 22861 "N"     . 0 0     .
2103300585 22440 22796 22869 "N"     . 0 0     .
2103300591 22440     .     . "N"     . 0 0     .
2103300594 22804     .     . "N"     . 0 0     .
2103300600 22804     .     . "N"     . 0 0     .
2103300603 22656     .     . "N"     . 0 0     .
2103300855 22804 22900 22991 "N"     . 0 0     .
2103300861 22440 22872 22996 "N"     . 0 0     .
2103300861 22440 22872 22995 "N"     . 0 0     .
2103300861 22440 22796 22861 "N"     . 0 0     .
2103300861 22440 22872 22995 "N"     . 0 0     .
2103300861 22440 22796 22869 "N"     . 0 0     .
2103300861 22440 22508 22631 "N"     . 0 0     .
2103300861 22440 22663 22789 "N"     . 0 0     .
2103300861 22440 22872 22996 "N"     . 0 0     .
3333004638 20248     .     . "N"     . 0 0     .
3333004845 20248 20472 20600 "N"     . 0 0     .
3333004845 20248 20318 20440 "N"     . 0 0     .
3333004845 20248 22524 22704 "N"     . 0 0     .
3333004845 20248 22705 22888 "N"     . 0 0     .
3333004845 20248 22889 23069 "N"     . 0 0     .
3333005328 20248 21409 21456 "N"     . 0 0     .
3333005328 20248 20682 20804 "N"     . 0 0     .
3333005340 20248 22515 22631 "N"     . 0 1 22078
3333005340 20248 21818 21889 "N"     . 0 1 22078
3333005340 20248     .     . "Y" 22078 1 1 22078
3333005340 20248 21556 21623 "N"     . 0 1 22078
3333005340 20248 21818 21896 "N"     . 0 1 22078
3333005340 20248 22305 22420 "N"     . 0 1 22078
3333005340 20248 22151 22267 "N"     . 0 1 22078
3333005340 20248 22879 22995 "N"     . 0 1 22078
3333005340 20248     .     . "Y" 22078 1 1 22078
3333005340 20248 21916 21994 "N"     . 0 1 22078
3333005340 20248 21999 22078 "N"     . 0 1 22078
end
format %td high_school_grad_date
format %td enrollment_begin
format %td enrollment_end
format %td graduation_date
format %td degree_date

Thank you.
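A minimal sketch of a diagnostic for the mismatch: if an id carries more than one distinct non-missing graduation_date (say, two credentials), egen's max() will pick the latest, so degree_date will disagree with some rows' graduation_date. This flags such ids:

Code:
* a minimal sketch: flag ids with more than one distinct graduation date
bys randomid graduation_date: gen byte first = (_n == 1) & !missing(graduation_date)
bys randomid: egen n_grad_dates = total(first)
list randomid graduation_date degree_date if n_grad_dates > 1, sepby(randomid)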

Panel Var and Fan Charts

Hi all,

It's a bit urgent, please.

I'm using a panel VAR in my project and need a little help.

Could anyone please share the commands to generate stochastic simulations after a panel VAR?

And most importantly, how do I generate fan charts in Stata?

I have three variables in my model:

xtvar y1 y2 y3 , mc lags(2)



I look forward to your kind suggestions.

Thanks

Fix a coefficient to the value it had in previous mlogit

Hi everyone,

this may be a very simple problem and I am just not getting it, but here's what I am trying to do. I want to use -mlogit- on a 4-category dependent variable. The plan is to estimate the model, save the coefficients, fix the values of some of the b's at the values obtained in the previous regression, and then run a new model. My problem is that I cannot find any syntax that will fix the values. I tried the following:

Method 1:
mlogit depvar age c.age#c.age . . . ;

constraint 1 [#2]age = [#2]_b[age] ;
mlogit depvar age c.age##c.age . . . , constraints(1) ;
The constraint is ignored (the second mlogit reports standard errors for the coefficient instead of showing the value followed by "(constrained)").

Method 2
mlogit depvar age c.age##c.age . . . ;

gen c01 = [#2]_b[age]] ;
constraint 1 [#2]_cons = c01;
mlogit depvar age c.age##c.age . . ., constraints(1) ;
This produces error r(111).

Method 3 (I knew this would not work, but it was nice confirmation):
mlogit depvar age c.age#c.age . . . ;

constraint 1 [#2]age = .0296174 ;
mlogit depvar age c.age##c.age . . . , constraints(1) ;
This constrains the coefficient, but other coefficients change their values, which just shows I don't have all (or enough of) the digits.

I think I could display "enough" of the digits and copy them one by one into the code, but 1) I am likely to make mistakes; 2) I want to do this for too many variables in a model for that to be feasible; 3) I want to do this for too many models for it to be feasible; and 4) I find it very hard to believe Stata has no way for me to use elements of _b to constrain coefficients in the next regression. I just can't seem to find my way to the write (no pun intended) syntax. I've looked at the manual entries for a) mlogit, b) constraints, c) matrix operations, and a few others. I googled various terms and searched the forum as well (Google turned up some listserv-era discussions, but I could find nothing on my question). It may be a very simple thing I am missing, but I'm nowhere near converging to a solution. I appreciate any help anyone can offer.

Thanks!
Sam
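A minimal sketch of one route that should avoid the copy-by-hand truncation of Method 3: evaluate the coefficient into a local macro right after the first fit (macros keep essentially full double precision), then define the constraint with that literal value (the model here is a simplified placeholder):

Code:
* a minimal sketch; depvar and age stand in for the real specification
mlogit depvar age
local b2age = [#2]_b[age]                 // full-precision value, not the displayed digits
constraint define 1 [#2]age = `b2age'
mlogit depvar age c.age#c.age, constraints(1)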

Saturday, March 25, 2023

How to change the format of numbers stored in a local macro?

Dear Statalist,

I am trying to format a number stored in a local macro using the string(n, s) function. Here is my code:
Code:
version 17.0
quietly: count // Store total no. of pts to scalar

scalar SC_total = r(N)

local ML_total = string(SC_total, "%9.0fc")

display `ML_total'

3 274
The expected output of the display command is 3,274; however, it appears without the comma, like this: 3 274.

Please help me solve this issue.


Thank you
Abdullah
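The macro does contain 3,274; the trouble is that the unquoted display `ML_total' expands to display 3,274, which display parses as two expressions, 3 and 274, printed with a space between them. A minimal sketch of the fix: quote the macro so display treats it as one string:

Code:
* a minimal sketch: display the macro as a string, not as expressions
display "`ML_total'"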

Unexpected error with -bayes:mixed-

The following
Code:
sysuse bplong
bayes: mixed bp i.when || patient: , noconstant residuals(exchangeable)
aborts with the following error message:

random effects level patient is empty
an error occurred when bayes executed mixed

r(198);


Does anyone see what I'm doing wrong?

(The code is excerpted; the complete do-file and SMCL log file are attached below.)

The helpfile, help bayes_mixed, specifically mentions allowing the noconstant option in the random effects equation as does the corresponding user's manual entry (Page 593), although the latter doesn't contain a worked example that exercises that option.

I'd rather not trouble Technical Support with this if someone can readily spot an error.

R-squared and -didregress-

Good evening all,

Is there a way to have -didregress- show a value for R-squared, as you would see if you ran a difference-in-differences regression the old way (just using -reg-)?

Thanks!

Mixed-Effect Zero Inflated Negative Binomial model ??

Hello,
I have a study where I am evaluating the relationship between the total number of health care visits and geographic access (measured by travel time categories).
Initially I thought this would be a mixed-effects Poisson regression - total visits + individual level fixed effects & random intercept at geographic area level -- and using robust SE.

However, I have a significant amount of no-visits (25.7%) for my dependent variable.

Therefore I'm looking at a mixed effect zero-inflated negative binomial model. I am wondering if this is logical or if others have recommendations.

Thanks,
Devin
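For reference, a minimal sketch of the closest official command I'm aware of: menbreg, a mixed-effects negative binomial, which handles overdispersion but has no explicit zero-inflation part (variable names are hypothetical):

Code:
* a minimal sketch; visits, traveltime, and area are hypothetical names
menbreg visits i.traveltime age i.sex || area: , vce(robust)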

Why does the asreg command give me values for the first observations of my sample even though I specified a min?

Hello everyone!

I am using the asreg command to calculate betas using one year of daily data, but I want each year to have at least 150 observations. (I am using CRSP daily data.)
To that end I applied the following:

gen day = day(date)
bys PERMNO : asreg exc_ret MktRF, window( day 252 ) min(151)

It works; however, I get values even for the very first observations of my sample.
Does anyone have any idea why that happens?

Thank you for taking the time to read my question.
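One thing worth checking, sketched below: day(date) returns the day of the month (1-31), not a running counter, so window(day 252) is probably not spanning a year of trading days; a per-stock trading-day index (or the date itself) seems closer to the intent, on my reading of asreg's window() option:

Code:
* a minimal sketch: use a running trading-day index as the window variable
bys PERMNO (date): gen tradeday = _n
bys PERMNO: asreg exc_ret MktRF, window(tradeday 252) min(151)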

P.S: In case it is relevant, here is a preview of my data (before I added MktRF and created exc_ret), using dataex.

. dataex

----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input double PERMNO long date double(SHRCD EXCHCD) str8 CUSIP double(PRC RET SHROUT)
10001 12057 11 3 "36720410"    14.5    .0357142873108387 1080
10001 12058 11 3 "36720410"  -14.25 -.017241379246115685 1080
10001 12059 11 3 "36720410"  -14.25                    0 1080
10001 12060 11 3 "36720410"    14.5  .017543859779834747 1080
10001 12061 11 3 "36720410"    14.5                    0 1080
10001 12064 11 3 "36720410"    14.5                    0 1080
10001 12065 11 3 "36720410"  -14.25 -.017241379246115685 1080
10001 12066 11 3 "36720410"   13.75 -.035087719559669495 1080
10001 12067 11 3 "36720410"   13.75                    0 1080
10001 12068 11 3 "36720410"     -14    .0181818176060915 1080
10001 12071 11 3 "36720410"     -14                    0 1080
10001 12072 11 3 "36720410"   13.75  -.01785714365541935 1080
10001 12073 11 3 "36720410"   14.25     .036363635212183 1080
10001 12074 11 3 "36720410"   14.25                    0 1080
10001 12075 11 3 "36720410"   13.75 -.035087719559669495 1080
10001 12078 11 3 "36720410"     -14    .0181818176060915 1080
10001 12079 11 3 "36720410"     -14                    0 1080
10001 12080 11 3 "36720410"   13.75  -.01785714365541935 1080
10001 12081 11 3 "36720410"     -14    .0181818176060915 1080
10001 12082 11 3 "36720410"     -14                    0 1080
10001 12085 11 3 "36720410"   13.75  -.01785714365541935 1080
10001 12086 11 3 "36720410"   13.75                    0 1080
10001 12087 11 3 "36720410"     -14    .0181818176060915 1080
10001 12088 11 3 "36720410"     -14                    0 1080
10001 12089 11 3 "36720410"     -14                    0 1080
10001 12092 11 3 "36720410"   13.75  -.01785714365541935 1080
10001 12093 11 3 "36720410"   14.25     .036363635212183 1080
10001 12094 11 3 "36720410"   13.75 -.035087719559669495 1080
10001 12095 11 3 "36720410"     -14    .0181818176060915 1080
10001 12096 11 3 "36720410"   14.25   .01785714365541935 1080
10001 12100 11 3 "36720410"      14 -.017543859779834747 1080
10001 12101 11 3 "36720410"     -14                    0 1080
10001 12102 11 3 "36720410"   14.25   .01785714365541935 1080
10001 12103 11 3 "36720410"     -14 -.017543859779834747 1080
10001 12106 11 3 "36720410"     -14                    0 1080
10001 12107 11 3 "36720410"     -14                    0 1080
10001 12108 11 3 "36720410"   14.25   .01785714365541935 1080
10001 12109 11 3 "36720410"     -14 -.017543859779834747 1080
10001 12110 11 3 "36720410"   14.25   .01785714365541935 1080
10001 12113 11 3 "36720410"    14.5  .017543859779834747 1080
10001 12114 11 3 "36720410" -14.125   -.0258620698004961 1080
10001 12115 11 3 "36720410" -14.125                    0 1080
10001 12116 11 3 "36720410" -14.125                    0 1080
10001 12117 11 3 "36720410" -14.125                    0 1080
10001 12120 11 3 "36720410"   13.75 -.015398230403661728 1080
10001 12121 11 3 "36720410"   13.75                    0 1080
10001 12122 11 3 "36720410"   14.25     .036363635212183 1080
10001 12123 11 3 "36720410"      14 -.017543859779834747 1080
10001 12124 11 3 "36720410"   14.25   .01785714365541935 1080
10001 12127 11 3 "36720410"  14.125 -.008771929889917374 1080
10001 12128 11 3 "36720410"   14.25  .008849557489156723 1080
10001 12129 11 3 "36720410" -14.125 -.008771929889917374 1080
10001 12130 11 3 "36720410"   13.75  -.02654867246747017 1080
10001 12131 11 3 "36720410" -14.125  .027272727340459824 1080
10001 12134 11 3 "36720410"    14.5   .02654867246747017 1080
10001 12135 11 3 "36720410"   13.75   -.0517241396009922 1080
10001 12136 11 3 "36720410"   13.75                    0 1080
10001 12137 11 3 "36720410"   13.75                    0 1080
10001 12138 11 3 "36720410" -14.125  .027272727340459824 1080
10001 12141 11 3 "36720410"   13.75  -.02654867246747017 1080
10001 12142 11 3 "36720410" -14.125  .027272727340459824 1080
10001 12143 11 3 "36720410"   14.25  .008849557489156723 1075
10001 12144 11 3 "36720410" -14.125 -.008771929889917374 1075
10001 12145 11 3 "36720410" -14.125                    0 1075
10001 12148 11 3 "36720410"    14.5   .02654867246747017 1075
10001 12149 11 3 "36720410" -14.125   -.0258620698004961 1075
10001 12150 11 3 "36720410" -14.125                    0 1075
10001 12151 11 3 "36720410"   13.75  -.02654867246747017 1075
10001 12155 11 3 "36720410" -14.125  .027272727340459824 1075
10001 12156 11 3 "36720410"    14.5   .02654867246747017 1075
10001 12157 11 3 "36720410"    14.5                    0 1075
10001 12158 11 3 "36720410"    14.5                    0 1075
10001 12159 11 3 "36720410"   14.75  .017241379246115685 1075
10001 12162 11 3 "36720410"   14.75                    0 1075
10001 12163 11 3 "36720410"   14.75                    0 1075
10001 12164 11 3 "36720410"   14.75                    0 1075
10001 12165 11 3 "36720410"    14.5 -.016949152573943138 1075
10001 12166 11 3 "36720410"   14.75  .017241379246115685 1075
10001 12169 11 3 "36720410"   14.75                    0 1075
10001 12170 11 3 "36720410"   14.75                    0 1075
10001 12171 11 3 "36720410"  -15.25  .033898305147886276 1075
10001 12172 11 3 "36720410"   15.75  .032786883413791656 1075
10001 12173 11 3 "36720410"  -15.25   -.0317460335791111 1075
10001 12176 11 3 "36720410"  -15.25                    0 1075
10001 12177 11 3 "36720410"    15.5  .016393441706895828 1075
10001 12178 11 3 "36720410"   15.75  .016129031777381897 1075
10001 12179 11 3 "36720410"  -15.25   -.0317460335791111 1075
10001 12180 11 3 "36720410"   15.75  .032786883413791656 1075
10001 12183 11 3 "36720410"  -15.25   -.0317460335791111 1075
10001 12184 11 3 "36720410"  -15.25                    0 1075
10001 12185 11 3 "36720410"    15.5  .016393441706895828 1075
10001 12186 11 3 "36720410"    15.5                    0 1075
10001 12187 11 3 "36720410"   14.75  -.04838709533214569 1075
10001 12190 11 3 "36720410"      15  .016949152573943138 1075
10001 12191 11 3 "36720410"    15.5   .03333333507180214 1075
10001 12192 11 3 "36720410"   15.25 -.016129031777381897 1075
10001 12193 11 3 "36720410"    15.5  .016393441706895828 1075
10001 12194 11 3 "36720410"    15.5                    0 1075
10001 12197 11 3 "36720410" -15.125 -.024193547666072845 1075
10001 12198 11 3 "36720410" -15.125                    0 1075
end
format %td date
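
One tentative explanation: day() returns the day of the month (1 to 31), so window(day 252) is not a window of 252 trading days, and the min() check is then applied on the wrong scale. A minimal sketch using a running within-stock counter as the range variable instead:

Code:
* Hedged sketch: a trading-day counter per stock as the range variable,
* requiring at least 150 observations per window
bys PERMNO (date): gen long tradeday = _n
bys PERMNO (date): asreg exc_ret MktRF, window(tradeday 252) min(150)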

Appending excel files - Error r(601)

Hi. I am trying to append all Excel files in a single folder into one big file, but the code below throws the r(601) error. I've made sure that the file names have no leading or trailing spaces, so I am not sure what is happening. Any help would be great. Thanks.

Code:
local filenames: dir "C:\Users\katja\Dropbox\Raw Data"  files "*.xlsx*"

foreach f of local filenames {
import excel using `"`f'"'
gen source = `"`f'"'
display `"Appending `f'"'
append using `building'
save `"`building'"', replace
}

export excel using 1_large_excel_file.xlsx, replace
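
A tentative diagnosis: -import excel- looks for each file in the current working directory while the file list comes from the Dropbox folder, and the local `building' is never initialized, so there is nothing for the first -append- or -save- to point at (note also the stray trailing * in the "*.xlsx*" pattern). A minimal sketch of one way to restructure the loop, assuming the first row of each file holds variable names:

Code:
* Hedged sketch; firstrow is an assumption about the files' layout
clear
cd "C:\Users\katja\Dropbox\Raw Data"
local filenames : dir . files "*.xlsx"

tempfile building
save `building', emptyok        // empty starting point for the appends

foreach f of local filenames {
    import excel using `"`f'"', clear firstrow
    gen source = `"`f'"'
    display `"Appending `f'"'
    append using `building'
    save `building', replace
}

use `building', clear
export excel using 1_large_excel_file.xlsx, replace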

Counts in a long dataset taking into account other variables including time

Hello,

I have a very large dataset that includes people ("personid") who could have been seen at any of 4 sites (distinguished by "siteid"). The variable "monthyear" is the month and year in which the visit occurred. I want to determine:

1) the counts of people who were seen at more than one site, and what the combinations of these sites were (for example, in the dataex example below with 4 sites, how many people were seen at sites 19 and 32, 19 and 24, 19 and 47, 32 and 47, etc.)
2) how many people were seen at site 24 AFTER being seen at site 19 (essentially the same table as in my first question but incorporating time)
3) how many people were seen specifically at site 24 AFTER site 19 within 3 months

Any advice much appreciated!

Sarah


Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long personid byte siteid str7 monthyear
   2 19 "2021_03"
   2 19 "2020_07"
   3 19 "2019_11"
  51 19 "2020_03"
  51 19 "2019_01"
  51 19 "2019_07"
  51 19 "2019_01"
  51 19 "2020_12"
  52 32 "21-Jan" 
  52 19 "2020_03"
  52 32 "21-Feb" 
  52 19 "2020_01"
  52 19 "2020_03"
  53 19 "2021_09"
  53 19 "2020_02"
  82 19 "2019_10"
  82 19 "2021_04"
  82 19 "2020_02"
  83 19 "2022_02"
  83 19 "2019_07"
  83 19 "2022_02"
  83 19 "2021_02"
  83 19 "2022_02"
  84 19 "2019_01"
  84 47 "19-Oct" 
  84 19 "2019_04"
  84 19 "2019_01"
  84 19 "2019_05"
  84 19 "2020_08"
  84 19 "2019_01"
 145 32 "21-Apr" 
 145 32 "21-Mar" 
 145 32 "21-Feb" 
 214 32 "21-Jan" 
 217 47 "20-Jan" 
 246 47 "20-Sep" 
 257 32 "21-Jan" 
 300 47 "20-May" 
 306 47 "18-Apr" 
 335 47 "18-Dec" 
 347 32 "21-Sep" 
 347 32 "21-Feb" 
 375 47 "18-Oct" 
 379 32 "21-Jan" 
 379 32 "20-Nov" 
 399 47 "19-Feb" 
 432 32 "20-Oct" 
 432 32 "21-Mar" 
 432 32 "21-Jan" 
 432 32 "21-Apr" 
 432 32 "20-Sep" 
 432 32 "21-Jan" 
 432 32 "20-Mar" 
 432 32 "20-Dec" 
 432 32 "22-Jan" 
 432 32 "21-Jan" 
 432 32 "20-Dec" 
 451 47 "18-Mar" 
 451 47 "18-Mar" 
 451 47 "20-Oct" 
 451 47 "19-Jul" 
 451 47 "18-Jul" 
 451 47 "18-Mar" 
 452 32 "19-Oct" 
 452 32 "20-Dec" 
 457 47 "19-Mar" 
 463 47 "18-Jun" 
 539 47 "19-Feb" 
 545 47 "20-Jan" 
 545 47 "20-Sep" 
 545 47 "20-Jan" 
 552 47 "20-Mar" 
 570 47 "18-Jan" 
 570 47 "18-Mar" 
 570 47 "18-Mar" 
8775 47 "20-Jan" 
8779 24 "17-Feb" 
8779 24 "17-Sep" 
8781 47 "19-Aug" 
8781 47 "18-Oct" 
8795 47 "20-Oct" 
8795 47 "18-Oct" 
8798 32 "21-Nov" 
8806 47 "22-Feb" 
8806 47 "21-Aug" 
8806 47 "21-Oct" 
8806 47 "21-Sep" 
8806 47 "21-Mar" 
8814 47 "18-Nov" 
8815 32 "19-May" 
8815 32 "19-Nov" 
8815 32 "19-Dec" 
8815 32 "20-Mar" 
8815 32 "19-Nov" 
8815 32 "19-Nov" 
8815 32 "20-Mar" 
8828 47 "19-Sep" 
8828 47 "19-Sep" 
8838 47 "18-Oct" 
8838 47 "18-Oct" 
end
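
A tentative sketch of one approach, assuming monthyear can first be converted to a monthly date; the example mixes two string styles ("2021_03" and "21-Jan"), so both patterns are parsed:

Code:
* Hedged sketch. Parse both date styles into a monthly date:
gen mdate = monthly(monthyear, "YM")                          // "2021_03"
replace mdate = monthly(monthyear, "YM", 2050) if mi(mdate)   // "21-Jan"
format mdate %tm

* 1) flag which sites each person ever visited; tag one row per person
egen byte v19 = max(siteid == 19), by(personid)
egen byte v24 = max(siteid == 24), by(personid)
egen tag = tag(personid)
count if tag & v19 & v24    // people seen at both sites 19 and 24

* 2) first visit to site 19, then any later visit to site 24
egen first19 = min(cond(siteid == 19, mdate, .)), by(personid)
egen byte at24after19 = max(siteid == 24 & mdate > first19), by(personid)
count if tag & at24after19

* 3) as in 2), but within 3 months of the first site-19 visit
egen byte at24within3 = max(siteid == 24 & mdate > first19 & mdate <= first19 + 3), by(personid)
count if tag & at24within3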

Friday, March 24, 2023

Perturb with Mlogit

Hi All,

I estimate a multinomial logit model with 10 independent variables. I suspect that there might be a multicollinearity issue, so I am using the perturb command. However, I always get an error message. If I use perturb with logit or linear models, I do not get any errors. I have added sample code that demonstrates what I am doing. I would appreciate any help.

clear all
set obs 90
gen y = 0 if _n<=30
replace y = 1 if _n>30 & _n<=60
replace y = 2 if _n>60

gen x1 = rnormal(0,1)
gen x2 = rnormal(5,2)
gen x3 = rnormal(10,3)


perturb: mlogit y x1 x2 x3, poptions(pvars(x1 x2) prange(1 1))



variable x1 not found
r(111);

end of do-file
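
A tentative workaround, since I cannot speak to how -perturb- handles -mlogit- internally: collinearity is a property of the regressor matrix rather than of the estimator, so variance inflation factors from an auxiliary linear regression on the same covariates can flag it without -perturb-.

Code:
* Hedged sketch: VIFs from an auxiliary OLS fit of the same covariates
quietly regress y x1 x2 x3
estat vif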

Endogeneity concerns

I am working with a dataset of about 28,000 observations covering a period of 10 years. I ran a probit model for my baseline results; both my dependent variable and my key independent variable are binary. Now I suspect that there might be a reverse-causality issue in my model, so I ran an IV probit in Stata, but the result turned out to be insignificant (suggesting no endogeneity concern). Then I realized I should not be using IV probit when both variables are binary. Now I need to address the reverse-causality issue using some other method, and I need help with the Stata commands.
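
With a binary outcome and a binary endogenous regressor, a recursive bivariate probit is one standard option. A minimal sketch with hypothetical names (y the outcome, x the endogenous regressor, z an excluded instrument, w the controls); the test of rho = 0 in the output bears on whether endogeneity is present:

Code:
* Hedged sketch: recursive bivariate probit; y, x, z, and w are
* hypothetical placeholders for the actual variables
biprobit (y = x w) (x = z w)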

Control variables for intraday data.

Hi everyone,
I would like to add control variables (BookToMarket, Size, ...) to my regression of abnormal returns on text sentiment. For each firm and specific date, I have several thousand rows of data; each row is a unique sentence and the variables associated with it. I have included a sample of my data.
If I add size (log of market cap), book-to-market, and so on, I would be repeating the same values across thousands of rows, which raises some questions in my mind. If my data were daily rather than intraday, I would simply have included the values of those variables.
I can also assign each of those values to quintiles (1-5) and then use them in my regression as categorical variables. In the sample data, I categorized the variables (ln_mktcap bm roa) into quintiles.

Should I just use the raw values rather than quintiles in my regression? What would be the best way to add those variables to my regression?

reghdfe AbnRet i.Sentiment i.ln_mktcap i.bm i.roa, absorb(year SIC) cluster(month_year)

Thank you.





Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte Sentiment double AbnRet float(year month_year) byte(SIC ln_mktcap bm roa)
0  .000042760626015603265 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0  -.00008552490912983046 2017 685 38 5 3 5
0    .0000427551413056948 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0   -.0010592237732415244 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
1  -.00004275514130580582 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1  -.00004275331338177146 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1 -.000042751485614056506 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
0   .00004274965800266095 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
2                       0 2017 685 38 5 3 5
1  .000042751485614056506 2017 685 38 5 3 5
1  .000042753313381882485 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
2                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
2                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
2                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1    .0000427551413056948 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0  -.00012828187804658775 2017 685 38 5 3 5
0    .0001282654239170844 2017 685 38 5 3 5
0    .0021446675765255385 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0  .000042760626015603265 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
1   -.0001282873636947457 2017 685 38 5 3 5
0    .0000855139387719861 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
2                       0 2017 685 38 5 3 5
2                       0 2017 685 38 5 3 5
1  -.00008552125203098448 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
1    .0000855139387719861 2017 685 38 5 3 5
1  1.1102230246251565e-16 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
0   .00008552125203120653 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
0                       0 2017 685 38 5 3 5
2                       0 2017 685 38 5 3 5
1   -.0000427642832705466 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
1                       0 2017 685 38 5 3 5
0   .00004276245456480421 2017 685 38 5 3 5
end
format %tm month_year
label values Sentiment sentimentlabel
label def sentimentlabel 0 "Neu", modify
label def sentimentlabel 1 "Pos", modify
label def sentimentlabel 2 "Neg", modify
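
For comparison, a tentative sketch of both specifications side by side; the continuous control names are assumptions, since the sample data carries only the quintile codes (reghdfe is from SSC):

Code:
* Hedged sketch: quintile dummies (as in the post) vs. raw controls;
* ln_mktcap_raw, bm_raw, and roa_raw are hypothetical names
reghdfe AbnRet i.Sentiment i.ln_mktcap i.bm i.roa, absorb(year SIC) cluster(month_year)
reghdfe AbnRet i.Sentiment ln_mktcap_raw bm_raw roa_raw, absorb(year SIC) cluster(month_year)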

Different colors for graph bars

Hello,

I have a basic question. I am trying to plot a bar graph of GDP per capita with 2 bars, one for OECD and one for non-OECD countries. I want a different color for each bar group, but the following command gives me the same color for both the OECD and non-OECD bars.

graph bar (mean) wb_gdppercap, over(OECD, relabel(1 "Non OECD" 2 "OECD")) title("GDP per Capita") ytitle("GDP per capita") bar(1, color(maroon) fintensity(inten80)) bar(2, color(navy) fintensity(inten60))

Can someone please explain what I am doing wrong here? I have inserted the graph here for reference.

Thank you very much!
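
If I am reading the -graph bar- documentation correctly, bar(#, ...) styles the #th yvar, and a single variable with over() yields only one yvar, so both bars share its color; adding asyvars treats the over() groups as separate yvars, each with its own bar() style. A tentative rewrite:

Code:
* Hedged sketch: asyvars turns the over() groups into separate yvars
graph bar (mean) wb_gdppercap, over(OECD, relabel(1 "Non OECD" 2 "OECD")) asyvars bar(1, color(maroon) fintensity(inten80)) bar(2, color(navy) fintensity(inten60)) title("GDP per Capita") ytitle("GDP per capita")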


Compute a difference rate between two variables

Hello everyone,

Using the variables lonely and ca_couple, I have to calculate the proportions of individuals who feel lonely some of the time or often, separately by whether they are living with a partner or not.

Then, I need to calculate the difference in the rate of "loneliness" between those two groups.
  • lonely takes 1 if the individual feels lonely sometimes or often. And 0 if the individual hardly ever or never feels lonely.
  • ca_couple takes 1 if living in couple, and 0 otherwise.
I tried the following code:
Code:
tabulate lonely ca_couple, row col
local p_lonely_couple = 100*r(row1col1)/r(row1col2)
local p_lonely_single = 100*r(row2col1)/r(row2col2)
local diff_lonely = `p_lonely_single' - `p_lonely_couple'
display "Difference in the rate of loneliness: " %5.2f `diff_lonely'
Is there a better way to do that? Thank you in advance. This is what I obtain in Stata:

Code:
. local p_lonely_couple = 100*r(row1col1)/r(row1col2)

. local p_lonely_single = 100*r(row2col1)/r(row2col2)

. local diff_lonely = `p_lonely_single' - `p_lonely_couple'

. display "Difference in the rate of loneliness: " %5.2f `diff_lonely'
Difference in the rate of loneliness:     .
Best,

Michael
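
A tentative fix: -tabulate- does not leave results named r(row1col1) behind, which is why the difference displays as missing; saving the cell counts with matcell() gives something to compute from.

Code:
* Hedged sketch: rows of C are lonely (0, 1); columns are ca_couple (0, 1)
quietly tabulate lonely ca_couple, matcell(C)
local p_lonely_single = 100 * C[2,1] / (C[1,1] + C[2,1])
local p_lonely_couple = 100 * C[2,2] / (C[1,2] + C[2,2])
display "Difference in the rate of loneliness: " %5.2f (`p_lonely_single' - `p_lonely_couple')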

Descriptive Statistics at Baseline

Hi,
Suppose I have two waves of a survey. I make use of a DID.


The treatment is whether a particular HH_member passed away by wave 2.
I restricted the sample to those respondents living with that hh member in wave 1.
The treatment dummy takes the value 0 if HH-member is alive and 1 if the HH member died.
Post takes value 0 if the observation is in wave 1 and takes value 1 if in wave 2.

The regression I run to get DID output is
Code:
reg outcome i.HH_member_died##i.post $controls $FE , cl(var)

I want to look at the descriptive statistics of the two groups at baseline, to see whether any statistical difference exists between them:
1) respondents for whom the hh_member was alive in wave1 but passed away by wave2
2) respondents for whom the hh_member was alive in wave 1 and remained alive even in wave 2.

I run the following command
Code:
keep if post==0
ttest outcome, by(HH_member_died)
But this shows the difference between the groups for whom the HH_member had died versus was still alive among wave 1 observations, not the wave 1 (baseline) difference between the group whose HH_member passed away by wave 2 and the group whose member did not.

How can I get that?


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double BMI_survey float(HH_member_died post)
25.4386425018 0 0
16.9948692322 0 0
30.0425415039 0 0
29.6986141205 1 1
26.7642993927 1 1
20.6010570526 0 1
25.5052757263 1 1
23.8579883576 0 1
22.4769897461 0 0
21.7111110687 0 0
20.8466835022 0 0
25.7420845032 0 0
17.8782272339 0 0
25.3662319183 1 1
22.0370044708 0 1
19.8885383606 0 0
21.9642086029 0 1
29.1248645783 1 1
            . 1 1
20.7804164887 1 1
 19.079946518 0 0
29.4948539734 0 0
 21.731502533 0 0
24.1133422852 0 0
14.3230552673 1 1
23.4682006836 0 1
 21.426651001 1 1
19.5180568695 0 1
  22.80884552 1 1
27.1408443451 1 1
33.8613395691 0 0
 21.102804184 0 0
21.5114250183 0 0
21.3299102783 1 1
 23.688167572 1 0
31.5484447479 1 1
26.3711357117 1 1
28.5294895172 0 0
22.6051673889 1 1
22.8827972412 1 0
18.0570392609 1 0
  21.54296875 0 1
17.9458408356 0 1
29.5692863465 0 1
 20.625705719 1 1
22.8537845612 1 0
20.0530490875 1 1
25.8380050659 1 1
24.6478939057 0 1
21.3943805695 0 0
28.6160259247 0 0
25.1688022614 0 0
17.9838981629 0 0
26.0739994049 1 1
21.6328411102 1 1
21.1121730805 1 1
26.4681358338 0 1
 24.112241745 0 1
24.8491153717 0 0
20.7447032929 0 0
21.5888004303 1 0
25.0364208222 0 0
24.9698143005 0 1
24.6674957276 0 1
18.8573570252 1 1
17.0916538239 1 1
18.9992809296 1 1
13.7701473236 1 1
30.8436298371 0 0
23.0513725281 1 0
24.9562129974 1 1
23.4150829315 1 1
23.6802539826 1 1
39.5971870423 0 0
24.2142925263 1 0
28.9832057953 1 1
    25.234375 1 0
34.2484893799 1 1
21.8441505432 0 1
34.4006996155 0 0
21.1394233704 0 0
32.6331367493 0 0
26.7796020508 1 1
25.7466068268 1 1
21.3556995392 0 1
29.2932777405 1 1
23.8346385956 0 0
30.2977848053 0 0
24.3925933838 0 1
30.3410663605 1 0
22.2304401398 1 0
20.5079307556 0 0
23.6710567474 1 0
24.0117797852 1 0
20.9045200348 1 1
 26.642944336 0 1
21.9777774811 0 0
17.3138198853 0 0
25.1688022614 1 0
23.3634643555 0 0
end
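
A tentative note: -keep- needs an -if- qualifier, and since HH_member_died flags treatment-group membership (it is constant within respondent across waves in this setup), restricting to post == 0 and testing by that flag should already compare the two intended groups at baseline.

Code:
* Hedged sketch: baseline (wave 1) comparison of respondents whose
* HH member later died vs. those whose member survived
preserve
keep if post == 0
ttest BMI_survey, by(HH_member_died)
restore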

Thursday, March 23, 2023

guidance on minimum sample size for 'medsem' package

My understanding was that SEM typically requires a larger sample size, ideally over 200 observations, for reliable results, but that the 'medsem' package could be used on smaller samples (please correct me if I am wrong). Now that I've finished my analyses under this assumption (82 observations), I can't seem to find any official source that mentions an appropriate sample size for 'medsem'. Would anyone be able to direct me? Thank you!

Nesting foreach loops within a while loop

I am trying to nest a foreach loop within a while loop, and the issue is that the foreach loop always executes completely, even once the condition that should break the while loop has been met. I'm including code that shows what is happening.

I want the looping to stop the first time `loop_e' > `total_e' becomes false (i.e., as soon as the foreach loop has processed trunk). However, the while loop only stops executing after all the variables in the foreach list have been looped over.

Code:
sysuse auto, clear
local total_e = 1e-2 
local loop_e = 1 

while `loop_e' > `total_e' {
    foreach var in trunk weight length turn {
        di "`var'"
        local loop_e = 1e-6 
    }
}
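
A tentative explanation and fix: -while- evaluates its condition only at the top of each pass, so once entered, the -foreach- list always runs to completion. Testing the condition inside the -foreach- and issuing -continue, break- exits the inner loop as soon as it is met, after which the -while- test fails.

Code:
sysuse auto, clear
local total_e = 1e-2
local loop_e = 1

while `loop_e' > `total_e' {
    foreach var in trunk weight length turn {
        di "`var'"
        local loop_e = 1e-6
        * leave the foreach as soon as the while condition would fail
        if `loop_e' <= `total_e' continue, break
    }
}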