Using Stata, is there any convenient way to bootstrap in parallel? Although Stata/MP does some things in parallel, it seems it still does not bootstrap in parallel. In 2018, Jerome Falken reported some trouble using the -parallel bs- command by George Vega and Brian Quistorff, which I am also having trouble with. Maarten Buis offered a solution, but it pertained to simulation rather than bootstrapping.
Has there been any progress since 2018? Bootstrapping is an embarrassingly parallel task, and it is becoming a little embarrassing if it can't be conveniently parallelized in Stata.
(Here's the 2018 discussion of this issue: https://www.statalist.org/forums/for...l-bootstraping)
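For reference, a minimal sketch of the -parallel bs- usage being discussed (the parallel package is on SSC); the option names and values below are illustrative, so check -help parallel- for the exact syntax of the installed version:
Code:
ssc install parallel
parallel setclusters 4                              // number of child Stata processes
parallel bs, reps(1000): regress price mpg weight   // bootstrap replications split across the children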
Monday, January 31, 2022
Syntax Question
Dear all,
I have the following data merged (industry-level data and trade data at the industry level, at the same level of disaggregation). I want to build an index (which has been in the literature for a while), and I want to make sure I am using the proper syntax. I posted the index definition from a LaTeX document file.
The data is the following
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int(country year isic isiccomb) byte sourcecode long(Establishments Employment) double(Wages OutputINDSTAT4 ValueAdded GrossFixed) float(NewImportsWorld NewExportsWorld NewOutput) 752 1990 1511 1511 1 211 18970 410760814 4812639070 750829139 . 205787008 152038000 4812639232 231 1990 1511 1511 0 . . . . . . . . . 554 1990 1511 1511 1 250 27670 . . . . 39597000 1904916992 . 40 1990 1511 1511 1 211 14341 281952246 2648385387 483309907 82419767 173028000 267799008 2648385024 352 1990 1511 1511 0 . . . . . . 643000 13263000 . 428 1990 1511 1511 1 273 7176 . 4237801 . . . . 4238000 158 1990 1511 1511 0 . . . . . . 454000992 706115008 . 250 1990 1511 1511 0 . . . . . . 4090940928 3523758080 . 44 1990 1511 1511 0 . . . . . . . . . 442 1990 1511 1511 0 . . . . . . . . . 124 1990 1511 1511 1 633 46651 1127459988 10664004340 2239077148 . 818744000 1248240000 10664003584 332 1990 1511 1511 0 . . . . . . . 40000 . 410 1990 1511 1511 1 242 14578 113690412 1294216411 411153903 80461830 1689671936 70361000 1294215936 776 1991 1511 1511 0 . . . . . . . . . 44 1991 1511 1511 0 . . . . . . . . . 352 1991 1511 1511 0 . . . . . . 840000 6164000 . 158 1991 1511 1511 0 . . . . . . 525063008 1046316032 . 702 1991 1511 1511 1 17 584 5668143 29826054 15814303 5500854 237994000 . 29826000 703 1991 1511 1511 1 44 12891 20219157 497778004 . . . . 497777984 246 1991 1511 1511 0 . . . . . . 30361000 70059000 . 554 1991 1511 1511 1 250 27670 . . . . 30491000 1920531968 . 372 1991 1511 1511 1 144 12028 199582956 3536149122 581041597 77257141 176370000 1552030976 3536148992 410 1991 1511 1511 1 268 13471 118579948 1592683837 522258484 122523477 1717752960 56953000 1592684032 250 1991 1511 1511 0 . . . . . . 3985878016 3657634048 . 332 1991 1511 1511 0 . . . . . . . 0 . 578 1991 1511 1511 1 218 10305 294456237 3430694979 385781581 55221834 28105000 32967000 3430694912 752 1991 1511 1511 1 203 18243 401262876 4286257370 733530253 . 223212000 94554000 4286256896 40 1991 1511 1511 1 197 14498 303395178 2877601494 518726611 90747131 148488992 214368992 2877601024 428 1991 1511 1511 1 308 6998 . 13002620 . . . . 13003000 231 1991 1511 . 1 7 3663 3963285 41919807 18216908 576812 . . 41920000 442 1991 1511 1511 0 . . . . . . . . . 276 1991 1511 1511 0 . . . . . . 5809546752 3660185088 . 124 1991 1511 1511 1 608 44994 1139846700 10260858181 2315482061 . 883571968 1078050048 10260857856 710 1991 1511 1511 1 282 25743 151775151 973095427 326538624 . . . 973094976 703 1992 1511 1511 1 44 11804 20555460 442950701 . . . . 442951008 380 1992 1511 1511 1 487 40073 1598169003 14880936757 2621584081 429854678 5494322176 1119657984 14880936960 776 1992 1511 1511 0 . . . . . . . . . 578 1992 1511 1511 1 156 9938 310083625 3518866599 429479374 44734082 33248000 44994000 3518866944 410 1992 1511 1511 1 301 16275 159169753 1807466207 669953810 140485343 1658217984 85798000 1807465984 590 1992 1511 1511 1 29 1844 7130000 94010000 7002000 5813000 . . 94010000 792 1992 1511 1511 1 102 11338 89348079 1091967404 256402794 27211874 235286000 57520000 1091966976 352 1992 1511 1511 0 . . . . . . 1083000 6734000 . 246 1992 1511 1511 0 . . . . . . 39316000 76060000 . 702 1992 1511 1511 1 18 689 7281302 43691493 19426422 10394317 241943008 . 43691000 250 1992 1511 1511 0 . . . . . . 4353823232 4323734016 . 616 1992 1511 1511 0 . . . . . . 173932992 251923008 . 
372 1992 1511 1511 1 152 12912 236507239 3893002632 598923688 64656443 195820000 1894761984 3893003008 417 1992 1511 1511 1 126 4690 2184146 53954268 . . . . 53954000 40 1992 1511 1511 1 197 14296 342939911 3218480128 588420953 101987715 163900992 234464000 3218480128 762 1992 1511 1511 1 38 1911 . 4038072 . 222762 . . 4038000 554 1992 1511 1511 1 253 27510 . . . . 36465000 1972987008 . 428 1992 1511 1511 1 164 9916 . 72852289 . . . . 72852000 442 1992 1511 1511 0 . . . . . . . . . 124 1992 1511 1511 1 588 46848 1150764898 9753177527 2360765039 . 865473984 1210505984 9753178112 332 1992 1511 1511 0 . . . . . . . . . 44 1992 1511 1511 0 . . . . . . . . . 208 1992 1511 1511 1 . . . 6926130024 1673933460 191667248 363102016 4078067968 6926130176 232 1992 1511 1511 1 2 665 407491 2568297 1205004 0 . . 2568000 158 1992 1511 1511 0 . . . . . . 507337984 1074740992 . 752 1992 1511 1511 1 206 16860 437292072 4456000026 841885404 . 294567008 96473000 4.456e+09 276 1992 1511 1511 0 . . . . . . 7119265792 3176626944 . 440 1992 1511 1511 1 . 13840 6932730 162210090 . . 1064000 59887000 162210000 231 1992 1511 . 1 7 3221 2732917 9003390 3226405 101338 . . 9003000 496 1992 1511 1511 1 6 2402 2416744 55967442 17935814 16579535 . . 55967000 710 1992 1511 1511 1 . . . 1155665786 . . 110301000 166346000 1155666048 36 1993 1511 1511 1 645 46391 . 6255440104 1832635187 . 46923000 3161800960 6255439872 762 1993 1511 1511 1 99 1660 . 10427730 . 2533552 . . 10428000 300 1993 1511 1511 0 . . . . . . 797523008 52119000 . 702 1993 1511 1511 1 19 764 8769080 57363855 22139002 2611724 235638000 47758000 57364000 554 1993 1511 1511 1 268 28710 . . . . 48869000 2014814976 . 578 1993 1511 1511 1 155 10057 283206578 3052464792 327553531 55020567 46533000 36922000 3052464896 332 1993 1511 1511 0 . . . . . . . . . 590 1993 1511 1511 1 29 1853 10127000 90596000 18053000 7148000 . . 90596000 703 1993 1511 1511 1 49 11578 22164747 356065938 54859374 21066227 . . 356065984 40 1993 1511 1511 1 192 14408 332735622 2959263196 560602833 106450231 145410000 212484992 2959262976 442 1993 1511 1511 0 . . . . . . . . . 208 1993 1511 1511 1 342 . . 5733148322 1647420341 161574160 349519008 3530907904 5733148160 158 1993 1511 1511 0 . . . . . . 495086016 1113870976 . 428 1993 1511 1511 1 224 8107 7083167 113632597 28508877 2393336 . . 113633000 380 1993 1511 1511 1 519 40676 1320163744 12029462504 2126914231 361138745 4497477120 912947968 12029462528 792 1993 1511 1511 1 102 11624 92944925 1231588530 303413746 15475649 289539008 50670000 1231588992 232 1993 1511 1511 1 2 379 337303 918390 444808 0 . . 918000 231 1993 1511 . 1 7 3152 2121600 7378200 3182200 78400 . . 7378000 440 1993 1511 1511 1 19 12286 7873732 177247834 . . . . 177248000 276 1993 1511 1511 0 . . . . . . 5450384896 2676812032 . 352 1993 1511 1511 0 . . . . . . 590000 5244000 . 398 1993 1511 1511 1 1304 . . 4.096e+08 . . . . 4.096e+08 32 1993 1511 1511 1 1079 45728 554246175 4612161218 914363739 153194467 150990000 791281024 4612161024 616 1993 1511 1511 0 . . . . . . 225496000 189412000 . 724 1993 1511 1511 1 3079 58797 933848554 10418448822 1860357115 313765688 958025024 447728992 10418449408 496 1993 1511 1511 1 5 2242 717940 17164811 5290320 2496860 . . 17165000 116 1993 1511 1511 0 . . . . . . . . . 826 1993 1511 1511 1 1579 79875 1900242517 14566026303 3259058398 451438830 3064681984 1592034048 14566026240 776 1993 1511 1511 0 . . . . . . . . . 246 1993 1511 1511 0 . . . . . . 32308000 77148000 . 124 1993 1511 1511 1 574 47010 1062780989 9715377556 2127536261 . 
942958976 1343789056 9715378176 752 1993 1511 1511 1 202 16453 333976666 3409937551 600378214 . 201392000 111792000 3409937920 417 1993 1511 1511 1 158 4169 709576 22061175 . . . . 22061000 372 1993 1511 1511 1 153 13041 212624531 3533411479 561092834 75305054 124006000 1791170048 3533411072 512 1993 1511 1511 1 1 1 1560 14044 8270 . 70760000 . 14000 end
In order to calculate the first term of the index, S_kit = VA_kit / GDP_it, I am using the following syntax (k indexes industry, i country, t year). VA_kit is value added in industry k, country i, and year t.
sort country isic year
by country isic year: egen TotalOutput_sector= sum(OutputINDSTAT4)
sort isic year country
by isic year country: egen TotalValueAdded=sum(ValueAdded)
gen tradability_one= TotalValueAdded/OutputINDSTAT4
In order to calculate the second term of the index, D_kt = X_kt / WGDP_kt, I am using the following syntax (k is industry, t is year). X_kt is total exports of industry k (isic) in a particular year. For WGDP_kt (world total output), I am using total output of industry k in year t.
sort isic year
by isic year: egen TotalExports= sum(NewExportsWorld) if NewExportsWorld!=. //(To avoid having shares greater than 1)
sort isic year
by isic year: egen TotalOutput= sum(NewOutput) if NewOutput!=. //(To avoid having shares greater than 1)
gen tradability_output= Totalexports/TotalOutput_sector
Finally, to calculate the final index, I am doing
gen share= tradability_output*tradability_one
The reason for my question is that I am getting extremely strange values in my sample.
Thank you so much,
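For what it is worth, here is a sketch of how such shares are often built with -egen, total()-, using the variable names from the post; whether this matches the intended index definition (in particular, proxying GDP_it by total country output across industries) is an assumption:
Code:
* first term S_kit: industry value added over country GDP (proxied here by total country output)
bysort country year: egen double GDP_it = total(OutputINDSTAT4)
gen double S_kit = ValueAdded / GDP_it

* second term D_kt: world exports of industry k over world output of industry k, by year
bysort isic year: egen double X_kt    = total(NewExportsWorld)
bysort isic year: egen double WGDP_kt = total(NewOutput)
gen double D_kt = X_kt / WGDP_kt

* final index
gen double share = S_kit * D_kt
Note that -egen, total()- ignores missing values by default, so the "if var != ." guards in the post are not needed for the sums themselves.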
Repeated time values within panel when trying to create new variable
Hi, I am trying to create a variable called "fempstat" which measures an individual's employment status in the next month. I have the following lines of code:
xtset cpsidp date
gen fempstat=f1.empstat
label var fempstat "Next month employment status"
However, I am getting the error "repeated time values within panel". I have tried to switch the variable "date" out with "month" but I am still getting the same error.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input int year byte month double cpsid byte(statefip empstat) float date 2021 1 20191000055400 1 10 732 2021 1 20201100108800 1 10 732 2021 1 20191000000800 1 36 732 2021 1 20201100070800 1 36 732 2021 1 20201000048900 1 21 732 2021 1 20201100075800 1 34 732 2021 1 20191000006900 1 0 732 2021 1 20201200015600 1 36 732 2021 1 20201200115200 1 10 732 2021 1 20210100040500 1 10 732 2021 1 20191000107600 1 32 732 2021 1 20201100138400 1 10 732 2021 1 20191000064600 1 10 732 2021 1 20201100070800 1 36 732 2021 1 20191200098300 1 32 732 2021 1 20201200057900 1 10 732 2021 1 20191200132900 1 34 732 2021 1 20201000063600 1 10 732 2021 1 20201200039300 1 10 732 2021 1 20201100033300 1 34 732 2021 1 20191200076200 1 36 732 2021 1 20191000062900 1 10 732 2021 1 20201100060300 1 10 732 2021 1 20201000122200 1 36 732 2021 1 20201000023600 1 10 732 2021 1 20210100072600 1 36 732 2021 1 20191100037400 1 0 732 2021 1 20191200085700 1 36 732 2021 1 20200100122500 1 36 732 2021 1 20201200122800 1 10 732 2021 1 20191100108900 1 0 732 2021 1 20201100005300 1 10 732 2021 1 20201200068000 1 36 732 2021 1 20191200030500 1 10 732 2021 1 20191100144700 1 21 732 2021 1 20191000127000 1 10 732 2021 1 20201000057500 1 10 732 2021 1 20200100102700 1 0 732 2021 1 20201100025400 1 10 732 2021 1 20201100056500 1 10 732 2021 1 20200100070100 1 10 732 2021 1 20191200117500 1 12 732 2021 1 20191100126000 1 34 732 2021 1 20201000010700 1 10 732 2021 1 20191200094600 1 36 732 2021 1 20201000000200 1 10 732 2021 1 20201100000100 1 36 732 2021 1 20201100064000 1 10 732 2021 1 20191000126700 1 36 732 2021 1 20201200008400 1 10 732 2021 1 20210100014400 1 10 732 2021 1 20201100069100 1 10 732 2021 1 20201200123000 1 10 732 2021 1 20191000133700 1 10 732 2021 1 20201100108500 1 36 732 2021 1 20201200135300 1 10 732 2021 1 20191200075100 1 10 732 2021 1 20210100009800 1 34 732 2021 1 20210100115200 1 12 732 2021 1 20191100082900 1 10 732 2021 1 20201000137500 1 10 732 2021 1 20191000083500 1 10 732 2021 1 20191100028100 1 10 732 2021 1 20210100044200 1 10 732 2021 1 20201200124900 1 36 732 2021 1 20201100033000 1 10 732 2021 1 20191100004600 1 10 732 2021 1 20201100079500 1 0 732 2021 1 20201000133500 1 10 732 2021 1 20201200039400 1 10 732 2021 1 20210100023300 1 36 732 2021 1 20210100011700 1 36 732 2021 1 20201200057300 1 36 732 2021 1 20201100109500 1 0 732 2021 1 20200300000800 1 0 732 2021 1 20201100139800 1 34 732 2021 1 20191100060300 1 0 732 2021 1 20200100147800 1 32 732 2021 1 20191100123800 1 0 732 2021 1 20201100082100 1 10 732 2021 1 20201000033800 1 0 732 2021 1 20191200075100 1 34 732 2021 1 20201200006800 1 36 732 2021 1 20201200016600 1 32 732 2021 1 20201100112900 1 10 732 2021 1 20210100119400 1 10 732 2021 1 20201000085300 1 34 732 2021 1 20210100111900 1 0 732 2021 1 20201000077700 1 0 732 2021 1 20200100145800 1 36 732 2021 1 20200100057300 1 36 732 2021 1 20201200087500 1 10 732 2021 1 20201000099500 1 32 732 2021 1 20200100108700 1 10 732 2021 1 20201200140700 1 36 732 2021 1 20191000064600 1 21 732 2021 1 20191200044200 1 10 732 2021 1 20201200057300 1 36 732 2021 1 20210100042900 1 12 732 2021 1 20191200106300 1 36 732 end format %tm date label values month month_lbl label def month_lbl 1 "January", modify label values statefip statefip_lbl label def statefip_lbl 1 "Alabama", modify label values empstat empstat_lbl label def empstat_lbl 0 "NIU", modify label def empstat_lbl 10 "At work", modify label def empstat_lbl 12 "Has job, not at 
work last week", modify label def empstat_lbl 21 "Unemployed, experienced worker", modify label def empstat_lbl 32 "NILF, unable to work", modify label def empstat_lbl 34 "NILF, other", modify label def empstat_lbl 36 "NILF, retired", modify
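A sketch of how this error is usually diagnosed; the variable names are taken from the post (note that the -dataex- excerpt actually shows cpsid, the household identifier, rather than cpsidp):
Code:
* list the id-date combinations that occur more than once
duplicates report cpsidp date
duplicates tag cpsidp date, generate(dup)
list cpsidp date if dup > 0, sepby(cpsidp)
-xtset- requires each (panel id, time) pair to be unique, so the usual fixes are to use a genuinely person-level identifier or to deduplicate the repeated rows before -xtset-.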
Date format using month and year
Hi, I am trying to make a new variable called "date" using the month and year in my dataset. The variable "month" is a byte and the variable "year" is an int. I would like to make it into the format <year>m<month> and call this variable date. For example, month 01 (january) and year 2022 would become 2022m01. This variable should be a float. I'm unsure of how to convert these two variables into a single float variable with that format. Thanks.
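A minimal sketch of the standard approach, assuming the variables are literally named year and month:
Code:
gen date = ym(year, month)   // months elapsed since January 1960
format date %tm              // displays as 2022m1, 2022m2, ...
Stata's %tm display format shows 2022m1 rather than 2022m01, but the underlying value is what -xtset- and the lead/lag operators need.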
Generalized Leontief variable cost function
Dear All,
I am having difficulty estimating in Stata a generalized Leontief variable cost function with 3 variable inputs and 4 quasi-fixed inputs. I want to use the 'sureg' command to estimate the cost function in Morrison, Catherine J. (1997), "Structural change, capital investment and productivity in the food processing industry," American Journal of Agricultural Economics 79, 110-125.
I am writing to kindly ask whether any of you have estimated this system of 7 equations in Stata. I would appreciate your help and advice.
Best regards,
Alphonse
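In case it is useful, here is a purely illustrative sketch of -sureg- syntax for a system with a cross-equation constraint; the equations and variable names below are placeholders, not the Morrison (1997) generalized Leontief specification:
Code:
* three illustrative input-demand equations sharing regressors; isure iterates to convergence
constraint 1 [x1]p2 = [x2]p1     // an illustrative symmetry-type restriction across equations
sureg (x1 p1 p2 p3 k1 k2 k3 k4) (x2 p1 p2 p3 k1 k2 k3 k4) (x3 p1 p2 p3 k1 k2 k3 k4), constraints(1) isure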
Plotting heterogenous (interaction) effect in event study plot
I have the following dataset (a -dataex- excerpt is shown below). I have a staggered difference-in-differences setting: the lead variables are indicators for the pre-treatment periods, whereas the lag variables are indicators for the post-treatment periods. The dependent variable is sales. My code for the baseline plot follows the data excerpt.
My questions about the heterogeneous effect for independent artists are laid out after the data and code below.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(id monthly independent sales TreatZero lead2 lead3 lead4 lead5 lead6 lead7_backwards lag1 lag2 lag3 lag4 lag5 lag6 lead1) 1 672 0 249512 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 673 0 177712 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 674 0 109524 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 675 0 20776 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 676 0 846471 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 677 0 328806 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 678 0 46470 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 679 0 394758 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 680 0 301179 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 681 0 756129 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 682 0 116117 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 683 0 374293 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 684 0 432423 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 685 0 364780 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 686 0 797174 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 687 0 400569 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 688 0 126897 0 0 0 0 0 0 0 0 1 0 0 0 0 0 2 672 1 65104 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 673 1 77133 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 674 1 76200 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 675 1 218342 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 676 1 39265 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 677 1 6649 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 678 1 41677 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 679 1 156277 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 680 1 98535 0 0 0 0 0 1 0 0 0 0 0 0 0 0 2 681 1 3920 0 0 0 0 1 0 0 0 0 0 0 0 0 0 2 682 1 165573 0 0 0 1 0 0 0 0 0 0 0 0 0 0 2 683 1 73413 0 0 1 0 0 0 0 0 0 0 0 0 0 0 2 684 1 97216 0 1 0 0 0 0 0 0 0 0 0 0 0 0 2 685 1 106015 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 686 1 33066 1 0 0 0 0 0 0 0 0 0 0 0 0 0 2 687 1 54207 0 0 0 0 0 0 0 1 0 0 0 0 0 0 2 688 1 118173 0 0 0 0 0 0 0 0 1 0 0 0 0 0 3 672 0 737203 0 0 0 0 0 0 1 0 0 0 0 0 0 0 3 673 0 306725 0 0 0 0 0 0 1 0 0 0 0 0 0 0 3 674 0 198990 0 0 0 0 0 0 1 0 0 0 0 0 0 0 3 675 0 1054751 0 0 0 0 0 0 1 0 0 0 0 0 0 0 3 676 0 1886147 0 0 0 0 0 1 0 0 0 0 0 0 0 0 3 677 0 1142545 0 0 0 0 1 0 0 0 0 0 0 0 0 0 3 678 0 1277825 0 0 0 1 0 0 0 0 0 0 0 0 0 0 3 679 0 397706 0 0 1 0 0 0 0 0 0 0 0 0 0 0 3 680 0 1354199 0 1 0 0 0 0 0 0 0 0 0 0 0 0 3 681 0 1348788 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 682 0 914274 1 0 0 0 0 0 0 0 0 0 0 0 0 0 3 683 0 805134 0 0 0 0 0 0 0 1 0 0 0 0 0 0 3 684 0 769588 0 0 0 0 0 0 0 0 1 0 0 0 0 0 3 685 0 292174 0 0 0 0 0 0 0 0 0 1 0 0 0 0 3 686 0 1236297 0 0 0 0 0 0 0 0 0 0 1 0 0 0 3 687 0 58338 0 0 0 0 0 0 0 0 0 0 0 1 0 0 3 688 0 1681455 0 0 0 0 0 0 0 0 0 0 0 0 1 0 4 672 1 82611 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4 673 1 190401 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4 674 1 122867 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4 675 1 111444 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4 676 1 44781 0 0 0 0 0 1 0 0 0 0 0 0 0 0 4 677 1 158895 0 0 0 0 1 0 0 0 0 0 0 0 0 0 4 678 1 71693 0 0 0 1 0 0 0 0 0 0 0 0 0 0 4 679 1 62140 0 0 1 0 0 0 0 0 0 0 0 0 0 0 4 680 1 321720 0 1 0 0 0 0 0 0 0 0 0 0 0 0 4 681 1 188944 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 682 1 179921 1 0 0 0 0 0 0 0 0 0 0 0 0 0 4 683 1 159214 0 0 0 0 0 0 0 1 0 0 0 0 0 0 4 684 1 118173 0 0 0 0 0 0 0 0 1 0 0 0 0 0 4 685 1 246030 0 0 0 0 0 0 0 0 0 1 0 0 0 0 4 686 1 83191 0 0 0 0 0 0 0 0 0 0 1 0 0 0 4 687 1 100867 0 0 0 0 0 0 0 0 0 0 0 1 0 0 4 688 1 42409 0 0 0 0 0 0 0 0 0 0 0 0 1 0 5 672 0 32247 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 673 0 9993 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 674 0 44384 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 675 0 28284 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 676 0 6873 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 677 0 35780 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 678 0 226 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 679 0 41062 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 680 0 34161 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 681 0 5773 0 0 0 0 0 1 0 0 0 0 0 0 0 0 5 682 0 12586 0 0 0 0 1 0 
0 0 0 0 0 0 0 0 5 683 0 22660 0 0 0 1 0 0 0 0 0 0 0 0 0 0 5 684 0 40637 0 0 1 0 0 0 0 0 0 0 0 0 0 0 5 685 0 40881 0 1 0 0 0 0 0 0 0 0 0 0 0 0 5 686 0 3560 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 687 0 9365 1 0 0 0 0 0 0 0 0 0 0 0 0 0 5 688 0 852 0 0 0 0 0 0 0 1 0 0 0 0 0 0 6 672 0 94715 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 673 0 2692 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 674 0 123457 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 675 0 724462 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 676 0 871857 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 677 0 16821 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 678 0 499244 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 679 0 441009 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 680 0 429921 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 681 0 156341 0 0 0 0 0 1 0 0 0 0 0 0 0 0 6 682 0 461273 0 0 0 0 1 0 0 0 0 0 0 0 0 0 6 683 0 325237 0 0 0 1 0 0 0 0 0 0 0 0 0 0 6 684 0 302210 0 0 1 0 0 0 0 0 0 0 0 0 0 0 6 685 0 332281 0 1 0 0 0 0 0 0 0 0 0 0 0 0 6 686 0 298871 0 0 0 0 0 0 0 0 0 0 0 0 0 0 end format %tm monthly
Code:
xtset id monthly xtreg sales lead7_backwards lead6 lead5 lead4 lead3 lead2 lead1 TreatZero lag1 lag2 lag3 lag4 lag5 lag6 i.monthly, fe vce(cluster id) coefplot, vertical omitted keep(lead6 lead5 lead4 lead3 lead2 lead1 TreatZero lag1 lag2 lag3 lag4 lag5 lag6) ciopts(recast(rcap)) yline(0) msymbol(d)
(The resulting event-study plot is omitted here.)
Now I would like to see the effect of treatment on the titles of independent artists; for this I have a variable called independent. I have two questions about this. In a usual staggered difference-in-differences regression, without time-varying estimates, my code would be the following:
Code:
xtreg sales i.treatment##i.independent, fe vce(cluster id)
To show the time-varying effect, can I use:
Code:
xtreg sales i.(lead7_backwards lead6 lead5 lead4 lead3 lead2 lead1 TreatZero lag1 lag2 lag3 lag4 lag5 lag6)##i.independent i.monthly, fe vce(cluster id)
If yes, how can I make a plot similar to the previous one that shows how the effect differs between the titles of independent artists (independent == 1) and label artists (independent == 0)? If not, what would be the correct code to run the analysis and plot the results? Would it be better to run a split-sample analysis, and if so, how can I combine plots from two different regressions?
Apologies for the many questions...
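On the last question, a hedged sketch of one way split-sample event-study estimates are sometimes overlaid in a single -coefplot- (the stored-estimate names are illustrative):
Code:
xtreg sales lead7_backwards lead6 lead5 lead4 lead3 lead2 lead1 TreatZero lag1 lag2 lag3 lag4 lag5 lag6 i.monthly if independent == 1, fe vce(cluster id)
estimates store m_indep
xtreg sales lead7_backwards lead6 lead5 lead4 lead3 lead2 lead1 TreatZero lag1 lag2 lag3 lag4 lag5 lag6 i.monthly if independent == 0, fe vce(cluster id)
estimates store m_label
coefplot (m_indep, label("Independent")) (m_label, label("Label")), vertical keep(lead6 lead5 lead4 lead3 lead2 lead1 TreatZero lag1 lag2 lag3 lag4 lag5 lag6) ciopts(recast(rcap)) yline(0)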
nonest outside xtreg
The -xtreg- command has an undocumented -nonest- option which permits the use of cluster() when the panels are not nested within the cluster variable.
Do other xt commands offer a similar option? It appears that xtpoisson, for one, does not.
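For reference, a minimal sketch of the usage described above (variable names are illustrative; since the option is undocumented, this simply mirrors the post rather than official syntax):
Code:
* panels defined by firm, but clustering on industry, which does not nest the panels
xtset firm year
xtreg y x1 x2 i.year, fe cluster(industry) nonest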
FRAPPLY: module to nondestructively apply command(s) to a frame
Dear Statalisters,
Thanks to Kit Baum, a new command frapply is now available in SSC.
Beginning with version 16, Stata can hold multiple data frames in memory. This changed how I work in Stata as I fully integrated frames into my workflow. However, using the stock frames commands felt somewhat tedious as I had to write multiple lines of command to do simple things like putting a summary of a subset into a new frame.
frapply simplifies this process. It arose from my need to write one-liners to iteratively extract what I want from the current dataset I am working with and look at the result without changing it. It applies a command or a series of commands to the dataset in the specified (or current) frame and optionally puts the result into another frame. Otherwise destructive commands (such as drop, keep, collapse, contract, etc.) can be daisy-chained somewhat similar to the pipe operator in R (and in Tidyverse), all the while preserving the dataset. This can be useful in interactive and experimental settings where we want to quickly and iteratively summarize and/or transform the dataset without changing it. It can also be a convenient drop-in replacement for the frames prefix and a substitute for frames commands such as frame copy and frame put. It can do what those commands can do--but is more flexible.
As an elementary example, let's say we want to load up the auto data, subset expensive cars, and put averages by trunk space into a different frame. And we want to try different thresholds for what "expensive" would entail, so we will repeatedly run this chunk of code.
Code:
frame change default
capture frame drop temp
frame put if price > 10000, into(temp)
frame change temp
collapse price, by(trunk)
list
Using frapply, this can be written more concisely as follows.
Code:
frapply default if price > 10000, into(temp, replace change): collapse price, by(trunk) || list
frapply takes the input frame, subsets it, applies the daisy-chained commands, and puts the result into either a new or an existing frame (or a temporary frame if the into() option is omitted). We could rerun this line and get the same result regardless of the current frame.
I hope this command improves your workflow. Comments and suggestions are always welcome. Also, feel free to let me know if you find any errors.
Stata not directing to correct PDF section after -help topic-
Hello everyone,
Is anyone having trouble accessing the *.pdf manuals from Stata after a -help topic- command? For example, after -help putexcel-, when I click "(View complete PDF manual entry)", Stata takes me to the list of all PDF manuals in its repository instead of opening the particular PDF I am after. I restarted both Stata and Adobe several times without any success. Is this happening to others or just on my system? I know I can reach the PDF I am after via a Google search, but that is not the solution I am looking for.
Code:
Environment: macOS Monterey, version 12.1
Stata version: 17
Update status:
  Last check for updates: 31 Jan 2022
  New update available: none (as of 31 Jan 2022)
Adobe version: Adobe Acrobat Reader DC, Continuous Release | version 2021.011.20039
First Stage Regression - Huge F-statistic
Hello all,
I use a stacked first differences model to estimate the impact of globalization on unemployment in Western Europe.
• Panel data: 16 countries, yearly observations for the years 1995-2007
• 2SLS regression
• The following code shows the first-stage regression, where...
- I control for the same variables (c1, c2, c3, c4) and
- use time and country fixed effects
...just like in the reduced form / second-stage regression
Code:
xi: xtreg x z c1 c2 c3 c4 i.year, fe vce(cluster country)
- estimate for the instrument z: 1.6788
- standard error of the instrument z: 0.005
Hence, the coefficient is highly significant.
To get the F-statistic, I use the following code:
Code:
test z
The F-statistic is 112,125. This seems way too high; or is it plausible to have such a high F-statistic? If not, what could have gone wrong?
I appreciate your help!
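A hedged sketch of a common cross-check on first-stage strength, using -ivregress- with -estat firststage-; the outcome name d_unemp is a placeholder, and entering the fixed effects as explicit dummies only approximates the within specification above:
Code:
ivregress 2sls d_unemp c1 c2 c3 c4 i.year i.country (x = z), vce(cluster country)
estat firststage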
Including breakpoint dates in SVAR command
Dear Stata users,
I would like to know how to specify the code for an SVAR model in such a way that it includes breakpoint equations or time dummies corresponding to structural breaks.
Ideally this would work the same way EViews lets users set breakpoint equations through "@during("date1 date2")" when implementing an SVAR model.
The code I had in mind would be something like the example below. It generally follows the same formatting as the example in the help file, but I would like to include breakpoints in the model:
Thank you.
Code:
svar dln_inv dln_inc dln_consump, aeq(A) beq(B)
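One workaround that is sometimes used, sketched here with placeholder break dates, is to define dummies for the break period(s) and pass them through -svar-'s exog() option:
Code:
* hypothetical break window; qdate is assumed to be the quarterly time variable
gen byte break1 = inrange(qdate, tq(2008q3), tq(2009q2))
svar dln_inv dln_inc dln_consump, aeq(A) beq(B) exog(break1)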
Combining categories
I would like to combine two categories for one of my variables but I am not sure whether there is a test I could run to justify combining the categories. Here is an example of what I am trying to do:
As categories 2 and 3 are not statistically significant, I would like to combine them; the results remain the same. Put differently, is there a difference between categories 2 and 3? I am not sure I am phrasing my question well.
Code:
. ta worry How worried are | you about being | infected with | COVID-19? | Freq. Percent Cum. -----------------+----------------------------------- Not at all | 641 37.31 37.31 A little | 387 22.53 59.84 Rather | 203 11.82 71.65 Very | 487 28.35 100.00 -----------------+----------------------------------- Total | 1,718 100.00 . . xtreg WB i.worry [pw= panel_ind_wt_1_2], fe Fixed-effects (within) regression Number of obs = 1,718 Group variable: Findid Number of groups = 859 R-sq: Obs per group: within = 0.0191 min = 2 between = 0.0388 avg = 2.0 overall = 0.0214 max = 2 F(3,858) = 2.25 corr(u_i, Xb) = 0.0312 Prob > F = 0.0811 (Std. Err. adjusted for 859 clusters in Findid) ------------------------------------------------------------------------------ | Robust WB | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- worry | A little | .5341085 .3449488 1.55 0.122 -.1429337 1.211151 Rather | .0941577 .3733732 0.25 0.801 -.6386741 .8269896 Very | .688718 .2973494 2.32 0.021 .1051006 1.272335 | _cons | -.4072548 .1753646 -2.32 0.020 -.7514485 -.063061 -------------+---------------------------------------------------------------- sigma_u | 1.5148842 sigma_e | 1.8692139 rho | .39643111 (fraction of variance due to u_i) ------------------------------------------------------------------------------ . . recode worry 3=2 4=3 (worry: 690 changes made) . . ta worry How worried are | you about being | infected with | COVID-19? | Freq. Percent Cum. -----------------+----------------------------------- Not at all | 641 37.31 37.31 A little | 590 34.34 71.65 Rather | 487 28.35 100.00 -----------------+----------------------------------- Total | 1,718 100.00 . . xtreg WB i.worry [pw= panel_ind_wt_1_2], fe Fixed-effects (within) regression Number of obs = 1,718 Group variable: Findid Number of groups = 859 R-sq: Obs per group: within = 0.0146 min = 2 between = 0.0451 avg = 2.0 overall = 0.0238 max = 2 F(2,858) = 2.69 corr(u_i, Xb) = 0.0556 Prob > F = 0.0684 (Std. Err. adjusted for 859 clusters in Findid) ------------------------------------------------------------------------------ | Robust WB | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- worry | A little | .3706983 .310278 1.19 0.233 -.2382945 .9796911 Rather | .6793746 .2977687 2.28 0.023 .0949343 1.263815 | _cons | -.4024158 .1756929 -2.29 0.022 -.7472539 -.0575777 -------------+---------------------------------------------------------------- sigma_u | 1.5122795 sigma_e | 1.8724047 rho | .39479255 (fraction of variance due to u_i) ------------------------------------------------------------------------------ .
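A sketch of one standard way to test whether the two category coefficients differ from each other, using the original four-category coding and factor-variable notation:
Code:
xtreg WB i.worry [pw = panel_ind_wt_1_2], fe
test 2.worry = 3.worry        // Wald test: "A little" versus "Rather"
A large p-value here is the usual justification offered for collapsing the two categories.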
Sunday, January 30, 2022
issues with global macro windows code on mac
I am trying to use a Master do file that someone has sent me from their windows computer. I have a Mac and the project that was sent to me was based on a global macro, used to change to and from directories.
The global MP is meant to be the master path for the project. To run the master management file, I have to change to the master management folder.
I suspect that this is because of a slight software difference between Windows and Mac; any ideas what the problem might be?
Code:
gl MP "/Users/Cassie/Desktop/data/Jared_Data"
cd "$MP/Replication_Files/Management/Master"
unable to change to /Users/Cassie/Desktop/data/Jared_Data/Replication_Files/Management/Master
r(170);
end of do-file
Error: cannot compute an improvement -- discontinuous region encountered
Good night,
I am trying to fit an ordered probit model with panel data, and the following error appears: cannot compute an improvement -- discontinuous region encountered.
Does anybody know how to solve this error and what it means?
Thanks!
Laura
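In case it helps, a sketch of things that are often tried for this kind of non-concavity message, assuming the model is fit with -xtoprobit- (the variable names and option values are illustrative):
Code:
* rescale badly scaled covariates, then raise the number of integration points
summarize x1
gen double x1_std = (x1 - r(mean)) / r(sd)
xtoprobit y x1_std x2, intpoints(30) difficult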
Saturday, January 29, 2022
Survival analysis - data management query
Dear all,
I have two variables in my dataset, namely "date of birth" and "date of death", and I would like to do survival analysis. First of all, I am not a statistician, so please bear with me.
1st variable codebook:
Code:
Type: Numeric daily date (int)
Range: [13880,21182]                       Units: 1
Or equivalently: [01jan1998,29dec2017]     Units: days
Unique values: 1,062                       Missing .: 0/1,174
Mean: 17165.3 = 30dec2006 (+ 7 hours)
Std. dev.: 2083
Percentiles:     10%        25%        50%        75%        90%
               14408      15411    17030.5      18864      20241
           13jun1999  12mar2002  17aug2006  25aug2011  02jun2015
2nd variable codebook:
Code:
Type: Numeric daily date (int)
Range: [13894,21239]                       Units: 1
Or equivalently: [15jan1998,24feb2018]     Units: days
Unique values: 180                         Missing .: 994/1,174
Mean: 17588.6 = 26feb2008 (+ 13 hours)
Std. dev.: 2006.76
Percentiles:     10%        25%        50%        75%        90%
             14964.5    15987.5    17325.5      19290    20459.5
           20dec2000  09oct2003  08jun2007  24oct2012  06jan2016
I would like to make basic graphs with 95% CIs and, later on, a regression analysis table, but I do not know how to make that happen. Any tips on converting the dates into practical variables (e.g., age in months or years) would be much appreciated. Also, I know how to run a multivariate regression based on ORs, but with survival data I am not sure which command to go for.
Thanks in advance
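A sketch of the usual -stset- workflow, assuming the two variables are dob (date of birth) and dod (date of death, missing for those still alive); the censoring date and the covariate "group" are placeholders:
Code:
gen double exitdate = cond(missing(dod), mdy(2, 24, 2018), dod)   // censor survivors at end of follow-up
gen byte   died     = !missing(dod)
stset exitdate, origin(time dob) failure(died) scale(365.25)      // analysis time in years since birth
sts graph, ci                                                     // Kaplan-Meier curve with confidence bands
stcox i.group                                                     // hazard ratios, the survival analogue of OR-based models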
Stock volatility
Hi,
I need to calculate stock volatility using CRSP daily stock returns. According to the article, it is calculated as the square root of the sum of squared daily returns over the year; to adjust for differences in the number of trading days, the raw sum is multiplied by 252 and divided by the number of trading days. I have the variable "returns". Does anyone know which code to use for this?
Another thing: the CRSP data were too big to download at once, so I need to merge several files. However, when I try to merge, I get the error: factor-variable and time-series operators not allowed. What should I do?
Thanks in advance!
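A sketch of the annualized-volatility calculation described above, assuming one row per firm and trading day with variables permno, date, and returns. (On the merge error: it usually means a time-series or factor-variable operator ended up in the varlist passed to -merge-, so the key variable names are worth re-checking.)
Code:
gen year = year(date)
bysort permno year: egen double sumsq = total(returns^2)
bysort permno year: egen ndays = count(returns)
gen double volatility = sqrt(sumsq * 252 / ndays)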
Regression with Interaction Terms and Post-Estimation Interpretation
Dear Statalist community,
I am running the following code to initially generate a regression output:
Code:
xtreg y c.a##c.b $controlvariables i.fyear, fe
This code generates the following output, with the control variables and year dummies omitted for simplicity:
Code:
Fixed-effects (within) regression Number of obs = 11,363 Group variable: gvkey Number of groups = 1,547 R-squared: Obs per group: Within = 0.4145 min = 1 Between = 0.3950 avg = 7.3 Overall = 0.4009 max = 12 F(24,9792) = 288.80 corr(u_i, Xb) = -0.4448 Prob > F = 0.0000 ------------------------------------------------------------------------------------- y | Coefficient Std. err. t P>|t| [95% conf. interval] --------------------+---------------------------------------------------------------- a | -.8820247 1.37278 -0.64 0.521 -3.572956 1.808907 b | -.1120089 .110937 -1.01 0.313 -.3294683 .1054504 | c.a#c.b | 6.042615 2.489008 2.43 0.015 1.163646 10.92158 _cons | 13.749 1.191463 11.54 0.000 11.41348 16.08451 --------------------+---------------------------------------------------------------- sigma_u | 4.8700768 sigma_e | 2.9240485 rho | .73502736 (fraction of variance due to u_i) ------------------------------------------------------------------------------------- F test that all u_i=0: F(1546, 9792) = 2.86 Prob > F = 0.0000
Then, I run some post-estimation commands to generate an interaction plot.
margins, at(b=(0(1)1) a =(0 1))
Predictive margins Number of obs = 11,363
Model VCE: Conventional
Expression: Linear prediction, predict()
1._at: a = 0
b = 0
2._at: a = 0
b = 1
3._at: a = 1
b = 0
4._at: a = 1
b = 1
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_at |
1 | 9.605668 .028179 340.88 0.000 9.550438 9.660898
2 | 9.493659 .1150657 82.51 0.000 9.268134 9.719184
3 | 8.723643 1.379275 6.32 0.000 6.020313 11.42697
4 | 14.65425 2.704182 5.42 0.000 9.354149 19.95435
------------------------------------------------------------------------------
marginsplot
The margins code above generates the marginsplot; please see the attached graph.
I have the following questions!
1. The regression output shows a constant of 13.749. In the margins output, I was expecting the prediction at a = 0 and b = 0 (where the standalone variables are zero) to match 13.749, but this does not seem to be the case, nor in the marginsplot. Would this be because I am including control variables as well as fixed effects?
2. Based on the regression output, the effect of a depends on the value of b, while a and b are not significant by themselves. I am having a little trouble understanding the economic magnitude of the coefficients. Can I add the interaction coefficient to the intercept as the total effect?
3. If the interpretation in #2 is correct, can I graph this somehow so that the interaction term's 6.04 coefficient shows up in the graph? In other words, if the intercept of 13.749 "increases" by 6.04 (the interaction coefficient), can this increase be visualized in Stata?
Thank you so much,
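On questions 1 and 3, a hedged note: the predictive margins average the linear prediction over the observed controls and year dummies, which is why the a = 0, b = 0 cell does not reproduce _cons. The interaction itself is often easier to read as the marginal effect of a at different values of b:
Code:
margins, dydx(a) at(b = (0 1))   // effect of a when b = 0 versus b = 1; the difference is the 6.04 interaction
marginsplot, yline(0)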
Splitting An Instance into two existing attributes
I have a dataset that includes two variables called "NAME" and "TITLE".
NAME should simply be an individual's birth name (e.g. "John William Figueroa") and title should be anything appended to the end (e.g. OBE, MD, PhD, JD). Trouble is, a lot of entries instead have this information in the NAME column so that it reads "John William Figueroa, PhD".
Is there an easy way to use the comma (very frequently present) to shift the title into the next column? I'd use the "split" function, but I don't want this broken into two new variables; I just want to shift that part of the information into the existing TITLE column. Thanks so much for your time!
Best,
Chuck
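A sketch of one way to do this with string functions, assuming TITLE is empty for the affected rows and that the first comma in NAME separates the name from the appended titles:
Code:
gen byte fixrow = strpos(NAME, ",") > 0 & TITLE == ""
replace TITLE = strtrim(substr(NAME, strpos(NAME, ",") + 1, .)) if fixrow
replace NAME  = strtrim(substr(NAME, 1, strpos(NAME, ",") - 1)) if fixrow
drop fixrow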
Specific period Buy and Hold returns for every day
Hello everyone,
I have daily panel data with prices and returns for several companies.
My question is: how do I calculate buy-and-hold returns for every company and every day in the sample?
E.g. 1-year holding period returns for ARCHER-DANIELS-MIDLAND on 15th March 2020 (until 14th March 2021), 16th March 2020 (until 15th March 2021), ...
Code:
* Example generated by -dataex-. For more info, type help dataex clear input double PERMNO str58 CONM long caldt double prccd float RETURN 10516 "ARCHER-DANIELS-MIDLAND CO" 17136 34.83 . 10516 "ARCHER-DANIELS-MIDLAND CO" 17139 34.85 .0005741212 10516 "ARCHER-DANIELS-MIDLAND CO" 17140 34.58 -.007747393 10516 "ARCHER-DANIELS-MIDLAND CO" 17141 34.3 -.008097241 10516 "ARCHER-DANIELS-MIDLAND CO" 17142 33.76 -.015743468 10516 "ARCHER-DANIELS-MIDLAND CO" 17143 34.07 .009182505 10516 "ARCHER-DANIELS-MIDLAND CO" 17146 33.69 -.01115354 10516 "ARCHER-DANIELS-MIDLAND CO" 17147 33.47 -.006530051 10516 "ARCHER-DANIELS-MIDLAND CO" 17148 32.76 -.02121311 10516 "ARCHER-DANIELS-MIDLAND CO" 17149 32.62 -.004273486 10516 "ARCHER-DANIELS-MIDLAND CO" 17150 32.72 .003065674 10516 "ARCHER-DANIELS-MIDLAND CO" 17153 32.06 -.020171143 end format %td caldt
Please keep in mind that only trading days are included in the data. I have looked for a solution for several hours but have not found anything that exactly matches my problem. I think using asrol and generating an identifying variable could work; indeed, for the first day of the year and a yearly grouping variable this works out, but already for the second trading day of the year one observation is missing. I also thought about generating an identifying dummy variable for every date, but this is probably not the best solution, since my dataset is really large (c. 15 million observations).
Code:
gen fyear1 = year(caldt)
bysort PERMNO fyear1 : asrol RETURN, stat(product) add(1)
Best regards,
Jakob Stoll
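A sketch of a pure-Stata alternative that uses cumulative log gross returns and observation subscripts, assuming a fixed holding period of 252 trading days (whether the period should start on the quoted day itself or the next trading day is an assumption to settle):
Code:
sort PERMNO caldt
by PERMNO: gen double cumlog = sum(ln(1 + RETURN))
by PERMNO: gen double bhr1y  = exp(cumlog[_n + 252] - cumlog[_n]) - 1   // missing near the end of each panel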
Friday, January 28, 2022
Getting regression results for my last dummy
Hello everyone,
I have 3 categories, for which I have created 2 dummy variables: one called low_dev and the other called high_dev. Now I was wondering how I could display the third category, which has no dummy of its own, because if both low_dev==0 and high_dev==0 then the constant represents the omitted category, if I remember correctly?
The dummy variables are created as follows:
Code:
gen Built_L = Built_area < 18.27
gen Built_H = Built_area > 25.981
gen Built_M = 1 - Built_L - Built_H
gen Agri_L = Agri_area < 48.90
gen Agri_H = Agri_area > 54.75
gen Agri_M = 1 - Agri_L - Agri_H
gen Forest_L = NaturalForest_area < 7.3
gen Forest_H = NaturalForest_area > 15.82
gen Forest_M = 1 - Forest_H - Forest_L
gen low_dev = Built_L & Agri_H & Forest_H
gen high_dev = Built_H & Agri_L & Forest_L
Regression command for low_dev:
Code:
xtreg log_realHP log_PopulationDensity log_Population Unemployment_rate log_real_consCost logReal_income real_interest i.low_dev i.Year,fe vce(robust)
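One commonly suggested alternative, sketched here, is to collapse the two dummies into a single three-level categorical variable so that every group's coefficient (relative to the chosen base) is displayed explicitly; the category coding below is an assumption:
Code:
gen byte dev_cat = 2                      // middle development as the default
replace  dev_cat = 1 if low_dev == 1
replace  dev_cat = 3 if high_dev == 1
label define dev_cat 1 "low" 2 "middle" 3 "high"
label values dev_cat dev_cat
xtreg log_realHP log_PopulationDensity log_Population Unemployment_rate log_real_consCost logReal_income real_interest ib2.dev_cat i.Year, fe vce(robust)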
regression coefficient calculation not correct
Hello everyone,
The coefficient on my construction cost index is too high. I corrected the construction cost index by dividing it by the CPI index and multiplying by 100 to get the real construction cost variable; afterwards I took the log of real construction cost to get the log real construction cost index. However, when I run my regression I get a very high coefficient, and I was wondering how I can solve this problem. A data excerpt follows:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input double ConsCost_index float(real_consCost log_real_consCost) 100 100 4.6051702 105.2 102.73438 4.632147 109.3 102.53437 4.630198 111.5 101.25672 4.617659 113.7 101.13087 4.6164155 115.9 101.76472 4.6226635 119.3 102.99907 4.63472 124.1 105.97746 4.6632266 129.8 109.09948 4.6922603 130 106.60252 4.669107 130.8 105.9867 4.6633134 133.4 106.70628 4.67008 135.7 106.10562 4.664435 136 103.74653 4.6419506 137.2 102.1092 4.626043 139.8 103.01408 4.6348658 142.6 104.4506 4.648714 145.9 106.54813 4.6685967 149.6 107.74178 4.6797376 153.8 108.91506 4.6905684 157.2 108.50176 4.6867666 100 100 4.6051702 105.2 102.73438 4.632147 109.3 102.53437 4.630198 111.5 101.25672 4.617659 113.7 101.13087 4.6164155 115.9 101.76472 4.6226635 119.3 102.99907 4.63472 124.1 105.97746 4.6632266 129.8 109.09948 4.6922603 130 106.60252 4.669107 130.8 105.9867 4.6633134 133.4 106.70628 4.67008 135.7 106.10562 4.664435 136 103.74653 4.6419506 137.2 102.1092 4.626043 139.8 103.01408 4.6348658 142.6 104.4506 4.648714 145.9 106.54813 4.6685967 149.6 107.74178 4.6797376 153.8 108.91506 4.6905684 157.2 108.50176 4.6867666 100 100 4.6051702 105.2 102.73438 4.632147 109.3 102.53437 4.630198 111.5 101.25672 4.617659 113.7 101.13087 4.6164155 115.9 101.76472 4.6226635 119.3 102.99907 4.63472 124.1 105.97746 4.6632266 129.8 109.09948 4.6922603 130 106.60252 4.669107 130.8 105.9867 4.6633134 133.4 106.70628 4.67008 135.7 106.10562 4.664435 136 103.74653 4.6419506 137.2 102.1092 4.626043 139.8 103.01408 4.6348658 142.6 104.4506 4.648714 145.9 106.54813 4.6685967 149.6 107.74178 4.6797376 153.8 108.91506 4.6905684 157.2 108.50176 4.6867666 100 100 4.6051702 105.2 102.73438 4.632147 109.3 102.53437 4.630198 111.5 101.25672 4.617659 113.7 101.13087 4.6164155 115.9 101.76472 4.6226635 119.3 102.99907 4.63472 124.1 105.97746 4.6632266 129.8 109.09948 4.6922603 130 106.60252 4.669107 130.8 105.9867 4.6633134 133.4 106.70628 4.67008 135.7 106.10562 4.664435 136 103.74653 4.6419506 137.2 102.1092 4.626043 139.8 103.01408 4.6348658 142.6 104.4506 4.648714 145.9 106.54813 4.6685967 149.6 107.74178 4.6797376 153.8 108.91506 4.6905684 157.2 108.50176 4.6867666 100 100 4.6051702 105.2 102.73438 4.632147 109.3 102.53437 4.630198 111.5 101.25672 4.617659 113.7 101.13087 4.6164155 115.9 101.76472 4.6226635 119.3 102.99907 4.63472 124.1 105.97746 4.6632266 129.8 109.09948 4.6922603 130 106.60252 4.669107 130.8 105.9867 4.6633134 133.4 106.70628 4.67008 135.7 106.10562 4.664435 136 103.74653 4.6419506 137.2 102.1092 4.626043 139.8 103.01408 4.6348658 end
This is my regression command and results:
Code:
. asdoc xtreg log_realHP log_PopulationDensity log_Population Unemployment_rate log_ > real_consCost logReal_income real_interest i.low_dev i.Year,fe vce(robust) replace > cnames(low development) save(PanelData_regression) add(Low dev Dummy,YES, Year Du > mmy,YES) dec(3) note: 2018.Year omitted because of collinearity note: 2019.Year omitted because of collinearity Fixed-effects (within) regression Number of obs = 2,532 Group variable: GM_code Number of groups = 282 R-sq: Obs per group: within = 0.8875 min = 8 between = 0.0581 avg = 9.0 overall = 0.0023 max = 9 F(13,281) = 946.78 corr(u_i, Xb) = -0.5914 Prob > F = 0.0000 (Std. Err. adjusted for 282 clusters in GM_code) ----------------------------------------------------------------------------------- | Robust log_realHP | Coef. Std. Err. t P>|t| [95% Conf. Interval] ------------------+---------------------------------------------------------------- log_PopulationD~y | .0364162 .0302895 1.20 0.230 -.0232068 .0960393 log_Population | .1433754 .076217 1.88 0.061 -.0066533 .2934041 Unemployment_rate | .03056 .0082342 3.71 0.000 .0143513 .0467686 log_real_consCost | 78.75831 1.275555 61.74 0.000 76.24746 81.26917 logReal_income | .2192286 .1029772 2.13 0.034 .0165239 .4219334 real_interest | .5209448 .0086732 60.06 0.000 .5038721 .5380175 1.low_dev | .0109145 .0026326 4.15 0.000 .0057325 .0160966 | Year | 2012 | 1.041003 .0177587 58.62 0.000 1.006046 1.07596 2013 | 2.727895 .0451684 60.39 0.000 2.638983 2.816806 2014 | 3.375704 .0558562 60.44 0.000 3.265754 3.485653 2015 | 2.838823 .0477437 59.46 0.000 2.744843 2.932804 2016 | 1.79892 .0302964 59.38 0.000 1.739283 1.858557 2017 | .7132875 .0122971 58.00 0.000 .6890815 .7374935 2018 | 0 (omitted) 2019 | 0 (omitted) | _cons | -359.9836 6.54667 -54.99 0.000 -372.8704 -347.0969 ------------------+---------------------------------------------------------------- sigma_u | .29981225 sigma_e | .02925436 rho | .99056879 (fraction of variance due to u_i) ----------------------------------------------------------------------------------
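One thing worth checking, sketched below: in the data excerpt the construction cost index repeats identically across municipalities within a year, so log_real_consCost is (nearly) collinear with the year dummies, which can make its coefficient arbitrarily large. These diagnostics are only a suggestion:
Code:
xtsum log_real_consCost                 // how much within-panel variation is there?
xtreg log_real_consCost i.Year, fe      // a within R-squared near 1 means the year dummies absorb it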
Latitude and Longitude to perform geonear and the case of 90 to -90
To perform -geonear-, we require latitude and longitude. For this, do we use the coordinate information from the shapefile, or create the centroids separately using the centroid command? Here I have used latitudes and longitudes obtained from the centroid command.
Second, I am trying to run -geonear-. I want to look at the border-district (spillover) effect, i.e., I want to find the non-treated neighbouring districts for my treated districts. However, the treatment of a district varies over time; districts come into treatment and go out of it over time. I now have two datasets with centroid values: one includes only treated districts and the other only non-treated districts. I then run -geonear- with the following command, within 2000 km:
geonear district x_stub y_stub using "Non- treated districts with centroids.dta", n(district1 x_stub1 y_stub1) ign long within(2000) near(2)
However, I am getting the following error:
nbor latitude var x_stub1 must be between -90 and 90
r(198);
Can anybody please help with these concerns?
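A hedged guess, since I cannot see the data: if x_stub and y_stub come from the centroid command, then x is the longitude and y the latitude, so the variables may simply be passed to geonear in the wrong order. geonear expects latitude before longitude both in the main varlist and in the neighbors() option, which would explain the "must be between -90 and 90" message. A minimal sketch of the reordered call, keeping the other options from the post unchanged:
Code:
* hedged sketch: latitude (y_stub) listed before longitude (x_stub)
geonear district y_stub x_stub using "Non- treated districts with centroids.dta", ///
    n(district1 y_stub1 x_stub1) ign long within(2000) near(2)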
(ssc) esttab and stcrreg: option equations() invalid
For some reason esttab (an SSC package) will not let me combine these two outputs:
Anyone have an idea of what is going wrong? I tried specifying "equations()" to try to clear out whatever it dislikes, to no avail. Thanks!!
Code:
. stcrreg contrib vx1adultHazRtHat, compete(dosedByNow==2)
  [ . . . ]
Competing-risks regression                      No. of obs      =      5,482
                                                No. of subjects =        539
Failure event:   dosedByNow == 1                No. failed      =        194
Competing event: dosedByNow == 2                No. competing   =         37
                                                No. censored    =        308
                                                Wald chi2(2)    =       5.88
Log pseudolikelihood = -1184.7533               Prob > chi2     =     0.0528

                              (Std. err. adjusted for 539 clusters in subject_id)
----------------------------------------------------------------------------------
                 |               Robust
              _t |        SHR   std. err.      z    P>|z|     [95% conf. interval]
-----------------+----------------------------------------------------------------
         contrib |   1.125542   .0549522     2.42   0.015     1.022831    1.238568
vx1adultHazRtHat |   1.024642    .043759     0.57   0.569      .942367      1.1141
----------------------------------------------------------------------------------

. eststo e1, title("cr")

. stcrreg contrib vx1adultHazRtHat, compete(dosedByNow==2) tvc(contrib vx1adultHazRtHat)
  [ . . . ]
Competing-risks regression                      No. of obs      =      5,482
                                                No. of subjects =        539
Failure event:   dosedByNow == 1                No. failed      =        194
Competing event: dosedByNow == 2                No. competing   =         37
                                                No. censored    =        308
                                                Wald chi2(4)    =       6.33
Log pseudolikelihood = -1184.4078               Prob > chi2     =     0.1759

                              (Std. err. adjusted for 539 clusters in subject_id)
----------------------------------------------------------------------------------
                 |               Robust
              _t |        SHR   std. err.      z    P>|z|     [95% conf. interval]
-----------------+----------------------------------------------------------------
main             |
         contrib |   1.167825   .0900622     2.01   0.044     1.003999    1.358382
vx1adultHazRtHat |   1.058385   .0715551     0.84   0.401     .9270347    1.208347
-----------------+----------------------------------------------------------------
tvc              |
         contrib |   .9991668   .0012241    -0.68   0.496     .9967704    1.001569
vx1adultHazRtHat |   .9991421   .0013809    -0.62   0.535     .9964393    1.001852
----------------------------------------------------------------------------------
Note: Variables in tvc equation interacted with _t.

. eststo e2, title("tvc")

. esttab e1

----------------------------
                      (1)
                       _t
----------------------------
eq1
contrib             0.118*
                   (2.42)

vx1adultHa~t        0.0243
                   (0.57)
----------------------------
N                     5482
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

. esttab e2

----------------------------
                      (1)
                       _t
----------------------------
main
contrib             0.155*
                   (2.01)

vx1adultHa~t        0.0567
                   (0.84)
----------------------------
tvc
contrib          -0.000834
                   (-0.68)

vx1adultHa~t     -0.000858
                   (-0.62)
----------------------------
N                     5482
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

. esttab e1 e2
option equations() invalid
specified equation name already occurs in model 2
r(198);
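From the output, the conflict seems to be that the first model has a single unnamed equation (shown as eq1) while the second has equations named main and tvc. One thing that may be worth trying, hedged and not confirmed against this example: match the models' equations by position rather than by name via the equations() option documented in the estout/esttab help, and place equations side by side with unstack.
Code:
* hedged sketch: match the first equations of the two models by position
esttab e1 e2, equations(1) unstack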
lazy man's research
I have some info culled from international registries of non-US rehab medicine clinical trials
one of the questions of interest is whether a trial included subjects over a certain age (e.g., 65 or 85); note that I do NOT have info on the inclusion/exclusion criteria
for most trials I have the total N, but not for all trials
for some trials I have the mean and SD (though for some I only have the mean) or I have info that can be used to estimate these values (I am using the formulae suggested in Wan, X, et al. (2014), "Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range", BMC Medical Research Methodology, 14: 135)
If I have the N, the mean and the SD AND if I can assume that the ages are approximately normally distributed, then answering the over65 or over85 question is easy. However, I am uncomfortable making this assumption and I am looking for citations that provide guidance for either (1) different distributions (esp skewed ones) and/or (2) truncated distributions but where the truncation point is unknown. If I could find such cites, I could do a sensitivity analysis re: the assumed normal distribution answer
So, does anyone know of any such citations, or have other suggestions?
By the way, I know that I could do a bunch of simulations to get there, but I think this would be more expensive than my clients want to go.
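For the normal-assumption benchmark itself, and for one simple right-skewed alternative, something along these lines might serve as a starting point for a sensitivity analysis. The mean and SD below are made-up numbers, not from any trial, and the lognormal is only one of many possible skewed choices:
Code:
* hedged sketch with made-up inputs: mean age 58, SD 12, threshold 65
local m  = 58
local sd = 12

* (1) under a normal assumption
di as txt "P(age > 65), normal:    " %6.4f 1-normal((65-`m')/`sd')

* (2) under a lognormal with the same mean and SD (a right-skewed alternative)
local s2 = ln(1 + (`sd'/`m')^2)
local mu = ln(`m') - `s2'/2
di as txt "P(age > 65), lognormal: " %6.4f 1-normal((ln(65)-`mu')/sqrt(`s2'))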
Thursday, January 27, 2022
spmap polygon separate graphs
Dear all,
I'm trying to overlay EU NUTS2 and NUTS3 using spmap, but the polygon option doesn't combine the two graphs and is separating them. Could anyone suggest what to do, please? Many thanks.
use "NUTS_RG_20M_2021_3035.dta", clear /// NUTS3
merge 1:1 NUTS3 using "sector_data_by_NUTS3.dta" /// merge with sector level data by NUTS3
keep if _m == 3
drop _m
* Polygon
spmap green_share_emp using "NUTS_RG_20M_2021_3035_shp.dta", id(_ID) fcolor(Blues2) clnumber(4) polygon(data("nuts2coord_Germany.dta") ocolor(black))
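In case it helps, here is a hedged sketch of how the overlay is usually set up; the NUTS2 shapefile name and the generated file names below are assumptions. The dataset passed to polygon(data()) has to be a coordinates file in spmap's _X/_Y format and in the same projection as the NUTS3 layer (EPSG:3035 here), for example one produced by shp2dta:
Code:
* hedged sketch: build a NUTS2 coordinates file and draw its borders on top
* of the NUTS3 choropleth (assumed file names; both layers in EPSG:3035)
shp2dta using "NUTS_RG_20M_2021_3035_NUTS2.shp", database(nuts2_db) coordinates(nuts2_coord) genid(_ID) replace

use "NUTS_RG_20M_2021_3035.dta", clear              // NUTS3 attribute data
merge 1:1 NUTS3 using "sector_data_by_NUTS3.dta", keep(match) nogenerate

spmap green_share_emp using "NUTS_RG_20M_2021_3035_shp.dta", id(_ID) ///
    fcolor(Blues2) clnumber(4) ///
    polygon(data("nuts2_coord.dta") ocolor(black) osize(medthin))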
Obtain which bin and corresponding density each observation in plotted histogram belongs to?
Hello everyone,
Suppose I plot a histogram:
Code:
clear
set obs 10
g z = _n
replace z = 5 if _n > 5
hist z
Given the plotted histogram, I would like to generate two new variables:
- bin, giving the bin which a given observation belongs to.
- density, giving the density of the bin which the observation belongs to.
For this example, the correct values would be:
Code:
g correct_bin = 1 if inrange(_n, 1, 2)
replace correct_bin = 2 if _n == 3
replace correct_bin = 3 if _n >= 4
g correct_density = 0.15 if inrange(_n, 1, 2)
replace correct_density = 0.075 if _n == 3
replace correct_density = 0.525 if _n >= 4
I have tried using the command twoway__histogram_gen to create a solution (steps and code below). However, while my solution works in the above case, it does not seem to work, for example, when:
- Bins aren't just beside each other, that is, for example, bin 1 = [1,2) and bin 2 = [5,6)
- Or even just when the sample size grows and the values of the z variable are continuous; then numerical issues quickly arise
My solution proceeds as follows:
- Use twoway__histogram_gen to find the midpoint of each bin.
- Adjust the midpoints to be the start points of the bins.
- Create new variables x_v which are constant to the start point of bin v.
- Check which interval [x_v, x_{v+1}) each observation belongs to.
- Find the corresponding density of that bin.
Code:
* 1, finding midpoints
twoway__histogram_gen z, gen(y x)
* 2, adjusting midpoints to start points
local adjust = (x[2] - x[1]) / 2
replace x = x - `adjust'
* 3, generating variables constant to startpoints
count if x != .
local N = r(N)
forvalues v = 1/`=`N'+1' {
    g x_`v' = x[`v']
}
* 4, finding bin of each observation
g new_bin = .
forvalues v = 1/`N' {
    replace new_bin = `v' if x_`v' <= z & z < x_`=`v'+1'
}
* 5, finding density of the bin
g new_density = .
forvalues v = 1/`N' {
    replace new_density = y[`v'] if new_bin == `v'
}
Checking that this has given the correct solution:
Code:
assert correct_bin == new_bin
assert correct_density == new_density
Finally, note that by browsing the data we see rounding already becoming a slight problem, since we have x_1 == .99999994 instead of x_1 == 1.
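Regarding that rounding issue, one option (my own suggestion, not part of the approach above) is to compare the densities with a small tolerance rather than testing exact equality:
Code:
* hedged sketch: tolerance-based check to sidestep float rounding
assert correct_bin == new_bin
assert abs(correct_density - new_density) < 1e-6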
Thanks in advance for help!
Simon
Note: I'm not sure that the histogram bins of Stata are indeed of the form [x_v, x_{v+1}) (i.e., closed-open), but my investigations indicate this is true (the final bin having endpoint infinity).
Same date or within 7 days, within two variables
Dear statalist-members,
I need to identify patients that got med1 and med2 on the same day, or within 7 days. I made a variable that identified if a patient got two or more prescriptions on the same date:
duplicates tag patID date, gen (sameday)
But that new variable also flagged patients who received two prescriptions of med1, not both med1 and med2 as I am interested in.
Thank you, Cathrine
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long patID float(date med1 med2) byte sameday
76632 19493 1 . 0
76779 21626 1 . 1
76779 21626 . 1 1
76779 21781 1 . 0
76779 21896 1 . 0
76804 21509 . 1 0
76806 20612 . 1 0
76806 20685 . 1 0
76837 19620 1 . 1
76837 19620 1 . 1
76837 19709 1 . 2
76837 19709 1 . 2
76837 19709 . 1 2
76911 20368 1 . 0
76911 20403 . 1 0
76911 20458 . 1 0
76911 20511 . 1 0
76911 20581 . 1 0
77072 19852 1 . 0
77072 19950 1 . 0
77072 20485 . 1 1
77072 20485 1 . 1
77072 20544 . 1 0
77072 20850 1 . 0
77072 21565 . 1 1
77072 21565 1 . 1
77072 21595 . 1 0
77072 21862 . 1 0
77072 21875 . 1 0
77601 21867 1 . 1
77601 21867 1 . 1
77636 19463 1 . 1
77636 19463 1 . 1
77636 19600 1 . 1
77636 19600 1 . 1
77636 19884 1 . 0
77636 19977 1 . 0
77636 20123 1 . 0
77636 20301 1 . 0
77636 20593 1 . 0
77636 20678 1 . 0
77745 21265 . 1 0
77745 21803 . 1 0
77755 20982 1 . 0
end
format %tdCCYY-NN-DD date | |||
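A possible way to get the flag you are after (a sketch, assuming the variable names from the example above and that "within 7 days" means within 7 days in either direction): pair each patient's med1 dates with the same patient's med2 dates via joinby, and keep patients with at least one pair at most 7 days apart. Note the sketch keeps only the med1 records in memory.
Code:
* save each patient's med2 prescription dates
preserve
keep if med2 == 1
keep patID date
rename date med2_date
duplicates drop
tempfile med2dates
save `med2dates'
restore

* pair med1 records with the same patient's med2 dates
keep if med1 == 1
joinby patID using `med2dates', unmatched(master)
gen byte pair7 = abs(date - med2_date) <= 7 if !missing(med2_date)

* one flag per patient: med1 and med2 within 7 days (same day counts)
bysort patID: egen byte got_both7 = max(pair7)
replace got_both7 = 0 if missing(got_both7)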
Simple percentiles using wide dataset
Hello,
I'm not sure why I am having such trouble creating a variable corresponding to percentiles. I have a dataset that looks like the dataex example below with a person ID (pid) and a number (adheresum). I am trying to assign a percentile to each ID based on the value of adheresum. I tried pctile pct=adheresum but this just ended up with all missing values for all but ID=1. What am I missing here?
Thank you very much!
Sarah
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(pid adheresum)
1 120
2 28
3 44
4 13
5 112
6 145
7 4
8 84
9 68
10 143
11 31
12 4
13 164
14 46
15 136
16 44
17 35
18 15
19 87
20 140
21 157
22 158
23 5
24 162
25 88
26 18
27 93
28 45
29 11
30 90
31 12
32 177
33 81
34 107
35 105
36 148
37 82
38 124
39 49
40 30
41 85
42 8
43 122
44 78
45 101
46 9
47 69
48 154
49 18
50 56
51 128
52 164
53 34
54 131
55 102
56 127
57 108
58 160
59 133
60 127
61 97
62 119
63 112
64 90
65 48
66 166
67 156
68 123
69 112
70 150
71 34
72 153
73 56
74 164
75 81
76 129
77 95
78 122
79 107
80 148
81 69
82 69
83 48
84 50
85 11
86 67
87 70
88 152
89 51
90 77
91 161
92 171
93 12
94 96
95 111
96 151
97 90
98 96
99 160
100 135
end
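For what it is worth, -pctile- computes percentile cutpoints and stores them in the first few observations of the new variable (with the default nquantiles(2), only observation 1 gets a value), which is why everything after pid == 1 comes back missing. If the goal is a percentile for each pid, two possibilities are sketched below; the second formula is one common definition of a percentile rank, and other conventions exist.
Code:
* (1) assign each pid to one of 100 quantile groups
xtile pct_group = adheresum, nquantiles(100)

* (2) compute an empirical percentile rank for each observation
egen rank_adhere = rank(adheresum)
quietly count if !missing(adheresum)
gen pct_rank = 100 * (rank_adhere - 1) / (r(N) - 1)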
Checking Duplications
I am trying to use xtset and I know how to remove any duplications. Is there a way to see if the non-id/time variables also repeat in value? For example, if I have
Could I see if X repeats along with those same observation id and year?
id | year | X |
1 | 1 | 4 |
1 | 1 | 4 |
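One way to check this (a sketch using the variable names from the example): tag duplicates on id and year alone, then on id, year, and X together, and compare the two tags.
Code:
* rows that repeat on id-year
duplicates tag id year, generate(dup_idyear)

* rows that repeat on id-year and X as well
duplicates tag id year X, generate(dup_idyearX)

* id-year duplicates where X does NOT also repeat
list id year X if dup_idyear > 0 & dup_idyearX == 0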
Matching several observations via new variable
Hello everyone,
Currently I'm working with a dataset that contains several waves with various IDs (mothers) and their corresponding children, who also have their own IDs.
Each ID of the mother is matched in one line with the ID of their children.
However, the dataset also contains mothers who have more than one child, so one ID can potentially be matched with several CIDs (as you can see in the example).
Wave | ID | CID |
1 | 21001 | 1001 |
1 | 21001 | 1002 |
To see how many mothers have more than one child (via the CID) in the dataset, I need a new variable that identifies mothers with multiple children.
This command should say (by sense):
"gen children if there is more than one CID hat belongs to the ID"
Hopefully you got my point and can help me figure this out.
Thanks in advance!
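Something along these lines may do it (a sketch using the Wave/ID/CID names from the example): count the distinct CIDs attached to each mother ID across all waves, then flag mothers with more than one.
Code:
* tag the first occurrence of each mother-child pair across waves
bysort ID CID: gen byte first_pair = (_n == 1)

* number of distinct children per mother, and a flag for multiple children
bysort ID: egen n_children = total(first_pair)
gen byte multiple_children = (n_children > 1)
drop first_pair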
Wednesday, January 26, 2022
Getting the repeated-measures bse() error (again)
Hello,
I'm new to the list, so apologies for posting a basic question. Until the latest update of Stata, I've been able to figure out how to do a repeated-measures analysis without getting this error:
could not determine between-subject error term; use bse() option
Now I'm getting it again and can't figure out how to change the syntax. I'm pretty sure I've read every post in this forum on the bseunit/bse error, as well the user manual, the help menus, and whatever I can find online, and am still stuck.
It's an experiment where each person saw 4 types of stimuli (the category variable), and each type had a number of items. The dataset is too large to paste here, so below are the first two participants. There are different numbers of items in each category, but I didn't think that would matter.
Our research question is whether there is an effect of category. We don't need effects of items, but I put them in the model.
I initially thought it would be:
anova score category item, repeated(category)
but that generates the error. Ditto with adding a bse or bseunit.
Thanks in advance for any advice!
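A hedged guess rather than a confirmed fix: with each participant crossed with category, the participant identifier usually needs to appear in the model so that anova can work out the between-subject error term; with only category and item on the right-hand side there is nothing for it to use. Assuming the outcome is score and the subject identifier is a variable called participant (names taken from the post and the listing below), something like this may be worth trying:
Code:
* hedged sketch: include the subject identifier in the model
anova score participant category, repeated(category)

* or name the between-subject error term explicitly
anova score participant category, repeated(category) bse(participant)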
Participant | Category | Item | Difference |
1 | 1 | 1 | 0 |
1 | 1 | 2 | 1 |
1 | 1 | 3 | 2 |
1 | 1 | 4 | 2 |
1 | 1 | 5 | 1 |
1 | 1 | 6 | 2 |
1 | 1 | 7 | 1 |
1 | 2 | 1 | 0 |
1 | 2 | 2 | 3 |
1 | 2 | 3 | 2 |
1 | 2 | 4 | 2 |
1 | 2 | 5 | 1 |
1 | 2 | 6 | 0 |
1 | 2 | 7 | 0 |
1 | 2 | 8 | 2 |
1 | 2 | 9 | 2 |
1 | 3 | 1 | 0 |
1 | 3 | 2 | 2 |
1 | 3 | 3 | 2 |
1 | 3 | 4 | 2 |
1 | 3 | 5 | 1 |
1 | 3 | 6 | 1 |
1 | 3 | 7 | 2 |
1 | 3 | 8 | 2 |
1 | 3 | 9 | 1 |
1 | 3 | 10 | 2 |
1 | 4 | 1 | 2 |
1 | 4 | 2 | 2 |
1 | 4 | 3 | 2 |
1 | 4 | 4 | 1 |
1 | 4 | 5 | 1 |
1 | 4 | 6 | 1 |
1 | 4 | 7 | 1 |
1 | 4 | 8 | 2 |
1 | 4 | 9 | 2 |
1 | 4 | 10 | 2 |
1 | 5 | 1 | 2 |
1 | 5 | 2 | 1 |
1 | 5 | 3 | 1 |
1 | 5 | 4 | 1 |
1 | 5 | 5 | 2 |
1 | 5 | 6 | 2 |
1 | 5 | 7 | 1 |
1 | 5 | 8 | 1 |
1 | 5 | 9 | 2 |
1 | 5 | 10 | 2 |
2 | 1 | 1 | 1 |
2 | 1 | 2 | 3 |
2 | 1 | 3 | 0 |
2 | 1 | 4 | 2 |
2 | 1 | 5 | 1 |
2 | 1 | 6 | 1 |
2 | 1 | 7 | 3 |
2 | 2 | 1 | 3 |
2 | 2 | 2 | 1 |
2 | 2 | 3 | 0 |
2 | 2 | 4 | 1 |
2 | 2 | 5 | 0 |
2 | 2 | 6 | 2 |
2 | 2 | 7 | 1 |
2 | 2 | 8 | 1 |
2 | 2 | 9 | 1 |
2 | 3 | 1 | 2 |
2 | 3 | 2 | 2 |
2 | 3 | 3 | 0 |
2 | 3 | 4 | 2 |
2 | 3 | 5 | 0 |
2 | 3 | 6 | 2 |
2 | 3 | 7 | 2 |
2 | 3 | 8 | 3 |
2 | 3 | 9 | 1 |
2 | 3 | 10 | 1 |
2 | 4 | 1 | 1 |
2 | 4 | 2 | 1 |
2 | 4 | 3 | 1 |
2 | 4 | 4 | 2 |
2 | 4 | 5 | 1 |
2 | 4 | 6 | 1 |
2 | 4 | 7 | 0 |
2 | 4 | 8 | 1 |
2 | 4 | 9 | 2 |
2 | 4 | 10 | 1 |
2 | 5 | 1 | 2 |
2 | 5 | 2 | 2 |
2 | 5 | 3 | 1 |
2 | 5 | 4 | 1 |
2 | 5 | 5 | 2 |
2 | 5 | 6 | 2 |
2 | 5 | 7 | 2 |
2 | 5 | 8 | 1 |
2 | 5 | 9 | 2 |
2 | 5 | 10 | 1 |