Using Stata, is there any convenient way to bootstrap in parallel? Although Stata/MP does some things in parallel, it seems it still does not bootstrap in parallel. In 2018, Jerome Falken reported some trouble using the -parallel bs- command by George Vega and Brian Quistorff, which I am also having trouble with. Maarten Buis offered a solution, but it pertained to simulation rather than bootstrapping.
Has there been any progress since 2018? Bootstrapping is an embarrassingly parallel task, and it is becoming a little embarrassing if it can't be conveniently parallelized in Stata.
(Here's the 2018 discussion of this issue: https://www.statalist.org/forums/for...l-bootstraping)
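For reference, a minimal sketch of the -parallel bs- usage being discussed (the parallel package is on SSC); the option names and values below are illustrative, so check -help parallel- for the exact syntax of the installed version:
Code:
ssc install parallel
parallel setclusters 4                              // number of child Stata processes
parallel bs, reps(1000): regress price mpg weight   // bootstrap replications split across the children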
Monday, January 31, 2022
Syntax Question
Dear all,
I have the following data merged (industry-level data and trade data at the industry level, at the same level of disaggregation). I want to build an index (which has been in the literature for a while), and I want to make sure I am using the proper syntax. I posted the index definition from a LaTeX document file.
The data is the following
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int(country year isic isiccomb) byte sourcecode long(Establishments Employment) double(Wages OutputINDSTAT4 ValueAdded GrossFixed) float(NewImportsWorld NewExportsWorld NewOutput) 752 1990 1511 1511 1 211 18970 410760814 4812639070 750829139 . 205787008 152038000 4812639232 231 1990 1511 1511 0 . . . . . . . . . 554 1990 1511 1511 1 250 27670 . . . . 39597000 1904916992 . 40 1990 1511 1511 1 211 14341 281952246 2648385387 483309907 82419767 173028000 267799008 2648385024 352 1990 1511 1511 0 . . . . . . 643000 13263000 . 428 1990 1511 1511 1 273 7176 . 4237801 . . . . 4238000 158 1990 1511 1511 0 . . . . . . 454000992 706115008 . 250 1990 1511 1511 0 . . . . . . 4090940928 3523758080 . 44 1990 1511 1511 0 . . . . . . . . . 442 1990 1511 1511 0 . . . . . . . . . 124 1990 1511 1511 1 633 46651 1127459988 10664004340 2239077148 . 818744000 1248240000 10664003584 332 1990 1511 1511 0 . . . . . . . 40000 . 410 1990 1511 1511 1 242 14578 113690412 1294216411 411153903 80461830 1689671936 70361000 1294215936 776 1991 1511 1511 0 . . . . . . . . . 44 1991 1511 1511 0 . . . . . . . . . 352 1991 1511 1511 0 . . . . . . 840000 6164000 . 158 1991 1511 1511 0 . . . . . . 525063008 1046316032 . 702 1991 1511 1511 1 17 584 5668143 29826054 15814303 5500854 237994000 . 29826000 703 1991 1511 1511 1 44 12891 20219157 497778004 . . . . 497777984 246 1991 1511 1511 0 . . . . . . 30361000 70059000 . 554 1991 1511 1511 1 250 27670 . . . . 30491000 1920531968 . 372 1991 1511 1511 1 144 12028 199582956 3536149122 581041597 77257141 176370000 1552030976 3536148992 410 1991 1511 1511 1 268 13471 118579948 1592683837 522258484 122523477 1717752960 56953000 1592684032 250 1991 1511 1511 0 . . . . . . 3985878016 3657634048 . 332 1991 1511 1511 0 . . . . . . . 0 . 578 1991 1511 1511 1 218 10305 294456237 3430694979 385781581 55221834 28105000 32967000 3430694912 752 1991 1511 1511 1 203 18243 401262876 4286257370 733530253 . 223212000 94554000 4286256896 40 1991 1511 1511 1 197 14498 303395178 2877601494 518726611 90747131 148488992 214368992 2877601024 428 1991 1511 1511 1 308 6998 . 13002620 . . . . 13003000 231 1991 1511 . 1 7 3663 3963285 41919807 18216908 576812 . . 41920000 442 1991 1511 1511 0 . . . . . . . . . 276 1991 1511 1511 0 . . . . . . 5809546752 3660185088 . 124 1991 1511 1511 1 608 44994 1139846700 10260858181 2315482061 . 883571968 1078050048 10260857856 710 1991 1511 1511 1 282 25743 151775151 973095427 326538624 . . . 973094976 703 1992 1511 1511 1 44 11804 20555460 442950701 . . . . 442951008 380 1992 1511 1511 1 487 40073 1598169003 14880936757 2621584081 429854678 5494322176 1119657984 14880936960 776 1992 1511 1511 0 . . . . . . . . . 578 1992 1511 1511 1 156 9938 310083625 3518866599 429479374 44734082 33248000 44994000 3518866944 410 1992 1511 1511 1 301 16275 159169753 1807466207 669953810 140485343 1658217984 85798000 1807465984 590 1992 1511 1511 1 29 1844 7130000 94010000 7002000 5813000 . . 94010000 792 1992 1511 1511 1 102 11338 89348079 1091967404 256402794 27211874 235286000 57520000 1091966976 352 1992 1511 1511 0 . . . . . . 1083000 6734000 . 246 1992 1511 1511 0 . . . . . . 39316000 76060000 . 702 1992 1511 1511 1 18 689 7281302 43691493 19426422 10394317 241943008 . 43691000 250 1992 1511 1511 0 . . . . . . 4353823232 4323734016 . 616 1992 1511 1511 0 . . . . . . 173932992 251923008 . 
372 1992 1511 1511 1 152 12912 236507239 3893002632 598923688 64656443 195820000 1894761984 3893003008 417 1992 1511 1511 1 126 4690 2184146 53954268 . . . . 53954000 40 1992 1511 1511 1 197 14296 342939911 3218480128 588420953 101987715 163900992 234464000 3218480128 762 1992 1511 1511 1 38 1911 . 4038072 . 222762 . . 4038000 554 1992 1511 1511 1 253 27510 . . . . 36465000 1972987008 . 428 1992 1511 1511 1 164 9916 . 72852289 . . . . 72852000 442 1992 1511 1511 0 . . . . . . . . . 124 1992 1511 1511 1 588 46848 1150764898 9753177527 2360765039 . 865473984 1210505984 9753178112 332 1992 1511 1511 0 . . . . . . . . . 44 1992 1511 1511 0 . . . . . . . . . 208 1992 1511 1511 1 . . . 6926130024 1673933460 191667248 363102016 4078067968 6926130176 232 1992 1511 1511 1 2 665 407491 2568297 1205004 0 . . 2568000 158 1992 1511 1511 0 . . . . . . 507337984 1074740992 . 752 1992 1511 1511 1 206 16860 437292072 4456000026 841885404 . 294567008 96473000 4.456e+09 276 1992 1511 1511 0 . . . . . . 7119265792 3176626944 . 440 1992 1511 1511 1 . 13840 6932730 162210090 . . 1064000 59887000 162210000 231 1992 1511 . 1 7 3221 2732917 9003390 3226405 101338 . . 9003000 496 1992 1511 1511 1 6 2402 2416744 55967442 17935814 16579535 . . 55967000 710 1992 1511 1511 1 . . . 1155665786 . . 110301000 166346000 1155666048 36 1993 1511 1511 1 645 46391 . 6255440104 1832635187 . 46923000 3161800960 6255439872 762 1993 1511 1511 1 99 1660 . 10427730 . 2533552 . . 10428000 300 1993 1511 1511 0 . . . . . . 797523008 52119000 . 702 1993 1511 1511 1 19 764 8769080 57363855 22139002 2611724 235638000 47758000 57364000 554 1993 1511 1511 1 268 28710 . . . . 48869000 2014814976 . 578 1993 1511 1511 1 155 10057 283206578 3052464792 327553531 55020567 46533000 36922000 3052464896 332 1993 1511 1511 0 . . . . . . . . . 590 1993 1511 1511 1 29 1853 10127000 90596000 18053000 7148000 . . 90596000 703 1993 1511 1511 1 49 11578 22164747 356065938 54859374 21066227 . . 356065984 40 1993 1511 1511 1 192 14408 332735622 2959263196 560602833 106450231 145410000 212484992 2959262976 442 1993 1511 1511 0 . . . . . . . . . 208 1993 1511 1511 1 342 . . 5733148322 1647420341 161574160 349519008 3530907904 5733148160 158 1993 1511 1511 0 . . . . . . 495086016 1113870976 . 428 1993 1511 1511 1 224 8107 7083167 113632597 28508877 2393336 . . 113633000 380 1993 1511 1511 1 519 40676 1320163744 12029462504 2126914231 361138745 4497477120 912947968 12029462528 792 1993 1511 1511 1 102 11624 92944925 1231588530 303413746 15475649 289539008 50670000 1231588992 232 1993 1511 1511 1 2 379 337303 918390 444808 0 . . 918000 231 1993 1511 . 1 7 3152 2121600 7378200 3182200 78400 . . 7378000 440 1993 1511 1511 1 19 12286 7873732 177247834 . . . . 177248000 276 1993 1511 1511 0 . . . . . . 5450384896 2676812032 . 352 1993 1511 1511 0 . . . . . . 590000 5244000 . 398 1993 1511 1511 1 1304 . . 4.096e+08 . . . . 4.096e+08 32 1993 1511 1511 1 1079 45728 554246175 4612161218 914363739 153194467 150990000 791281024 4612161024 616 1993 1511 1511 0 . . . . . . 225496000 189412000 . 724 1993 1511 1511 1 3079 58797 933848554 10418448822 1860357115 313765688 958025024 447728992 10418449408 496 1993 1511 1511 1 5 2242 717940 17164811 5290320 2496860 . . 17165000 116 1993 1511 1511 0 . . . . . . . . . 826 1993 1511 1511 1 1579 79875 1900242517 14566026303 3259058398 451438830 3064681984 1592034048 14566026240 776 1993 1511 1511 0 . . . . . . . . . 246 1993 1511 1511 0 . . . . . . 32308000 77148000 . 124 1993 1511 1511 1 574 47010 1062780989 9715377556 2127536261 . 
942958976 1343789056 9715378176 752 1993 1511 1511 1 202 16453 333976666 3409937551 600378214 . 201392000 111792000 3409937920 417 1993 1511 1511 1 158 4169 709576 22061175 . . . . 22061000 372 1993 1511 1511 1 153 13041 212624531 3533411479 561092834 75305054 124006000 1791170048 3533411072 512 1993 1511 1511 1 1 1 1560 14044 8270 . 70760000 . 14000 end
In order to calculate the first term of the index, S_kit = VA_kit / GDP_it, I am using the following syntax (k indexes industry, i country, t year). VA_kit is value added in industry k, country i, and year t.
sort country isic year
by country isic year: egen TotalOutput_sector= sum(OutputINDSTAT4)
sort isic year country
by isic year country: egen TotalValueAdded=sum(ValueAdded)
gen tradability_one= TotalValueAdded/OutputINDSTAT4
In order to calculate the second term of the index, D_kt = X_kt / WGDP_kt, I am using the following syntax (k is industry, t is year). X_kt is total exports of industry k (isic) in a particular year. For WGDP_kt (world total output), I am using total output of industry k in year t.
sort isic year
by isic year: egen TotalExports= sum(NewExportsWorld) if NewExportsWorld!=. //(To avoid having shares greater than 1)
sort isic year
by isic year: egen TotalOutput= sum(NewOutput) if NewOutput!=. //(To avoid having shares greater than 1)
gen tradability_output= Totalexports/TotalOutput_sector
Finally, to calculate the final index, I am doing
gen share= tradability_output*tradability_one
The reason for my question is that I am getting extremely strange values in my sample.
Thank you so much,
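For what it is worth, here is a sketch of how such shares are often built with -egen, total()-, using the variable names from the post; whether this matches the intended index definition (in particular, proxying GDP_it by total country output across industries) is an assumption:
Code:
* first term S_kit: industry value added over country GDP (proxied here by total country output)
bysort country year: egen double GDP_it = total(OutputINDSTAT4)
gen double S_kit = ValueAdded / GDP_it

* second term D_kt: world exports of industry k over world output of industry k, by year
bysort isic year: egen double X_kt    = total(NewExportsWorld)
bysort isic year: egen double WGDP_kt = total(NewOutput)
gen double D_kt = X_kt / WGDP_kt

* final index
gen double share = S_kit * D_kt
Note that -egen, total()- ignores missing values by default, so the "if var != ." guards in the post are not needed for the sums themselves.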
Repeated time values within panel when trying to create new variable
Hi, I am trying to create a variable called "fempstat" which measures an individual's employment status in the next month. I have the following lines of code:
xtset cpsidp date
gen fempstat=f1.empstat
label var fempstat "Next month employment status"
However, I am getting the error "repeated time values within panel". I have tried to switch the variable "date" out with "month" but I am still getting the same error.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input int year byte month double cpsid byte(statefip empstat) float date 2021 1 20191000055400 1 10 732 2021 1 20201100108800 1 10 732 2021 1 20191000000800 1 36 732 2021 1 20201100070800 1 36 732 2021 1 20201000048900 1 21 732 2021 1 20201100075800 1 34 732 2021 1 20191000006900 1 0 732 2021 1 20201200015600 1 36 732 2021 1 20201200115200 1 10 732 2021 1 20210100040500 1 10 732 2021 1 20191000107600 1 32 732 2021 1 20201100138400 1 10 732 2021 1 20191000064600 1 10 732 2021 1 20201100070800 1 36 732 2021 1 20191200098300 1 32 732 2021 1 20201200057900 1 10 732 2021 1 20191200132900 1 34 732 2021 1 20201000063600 1 10 732 2021 1 20201200039300 1 10 732 2021 1 20201100033300 1 34 732 2021 1 20191200076200 1 36 732 2021 1 20191000062900 1 10 732 2021 1 20201100060300 1 10 732 2021 1 20201000122200 1 36 732 2021 1 20201000023600 1 10 732 2021 1 20210100072600 1 36 732 2021 1 20191100037400 1 0 732 2021 1 20191200085700 1 36 732 2021 1 20200100122500 1 36 732 2021 1 20201200122800 1 10 732 2021 1 20191100108900 1 0 732 2021 1 20201100005300 1 10 732 2021 1 20201200068000 1 36 732 2021 1 20191200030500 1 10 732 2021 1 20191100144700 1 21 732 2021 1 20191000127000 1 10 732 2021 1 20201000057500 1 10 732 2021 1 20200100102700 1 0 732 2021 1 20201100025400 1 10 732 2021 1 20201100056500 1 10 732 2021 1 20200100070100 1 10 732 2021 1 20191200117500 1 12 732 2021 1 20191100126000 1 34 732 2021 1 20201000010700 1 10 732 2021 1 20191200094600 1 36 732 2021 1 20201000000200 1 10 732 2021 1 20201100000100 1 36 732 2021 1 20201100064000 1 10 732 2021 1 20191000126700 1 36 732 2021 1 20201200008400 1 10 732 2021 1 20210100014400 1 10 732 2021 1 20201100069100 1 10 732 2021 1 20201200123000 1 10 732 2021 1 20191000133700 1 10 732 2021 1 20201100108500 1 36 732 2021 1 20201200135300 1 10 732 2021 1 20191200075100 1 10 732 2021 1 20210100009800 1 34 732 2021 1 20210100115200 1 12 732 2021 1 20191100082900 1 10 732 2021 1 20201000137500 1 10 732 2021 1 20191000083500 1 10 732 2021 1 20191100028100 1 10 732 2021 1 20210100044200 1 10 732 2021 1 20201200124900 1 36 732 2021 1 20201100033000 1 10 732 2021 1 20191100004600 1 10 732 2021 1 20201100079500 1 0 732 2021 1 20201000133500 1 10 732 2021 1 20201200039400 1 10 732 2021 1 20210100023300 1 36 732 2021 1 20210100011700 1 36 732 2021 1 20201200057300 1 36 732 2021 1 20201100109500 1 0 732 2021 1 20200300000800 1 0 732 2021 1 20201100139800 1 34 732 2021 1 20191100060300 1 0 732 2021 1 20200100147800 1 32 732 2021 1 20191100123800 1 0 732 2021 1 20201100082100 1 10 732 2021 1 20201000033800 1 0 732 2021 1 20191200075100 1 34 732 2021 1 20201200006800 1 36 732 2021 1 20201200016600 1 32 732 2021 1 20201100112900 1 10 732 2021 1 20210100119400 1 10 732 2021 1 20201000085300 1 34 732 2021 1 20210100111900 1 0 732 2021 1 20201000077700 1 0 732 2021 1 20200100145800 1 36 732 2021 1 20200100057300 1 36 732 2021 1 20201200087500 1 10 732 2021 1 20201000099500 1 32 732 2021 1 20200100108700 1 10 732 2021 1 20201200140700 1 36 732 2021 1 20191000064600 1 21 732 2021 1 20191200044200 1 10 732 2021 1 20201200057300 1 36 732 2021 1 20210100042900 1 12 732 2021 1 20191200106300 1 36 732 end format %tm date label values month month_lbl label def month_lbl 1 "January", modify label values statefip statefip_lbl label def statefip_lbl 1 "Alabama", modify label values empstat empstat_lbl label def empstat_lbl 0 "NIU", modify label def empstat_lbl 10 "At work", modify label def empstat_lbl 12 "Has job, not at 
work last week", modify label def empstat_lbl 21 "Unemployed, experienced worker", modify label def empstat_lbl 32 "NILF, unable to work", modify label def empstat_lbl 34 "NILF, other", modify label def empstat_lbl 36 "NILF, retired", modify
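A sketch of how this error is usually diagnosed; the variable names are taken from the post (note that the -dataex- excerpt actually shows cpsid, the household identifier, rather than cpsidp):
Code:
* list the id-date combinations that occur more than once
duplicates report cpsidp date
duplicates tag cpsidp date, generate(dup)
list cpsidp date if dup > 0, sepby(cpsidp)
-xtset- requires each (panel id, time) pair to be unique, so the usual fixes are to use a genuinely person-level identifier or to deduplicate the repeated rows before -xtset-.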
Date format using month and year
Hi, I am trying to make a new variable called "date" using the month and year in my dataset. The variable "month" is a byte and the variable "year" is an int. I would like to make it into the format <year>m<month> and call this variable date. For example, month 01 (january) and year 2022 would become 2022m01. This variable should be a float. I'm unsure of how to convert these two variables into a single float variable with that format. Thanks.
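A minimal sketch of the standard approach, assuming the variables are literally named year and month:
Code:
gen date = ym(year, month)   // months elapsed since January 1960
format date %tm              // displays as 2022m1, 2022m2, ...
Stata's %tm display format shows 2022m1 rather than 2022m01, but the underlying value is what -xtset- and the lead/lag operators need.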
Generalized Leontief variable cost function
Dear All,
I am having difficulty estimating in Stata a generalized Leontief variable cost function with 3 variable inputs and 4 quasi-fixed inputs. I want to use the 'sureg' command to estimate the cost function in Morrison, Catherine J. (1997), "Structural change, capital investment and productivity in the food processing industry," American Journal of Agricultural Economics 79, 110-125.
I am writing to kindly ask whether any of you have estimated this system of 7 equations in Stata. I would appreciate your help and advice.
Best regards,
Alphonse
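In case it is useful, here is a purely illustrative sketch of -sureg- syntax for a system with a cross-equation constraint; the equations and variable names below are placeholders, not the Morrison (1997) generalized Leontief specification:
Code:
* three illustrative input-demand equations sharing regressors; isure iterates to convergence
constraint 1 [x1]p2 = [x2]p1     // an illustrative symmetry-type restriction across equations
sureg (x1 p1 p2 p3 k1 k2 k3 k4) (x2 p1 p2 p3 k1 k2 k3 k4) (x3 p1 p2 p3 k1 k2 k3 k4), constraints(1) isure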
Plotting heterogenous (interaction) effect in event study plot
I have the following dataset (a -dataex- excerpt is shown below). I have a staggered difference-in-differences setting: the lead variables are indicators for the pre-treatment periods, whereas the lag variables are indicators for the post-treatment periods. The dependent variable is sales. My code for the baseline plot follows the data excerpt.
My questions about the heterogeneous effect for independent artists are laid out after the data and code below.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(id monthly independent sales TreatZero lead2 lead3 lead4 lead5 lead6 lead7_backwards lag1 lag2 lag3 lag4 lag5 lag6 lead1) 1 672 0 249512 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 673 0 177712 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 674 0 109524 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 675 0 20776 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 676 0 846471 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 677 0 328806 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 678 0 46470 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 679 0 394758 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 680 0 301179 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 681 0 756129 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 682 0 116117 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 683 0 374293 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 684 0 432423 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 685 0 364780 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 686 0 797174 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 687 0 400569 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 688 0 126897 0 0 0 0 0 0 0 0 1 0 0 0 0 0 2 672 1 65104 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 673 1 77133 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 674 1 76200 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 675 1 218342 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 676 1 39265 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 677 1 6649 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 678 1 41677 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 679 1 156277 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 680 1 98535 0 0 0 0 0 1 0 0 0 0 0 0 0 0 2 681 1 3920 0 0 0 0 1 0 0 0 0 0 0 0 0 0 2 682 1 165573 0 0 0 1 0 0 0 0 0 0 0 0 0 0 2 683 1 73413 0 0 1 0 0 0 0 0 0 0 0 0 0 0 2 684 1 97216 0 1 0 0 0 0 0 0 0 0 0 0 0 0 2 685 1 106015 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 686 1 33066 1 0 0 0 0 0 0 0 0 0 0 0 0 0 2 687 1 54207 0 0 0 0 0 0 0 1 0 0 0 0 0 0 2 688 1 118173 0 0 0 0 0 0 0 0 1 0 0 0 0 0 3 672 0 737203 0 0 0 0 0 0 1 0 0 0 0 0 0 0 3 673 0 306725 0 0 0 0 0 0 1 0 0 0 0 0 0 0 3 674 0 198990 0 0 0 0 0 0 1 0 0 0 0 0 0 0 3 675 0 1054751 0 0 0 0 0 0 1 0 0 0 0 0 0 0 3 676 0 1886147 0 0 0 0 0 1 0 0 0 0 0 0 0 0 3 677 0 1142545 0 0 0 0 1 0 0 0 0 0 0 0 0 0 3 678 0 1277825 0 0 0 1 0 0 0 0 0 0 0 0 0 0 3 679 0 397706 0 0 1 0 0 0 0 0 0 0 0 0 0 0 3 680 0 1354199 0 1 0 0 0 0 0 0 0 0 0 0 0 0 3 681 0 1348788 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 682 0 914274 1 0 0 0 0 0 0 0 0 0 0 0 0 0 3 683 0 805134 0 0 0 0 0 0 0 1 0 0 0 0 0 0 3 684 0 769588 0 0 0 0 0 0 0 0 1 0 0 0 0 0 3 685 0 292174 0 0 0 0 0 0 0 0 0 1 0 0 0 0 3 686 0 1236297 0 0 0 0 0 0 0 0 0 0 1 0 0 0 3 687 0 58338 0 0 0 0 0 0 0 0 0 0 0 1 0 0 3 688 0 1681455 0 0 0 0 0 0 0 0 0 0 0 0 1 0 4 672 1 82611 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4 673 1 190401 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4 674 1 122867 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4 675 1 111444 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4 676 1 44781 0 0 0 0 0 1 0 0 0 0 0 0 0 0 4 677 1 158895 0 0 0 0 1 0 0 0 0 0 0 0 0 0 4 678 1 71693 0 0 0 1 0 0 0 0 0 0 0 0 0 0 4 679 1 62140 0 0 1 0 0 0 0 0 0 0 0 0 0 0 4 680 1 321720 0 1 0 0 0 0 0 0 0 0 0 0 0 0 4 681 1 188944 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 682 1 179921 1 0 0 0 0 0 0 0 0 0 0 0 0 0 4 683 1 159214 0 0 0 0 0 0 0 1 0 0 0 0 0 0 4 684 1 118173 0 0 0 0 0 0 0 0 1 0 0 0 0 0 4 685 1 246030 0 0 0 0 0 0 0 0 0 1 0 0 0 0 4 686 1 83191 0 0 0 0 0 0 0 0 0 0 1 0 0 0 4 687 1 100867 0 0 0 0 0 0 0 0 0 0 0 1 0 0 4 688 1 42409 0 0 0 0 0 0 0 0 0 0 0 0 1 0 5 672 0 32247 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 673 0 9993 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 674 0 44384 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 675 0 28284 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 676 0 6873 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 677 0 35780 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 678 0 226 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 679 0 41062 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 680 0 34161 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 681 0 5773 0 0 0 0 0 1 0 0 0 0 0 0 0 0 5 682 0 12586 0 0 0 0 1 0 
0 0 0 0 0 0 0 0 5 683 0 22660 0 0 0 1 0 0 0 0 0 0 0 0 0 0 5 684 0 40637 0 0 1 0 0 0 0 0 0 0 0 0 0 0 5 685 0 40881 0 1 0 0 0 0 0 0 0 0 0 0 0 0 5 686 0 3560 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 687 0 9365 1 0 0 0 0 0 0 0 0 0 0 0 0 0 5 688 0 852 0 0 0 0 0 0 0 1 0 0 0 0 0 0 6 672 0 94715 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 673 0 2692 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 674 0 123457 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 675 0 724462 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 676 0 871857 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 677 0 16821 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 678 0 499244 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 679 0 441009 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 680 0 429921 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 681 0 156341 0 0 0 0 0 1 0 0 0 0 0 0 0 0 6 682 0 461273 0 0 0 0 1 0 0 0 0 0 0 0 0 0 6 683 0 325237 0 0 0 1 0 0 0 0 0 0 0 0 0 0 6 684 0 302210 0 0 1 0 0 0 0 0 0 0 0 0 0 0 6 685 0 332281 0 1 0 0 0 0 0 0 0 0 0 0 0 0 6 686 0 298871 0 0 0 0 0 0 0 0 0 0 0 0 0 0 end format %tm monthly
Code:
xtset id monthly xtreg sales lead7_backwards lead6 lead5 lead4 lead3 lead2 lead1 TreatZero lag1 lag2 lag3 lag4 lag5 lag6 i.monthly, fe vce(cluster id) coefplot, vertical omitted keep(lead6 lead5 lead4 lead3 lead2 lead1 TreatZero lag1 lag2 lag3 lag4 lag5 lag6) ciopts(recast(rcap)) yline(0) msymbol(d)
(The resulting event-study plot is omitted here.)
Now I would like to see the effect of treatment on the titles of independent artists; for this I have a variable called independent. I have two questions about this. In a usual staggered difference-in-differences regression, without time-varying estimates, my code would be the following:
Code:
xtreg sales i.treatment##i.independent, fe vce(cluster id)
To show the time-varying effect, can I use:
Code:
xtreg sales i.(lead7_backwards lead6 lead5 lead4 lead3 lead2 lead1 TreatZero lag1 lag2 lag3 lag4 lag5 lag6)##i.independent i.monthly, fe vce(cluster id)
If yes, how can I make a plot similar to the previous one that shows how the effect differs between the titles of independent artists (independent == 1) and label artists (independent == 0)? If not, what would be the correct code to run the analysis and plot the results? Would it be better to run a split-sample analysis, and if so, how can I combine plots from two different regressions?
Apologies for the many questions...
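On the last question, a hedged sketch of one way split-sample event-study estimates are sometimes overlaid in a single -coefplot- (the stored-estimate names are illustrative):
Code:
xtreg sales lead7_backwards lead6 lead5 lead4 lead3 lead2 lead1 TreatZero lag1 lag2 lag3 lag4 lag5 lag6 i.monthly if independent == 1, fe vce(cluster id)
estimates store m_indep
xtreg sales lead7_backwards lead6 lead5 lead4 lead3 lead2 lead1 TreatZero lag1 lag2 lag3 lag4 lag5 lag6 i.monthly if independent == 0, fe vce(cluster id)
estimates store m_label
coefplot (m_indep, label("Independent")) (m_label, label("Label")), vertical keep(lead6 lead5 lead4 lead3 lead2 lead1 TreatZero lag1 lag2 lag3 lag4 lag5 lag6) ciopts(recast(rcap)) yline(0)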
nonest outside xtreg
The -xtreg- command has an undocumented -nonest- option which permits the use of cluster() when the panels are not nested within the cluster variable.
Do other xt commands offer a similar option? It appears that xtpoisson, for one, does not.
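For reference, a minimal sketch of the usage described above (variable names are illustrative; since the option is undocumented, this simply mirrors the post rather than official syntax):
Code:
* panels defined by firm, but clustering on industry, which does not nest the panels
xtset firm year
xtreg y x1 x2 i.year, fe cluster(industry) nonest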
FRAPPLY: module to nondestructively apply command(s) to a frame
Dear Statalisters,
Thanks to Kit Baum, a new command frapply is now available in SSC.
Beginning with version 16, Stata can hold multiple data frames in memory. This changed how I work in Stata as I fully integrated frames into my workflow. However, using the stock frames commands felt somewhat tedious as I had to write multiple lines of command to do simple things like putting a summary of a subset into a new frame.
frapply simplifies this process. It arose from my need to write one-liners to iteratively extract what I want from the current dataset I am working with and look at the result without changing it. It applies a command or a series of commands to the dataset in the specified (or current) frame and optionally puts the result into another frame. Otherwise destructive commands (such as drop, keep, collapse, contract, etc.) can be daisy-chained somewhat similar to the pipe operator in R (and in Tidyverse), all the while preserving the dataset. This can be useful in interactive and experimental settings where we want to quickly and iteratively summarize and/or transform the dataset without changing it. It can also be a convenient drop-in replacement for the frames prefix and a substitute for frames commands such as frame copy and frame put. It can do what those commands can do--but is more flexible.
As an elementary example, let's say we want to load up the auto data, subset expensive cars, and put averages by trunk space into a different frame. And we want to try different thresholds for what "expensive" would entail, so we will repeatedly run this chunk of code.
Code:
frame change default
capture frame drop temp
frame put if price > 10000, into(temp)
frame change temp
collapse price, by(trunk)
list
Using frapply, this can be written more concisely as follows.
Code:
frapply default if price > 10000, into(temp, replace change): collapse price, by(trunk) || list
frapply takes the input frame, subsets it, applies the daisy-chained commands, and puts the result into either a new or an existing frame (or a temporary frame if the into() option is omitted). We could rerun this line and get the same result regardless of the current frame.
I hope this command improves your workflow. Comments and suggestions are always welcome. Also, feel free to let me know if you find any errors.
Stata not directing to correct PDF section after -help topic-
Hello everyone,
Is anyone having trouble accessing the *.pdf manuals from Stata after a -help topic- command? For example, after -help putexcel-, when I click "(View complete PDF manual entry)", Stata takes me to the list of all PDF manuals in its repository instead of opening the particular PDF I am after. I restarted both Stata and Adobe several times without any success. Is this happening to others or just on my system? I know I can reach the PDF I am after via a Google search, but that is not the solution I am looking for.
Code:
Environment: macOS Monterey, version 12.1
Stata version: 17
Update status:
  Last check for updates: 31 Jan 2022
  New update available: none (as of 31 Jan 2022)
Adobe version: Adobe Acrobat Reader DC, Continuous Release | version 2021.011.20039
First Stage Regression - Huge F-statistic
Hello all,
I use a stacked first differences model to estimate the impact of globalization on unemployment in Western Europe.
• Panel data: 16 countries, yearly observations for the years 1995-2007
• 2SLS regression
• The following code shows the first-stage regression, where...
- I control for the same variables (c1, c2, c3, c4) and
- use time and country fixed effects
...just like in the reduced form / second-stage regression
Code:
xi: xtreg x z c1 c2 c3 c4 i.year, fe vce(cluster country)
- estimate for the instrument z: 1.6788
- standard error of the instrument z: 0.005
Hence, the coefficient is highly significant.
To get the F-statistic, I use the following code:
Code:
test z
The F-statistic is 112,125. This seems way too high; or is it plausible to have such a high F-statistic? If not, what could have gone wrong?
I appreciate your help!
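A hedged sketch of a common cross-check on first-stage strength, using -ivregress- with -estat firststage-; the outcome name d_unemp is a placeholder, and entering the fixed effects as explicit dummies only approximates the within specification above:
Code:
ivregress 2sls d_unemp c1 c2 c3 c4 i.year i.country (x = z), vce(cluster country)
estat firststage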
Including breakpoint dates in SVAR command
Dear Stata users,
I would like to know how to specify the code for an SVAR model in such a way that it includes breakpoint equations or time dummies corresponding to structural breaks.
Ideally this would work the same way EViews lets users set breakpoint equations through "@during("date1 date2")" when implementing an SVAR model.
The code I had in mind would be something like the example below. It generally follows the same formatting as the example in the help file, but I would like to include breakpoints in the model:
Thank you.
Code:
svar dln_inv dln_inc dln_consump, aeq(A) beq(B)
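One workaround that is sometimes used, sketched here with placeholder break dates, is to define dummies for the break period(s) and pass them through -svar-'s exog() option:
Code:
* hypothetical break window; qdate is assumed to be the quarterly time variable
gen byte break1 = inrange(qdate, tq(2008q3), tq(2009q2))
svar dln_inv dln_inc dln_consump, aeq(A) beq(B) exog(break1)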
Combining categories
I would like to combine two categories for one of my variables but I am not sure whether there is a test I could run to justify combining the categories. Here is an example of what I am trying to do:
As categories 2 and 3 are not statistically significant, I would like to combine them; the results remain the same. Put differently, is there a difference between categories 2 and 3? I am not sure I am phrasing my question well.
Code:
. ta worry How worried are | you about being | infected with | COVID-19? | Freq. Percent Cum. -----------------+----------------------------------- Not at all | 641 37.31 37.31 A little | 387 22.53 59.84 Rather | 203 11.82 71.65 Very | 487 28.35 100.00 -----------------+----------------------------------- Total | 1,718 100.00 . . xtreg WB i.worry [pw= panel_ind_wt_1_2], fe Fixed-effects (within) regression Number of obs = 1,718 Group variable: Findid Number of groups = 859 R-sq: Obs per group: within = 0.0191 min = 2 between = 0.0388 avg = 2.0 overall = 0.0214 max = 2 F(3,858) = 2.25 corr(u_i, Xb) = 0.0312 Prob > F = 0.0811 (Std. Err. adjusted for 859 clusters in Findid) ------------------------------------------------------------------------------ | Robust WB | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- worry | A little | .5341085 .3449488 1.55 0.122 -.1429337 1.211151 Rather | .0941577 .3733732 0.25 0.801 -.6386741 .8269896 Very | .688718 .2973494 2.32 0.021 .1051006 1.272335 | _cons | -.4072548 .1753646 -2.32 0.020 -.7514485 -.063061 -------------+---------------------------------------------------------------- sigma_u | 1.5148842 sigma_e | 1.8692139 rho | .39643111 (fraction of variance due to u_i) ------------------------------------------------------------------------------ . . recode worry 3=2 4=3 (worry: 690 changes made) . . ta worry How worried are | you about being | infected with | COVID-19? | Freq. Percent Cum. -----------------+----------------------------------- Not at all | 641 37.31 37.31 A little | 590 34.34 71.65 Rather | 487 28.35 100.00 -----------------+----------------------------------- Total | 1,718 100.00 . . xtreg WB i.worry [pw= panel_ind_wt_1_2], fe Fixed-effects (within) regression Number of obs = 1,718 Group variable: Findid Number of groups = 859 R-sq: Obs per group: within = 0.0146 min = 2 between = 0.0451 avg = 2.0 overall = 0.0238 max = 2 F(2,858) = 2.69 corr(u_i, Xb) = 0.0556 Prob > F = 0.0684 (Std. Err. adjusted for 859 clusters in Findid) ------------------------------------------------------------------------------ | Robust WB | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- worry | A little | .3706983 .310278 1.19 0.233 -.2382945 .9796911 Rather | .6793746 .2977687 2.28 0.023 .0949343 1.263815 | _cons | -.4024158 .1756929 -2.29 0.022 -.7472539 -.0575777 -------------+---------------------------------------------------------------- sigma_u | 1.5122795 sigma_e | 1.8724047 rho | .39479255 (fraction of variance due to u_i) ------------------------------------------------------------------------------ .
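A sketch of one standard way to test whether the two category coefficients differ from each other, using the original four-category coding and factor-variable notation:
Code:
xtreg WB i.worry [pw = panel_ind_wt_1_2], fe
test 2.worry = 3.worry        // Wald test: "A little" versus "Rather"
A large p-value here is the usual justification offered for collapsing the two categories.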
Sunday, January 30, 2022
issues with global macro windows code on mac
I am trying to use a Master do file that someone has sent me from their windows computer. I have a Mac and the project that was sent to me was based on a global macro, used to change to and from directories.
The global MP is meant to be the master path for the project. To run the master management file, I have to change to the master management folder.
I suspect that this is because of a slight software difference between Windows and Mac; any ideas what the problem might be?
Code:
gl MP "/Users/Cassie/Desktop/data/Jared_Data"
cd "$MP/Replication_Files/Management/Master"
unable to change to /Users/Cassie/Desktop/data/Jared_Data/Replication_Files/Management/Master
r(170);
end of do-file
Error: cannot compute an improvement -- discontinuous region encountered
Good night,
I am trying to fit an ordered probit model with panel data, and the following error appears: cannot compute an improvement -- discontinuous region encountered.
Does anybody know how to solve this error and what it means?
Thanks!
Laura
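In case it helps, a sketch of things that are often tried for this kind of non-concavity message, assuming the model is fit with -xtoprobit- (the variable names and option values are illustrative):
Code:
* rescale badly scaled covariates, then raise the number of integration points
summarize x1
gen double x1_std = (x1 - r(mean)) / r(sd)
xtoprobit y x1_std x2, intpoints(30) difficult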
Saturday, January 29, 2022
Survival analysis - data management query
Dear all,
I have two variables in my dataset, namely "date of birth" and "date of death", and I would like to do survival analysis. First of all, I am not a statistician, so please bear with me.
1st variable codebook:
Code:
Type: Numeric daily date (int)
Range: [13880,21182]                       Units: 1
Or equivalently: [01jan1998,29dec2017]     Units: days
Unique values: 1,062                       Missing .: 0/1,174
Mean: 17165.3 = 30dec2006 (+ 7 hours)
Std. dev.: 2083
Percentiles:     10%        25%        50%        75%        90%
               14408      15411    17030.5      18864      20241
           13jun1999  12mar2002  17aug2006  25aug2011  02jun2015
2nd variable codebook:
Code:
Type: Numeric daily date (int)
Range: [13894,21239]                       Units: 1
Or equivalently: [15jan1998,24feb2018]     Units: days
Unique values: 180                         Missing .: 994/1,174
Mean: 17588.6 = 26feb2008 (+ 13 hours)
Std. dev.: 2006.76
Percentiles:     10%        25%        50%        75%        90%
             14964.5    15987.5    17325.5      19290    20459.5
           20dec2000  09oct2003  08jun2007  24oct2012  06jan2016
I would like to make basic graphs with 95% CIs and, later on, a regression analysis table, but I do not know how to make that happen. Any tips on converting the dates into practical variables (e.g., age in months or years) would be much appreciated. Also, I know how to run a multivariate regression based on ORs, but with survival data I am not sure which command to go for.
Thanks in advance
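A sketch of the usual -stset- workflow, assuming the two variables are dob (date of birth) and dod (date of death, missing for those still alive); the censoring date and the covariate "group" are placeholders:
Code:
gen double exitdate = cond(missing(dod), mdy(2, 24, 2018), dod)   // censor survivors at end of follow-up
gen byte   died     = !missing(dod)
stset exitdate, origin(time dob) failure(died) scale(365.25)      // analysis time in years since birth
sts graph, ci                                                     // Kaplan-Meier curve with confidence bands
stcox i.group                                                     // hazard ratios, the survival analogue of OR-based models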
Stock volatility
Hi,
I need to calculate stock volatility using CRSP daily stock returns. According to the article, it is calculated as the square root of the sum of squared daily returns over the year; to adjust for differences in the number of trading days, the raw sum is multiplied by 252 and divided by the number of trading days. I have the variable "returns". Does anyone know which code to use for this?
Another thing: the CRSP data were too big to download at once, so I need to merge several files. However, when I try to merge, I get the error: factor-variable and time-series operators not allowed. What should I do?
Thanks in advance!
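A sketch of the annualized-volatility calculation described above, assuming one row per firm and trading day with variables permno, date, and returns. (On the merge error: it usually means a time-series or factor-variable operator ended up in the varlist passed to -merge-, so the key variable names are worth re-checking.)
Code:
gen year = year(date)
bysort permno year: egen double sumsq = total(returns^2)
bysort permno year: egen ndays = count(returns)
gen double volatility = sqrt(sumsq * 252 / ndays)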
Regression with Interaction Terms and Post-Estimation Interpretation
Dear Statalist community,
I am running the following code to initially generate a regression output:
Code:
xtreg y c.a##c.b $controlvariables i.fyear, fe
This code generates the following output, with the control variables and year dummies omitted for simplicity:
Code:
Fixed-effects (within) regression Number of obs = 11,363 Group variable: gvkey Number of groups = 1,547 R-squared: Obs per group: Within = 0.4145 min = 1 Between = 0.3950 avg = 7.3 Overall = 0.4009 max = 12 F(24,9792) = 288.80 corr(u_i, Xb) = -0.4448 Prob > F = 0.0000 ------------------------------------------------------------------------------------- y | Coefficient Std. err. t P>|t| [95% conf. interval] --------------------+---------------------------------------------------------------- a | -.8820247 1.37278 -0.64 0.521 -3.572956 1.808907 b | -.1120089 .110937 -1.01 0.313 -.3294683 .1054504 | c.a#c.b | 6.042615 2.489008 2.43 0.015 1.163646 10.92158 _cons | 13.749 1.191463 11.54 0.000 11.41348 16.08451 --------------------+---------------------------------------------------------------- sigma_u | 4.8700768 sigma_e | 2.9240485 rho | .73502736 (fraction of variance due to u_i) ------------------------------------------------------------------------------------- F test that all u_i=0: F(1546, 9792) = 2.86 Prob > F = 0.0000
Then, I run some post-estimation commands to generate an interaction plot.
margins, at(b=(0(1)1) a =(0 1))
Predictive margins Number of obs = 11,363
Model VCE: Conventional
Expression: Linear prediction, predict()
1._at: a = 0
b = 0
2._at: a = 0
b = 1
3._at: a = 1
b = 0
4._at: a = 1
b = 1
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_at |
1 | 9.605668 .028179 340.88 0.000 9.550438 9.660898
2 | 9.493659 .1150657 82.51 0.000 9.268134 9.719184
3 | 8.723643 1.379275 6.32 0.000 6.020313 11.42697
4 | 14.65425 2.704182 5.42 0.000 9.354149 19.95435
------------------------------------------------------------------------------
marginsplot
The margins code above generates the marginsplot; please see the attached graph.
I have the following questions!
1. The regression output shows a constant of 13.749. In the margins output, I was expecting the prediction at a = 0 and b = 0 (where the standalone variables are zero) to match 13.749, but this does not seem to be the case, nor in the marginsplot. Would this be because I am including control variables as well as fixed effects?
2. Based on the regression output, the effect of a depends on the value of b, while a and b are not significant by themselves. I am having a little trouble understanding the economic magnitude of the coefficients. Can I add the interaction coefficient to the intercept as the total effect?
3. If the interpretation in #2 is correct, can I graph this somehow so that the interaction term's 6.04 coefficient shows up in the graph? In other words, if the intercept of 13.749 "increases" by 6.04 (the interaction coefficient), can this increase be visualized in Stata?
Thank you so much,
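On questions 1 and 3, a hedged note: the predictive margins average the linear prediction over the observed controls and year dummies, which is why the a = 0, b = 0 cell does not reproduce _cons. The interaction itself is often easier to read as the marginal effect of a at different values of b:
Code:
margins, dydx(a) at(b = (0 1))   // effect of a when b = 0 versus b = 1; the difference is the 6.04 interaction
marginsplot, yline(0)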
Splitting An Instance into two existing attributes
I have a dataset that includes two variables called "NAME" and "TITLE".
NAME should simply be an individual's birth name (e.g. "John William Figueroa") and title should be anything appended to the end (e.g. OBE, MD, PhD, JD). Trouble is, a lot of entries instead have this information in the NAME column so that it reads "John William Figueroa, PhD".
Is there an easy way to use the comma (very frequently present) to shift the title into the next column? I'd use the "split" function, but I don't want this broken into two new variables; I just want to shift that part of the information into the existing TITLE column. Thanks so much for your time!
Best,
Chuck
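A sketch of one way to do this with string functions, assuming TITLE is empty for the affected rows and that the first comma in NAME separates the name from the appended titles:
Code:
gen byte fixrow = strpos(NAME, ",") > 0 & TITLE == ""
replace TITLE = strtrim(substr(NAME, strpos(NAME, ",") + 1, .)) if fixrow
replace NAME  = strtrim(substr(NAME, 1, strpos(NAME, ",") - 1)) if fixrow
drop fixrow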
Specific period Buy and Hold returns for every day
Hello everyone,
I have daily panel data with prices and returns for several companies.
My question is: how do I calculate buy-and-hold returns for every company and every day in the sample?
E.g. 1-year holding period returns for ARCHER-DANIELS-MIDLAND on 15th March 2020 (until 14th March 2021), 16th March 2020 (until 15th March 2021), ...
Code:
* Example generated by -dataex-. For more info, type help dataex clear input double PERMNO str58 CONM long caldt double prccd float RETURN 10516 "ARCHER-DANIELS-MIDLAND CO" 17136 34.83 . 10516 "ARCHER-DANIELS-MIDLAND CO" 17139 34.85 .0005741212 10516 "ARCHER-DANIELS-MIDLAND CO" 17140 34.58 -.007747393 10516 "ARCHER-DANIELS-MIDLAND CO" 17141 34.3 -.008097241 10516 "ARCHER-DANIELS-MIDLAND CO" 17142 33.76 -.015743468 10516 "ARCHER-DANIELS-MIDLAND CO" 17143 34.07 .009182505 10516 "ARCHER-DANIELS-MIDLAND CO" 17146 33.69 -.01115354 10516 "ARCHER-DANIELS-MIDLAND CO" 17147 33.47 -.006530051 10516 "ARCHER-DANIELS-MIDLAND CO" 17148 32.76 -.02121311 10516 "ARCHER-DANIELS-MIDLAND CO" 17149 32.62 -.004273486 10516 "ARCHER-DANIELS-MIDLAND CO" 17150 32.72 .003065674 10516 "ARCHER-DANIELS-MIDLAND CO" 17153 32.06 -.020171143 end format %td caldt
Please keep in mind that only trading days are included in the data. I have looked for a solution for several hours but have not found anything that exactly matches my problem. I think using asrol and generating an identifying variable could work; indeed, for the first day of the year and a yearly grouping variable this works out, but already for the second trading day of the year one observation is missing. I also thought about generating an identifying dummy variable for every date, but this is probably not the best solution, since my dataset is really large (c. 15 million observations).
Code:
gen fyear1 = year(caldt)
bysort PERMNO fyear1 : asrol RETURN, stat(product) add(1)
Best regards,
Jakob Stoll
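A sketch of a pure-Stata alternative that uses cumulative log gross returns and observation subscripts, assuming a fixed holding period of 252 trading days (whether the period should start on the quoted day itself or the next trading day is an assumption to settle):
Code:
sort PERMNO caldt
by PERMNO: gen double cumlog = sum(ln(1 + RETURN))
by PERMNO: gen double bhr1y  = exp(cumlog[_n + 252] - cumlog[_n]) - 1   // missing near the end of each panel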
Friday, January 28, 2022
Getting regression results for my last dummy
Hello everyone,
I have 3 categories, for which I have created 2 dummy variables: one called low_dev and the other called high_dev. Now I was wondering how I could display the third category, which has no dummy of its own, because if both low_dev==0 and high_dev==0 then the constant represents the omitted category, if I remember correctly?
The dummy variables are created as follows:
Code:
gen Built_L = Built_area < 18.27
gen Built_H = Built_area > 25.981
gen Built_M = 1 - Built_L - Built_H
gen Agri_L = Agri_area < 48.90
gen Agri_H = Agri_area > 54.75
gen Agri_M = 1 - Agri_L - Agri_H
gen Forest_L = NaturalForest_area < 7.3
gen Forest_H = NaturalForest_area > 15.82
gen Forest_M = 1 - Forest_H - Forest_L
gen low_dev = Built_L & Agri_H & Forest_H
gen high_dev = Built_H & Agri_L & Forest_L
Regression command for low_dev:
Code:
xtreg log_realHP log_PopulationDensity log_Population Unemployment_rate log_real_consCost logReal_income real_interest i.low_dev i.Year,fe vce(robust)
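One commonly suggested alternative, sketched here, is to collapse the two dummies into a single three-level categorical variable so that every group's coefficient (relative to the chosen base) is displayed explicitly; the category coding below is an assumption:
Code:
gen byte dev_cat = 2                      // middle development as the default
replace  dev_cat = 1 if low_dev == 1
replace  dev_cat = 3 if high_dev == 1
label define dev_cat 1 "low" 2 "middle" 3 "high"
label values dev_cat dev_cat
xtreg log_realHP log_PopulationDensity log_Population Unemployment_rate log_real_consCost logReal_income real_interest ib2.dev_cat i.Year, fe vce(robust)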
regression coefficient calculation not correct
Hello everyone,
The coefficient on my construction cost index is too high. I corrected the construction cost index by dividing it by the CPI index and multiplying by 100 to get the real construction cost variable; afterwards I took the log of real construction cost to get the log real construction cost index. However, when I run my regression I get a very high coefficient, and I was wondering how I can solve this problem. A data excerpt follows:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input double ConsCost_index float(real_consCost log_real_consCost) 100 100 4.6051702 105.2 102.73438 4.632147 109.3 102.53437 4.630198 111.5 101.25672 4.617659 113.7 101.13087 4.6164155 115.9 101.76472 4.6226635 119.3 102.99907 4.63472 124.1 105.97746 4.6632266 129.8 109.09948 4.6922603 130 106.60252 4.669107 130.8 105.9867 4.6633134 133.4 106.70628 4.67008 135.7 106.10562 4.664435 136 103.74653 4.6419506 137.2 102.1092 4.626043 139.8 103.01408 4.6348658 142.6 104.4506 4.648714 145.9 106.54813 4.6685967 149.6 107.74178 4.6797376 153.8 108.91506 4.6905684 157.2 108.50176 4.6867666 100 100 4.6051702 105.2 102.73438 4.632147 109.3 102.53437 4.630198 111.5 101.25672 4.617659 113.7 101.13087 4.6164155 115.9 101.76472 4.6226635 119.3 102.99907 4.63472 124.1 105.97746 4.6632266 129.8 109.09948 4.6922603 130 106.60252 4.669107 130.8 105.9867 4.6633134 133.4 106.70628 4.67008 135.7 106.10562 4.664435 136 103.74653 4.6419506 137.2 102.1092 4.626043 139.8 103.01408 4.6348658 142.6 104.4506 4.648714 145.9 106.54813 4.6685967 149.6 107.74178 4.6797376 153.8 108.91506 4.6905684 157.2 108.50176 4.6867666 100 100 4.6051702 105.2 102.73438 4.632147 109.3 102.53437 4.630198 111.5 101.25672 4.617659 113.7 101.13087 4.6164155 115.9 101.76472 4.6226635 119.3 102.99907 4.63472 124.1 105.97746 4.6632266 129.8 109.09948 4.6922603 130 106.60252 4.669107 130.8 105.9867 4.6633134 133.4 106.70628 4.67008 135.7 106.10562 4.664435 136 103.74653 4.6419506 137.2 102.1092 4.626043 139.8 103.01408 4.6348658 142.6 104.4506 4.648714 145.9 106.54813 4.6685967 149.6 107.74178 4.6797376 153.8 108.91506 4.6905684 157.2 108.50176 4.6867666 100 100 4.6051702 105.2 102.73438 4.632147 109.3 102.53437 4.630198 111.5 101.25672 4.617659 113.7 101.13087 4.6164155 115.9 101.76472 4.6226635 119.3 102.99907 4.63472 124.1 105.97746 4.6632266 129.8 109.09948 4.6922603 130 106.60252 4.669107 130.8 105.9867 4.6633134 133.4 106.70628 4.67008 135.7 106.10562 4.664435 136 103.74653 4.6419506 137.2 102.1092 4.626043 139.8 103.01408 4.6348658 142.6 104.4506 4.648714 145.9 106.54813 4.6685967 149.6 107.74178 4.6797376 153.8 108.91506 4.6905684 157.2 108.50176 4.6867666 100 100 4.6051702 105.2 102.73438 4.632147 109.3 102.53437 4.630198 111.5 101.25672 4.617659 113.7 101.13087 4.6164155 115.9 101.76472 4.6226635 119.3 102.99907 4.63472 124.1 105.97746 4.6632266 129.8 109.09948 4.6922603 130 106.60252 4.669107 130.8 105.9867 4.6633134 133.4 106.70628 4.67008 135.7 106.10562 4.664435 136 103.74653 4.6419506 137.2 102.1092 4.626043 139.8 103.01408 4.6348658 end
This is my regression command and results:
Code:
. asdoc xtreg log_realHP log_PopulationDensity log_Population Unemployment_rate log_ > real_consCost logReal_income real_interest i.low_dev i.Year,fe vce(robust) replace > cnames(low development) save(PanelData_regression) add(Low dev Dummy,YES, Year Du > mmy,YES) dec(3) note: 2018.Year omitted because of collinearity note: 2019.Year omitted because of collinearity Fixed-effects (within) regression Number of obs = 2,532 Group variable: GM_code Number of groups = 282 R-sq: Obs per group: within = 0.8875 min = 8 between = 0.0581 avg = 9.0 overall = 0.0023 max = 9 F(13,281) = 946.78 corr(u_i, Xb) = -0.5914 Prob > F = 0.0000 (Std. Err. adjusted for 282 clusters in GM_code) ----------------------------------------------------------------------------------- | Robust log_realHP | Coef. Std. Err. t P>|t| [95% Conf. Interval] ------------------+---------------------------------------------------------------- log_PopulationD~y | .0364162 .0302895 1.20 0.230 -.0232068 .0960393 log_Population | .1433754 .076217 1.88 0.061 -.0066533 .2934041 Unemployment_rate | .03056 .0082342 3.71 0.000 .0143513 .0467686 log_real_consCost | 78.75831 1.275555 61.74 0.000 76.24746 81.26917 logReal_income | .2192286 .1029772 2.13 0.034 .0165239 .4219334 real_interest | .5209448 .0086732 60.06 0.000 .5038721 .5380175 1.low_dev | .0109145 .0026326 4.15 0.000 .0057325 .0160966 | Year | 2012 | 1.041003 .0177587 58.62 0.000 1.006046 1.07596 2013 | 2.727895 .0451684 60.39 0.000 2.638983 2.816806 2014 | 3.375704 .0558562 60.44 0.000 3.265754 3.485653 2015 | 2.838823 .0477437 59.46 0.000 2.744843 2.932804 2016 | 1.79892 .0302964 59.38 0.000 1.739283 1.858557 2017 | .7132875 .0122971 58.00 0.000 .6890815 .7374935 2018 | 0 (omitted) 2019 | 0 (omitted) | _cons | -359.9836 6.54667 -54.99 0.000 -372.8704 -347.0969 ------------------+---------------------------------------------------------------- sigma_u | .29981225 sigma_e | .02925436 rho | .99056879 (fraction of variance due to u_i) ----------------------------------------------------------------------------------
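One thing worth checking, sketched below: in the data excerpt the construction cost index repeats identically across municipalities within a year, so log_real_consCost is (nearly) collinear with the year dummies, which can make its coefficient arbitrarily large. These diagnostics are only a suggestion:
Code:
xtsum log_real_consCost                 // how much within-panel variation is there?
xtreg log_real_consCost i.Year, fe      // a within R-squared near 1 means the year dummies absorb it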
Latitude and Longitude to perform geonear and the case of 90 to -90
To perform -geonear-, we require latitude and longitude. For this, do we use the coordinate information from the shapefile, or create the centroids separately using the centroid command? Here I have used latitudes and longitudes obtained from the centroid command.
Second, I am trying to run -geonear-. I want to look at the border-district (spillover) effect, i.e., I want to find the non-treated neighbouring districts for my treated districts. However, the treatment of a district varies over time; districts come into treatment and go out of it over time. I now have two datasets with centroid values: one includes only treated districts and the other only non-treated districts. I then run -geonear- with the following command, within 2000 km:
geonear district x_stub y_stub using "Non- treated districts with centroids.dta", n(district1 x_stub1 y_stub1) ign long within(2000) near(2)
However, I am getting the following error:
nbor latitude var x_stub1 must be between -90 and 90
r(198);
Can anybody please help with these concerns?
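A hedged guess, since I cannot see the data: if x_stub and y_stub come from the centroid command, then x is the longitude and y the latitude, so the variables may simply be passed to geonear in the wrong order. geonear expects latitude before longitude both in the main varlist and in the neighbors() option, which would explain the "must be between -90 and 90" message. A minimal sketch of the reordered call, keeping the other options from the post unchanged:
Code:
* hedged sketch: latitude (y_stub) listed before longitude (x_stub)
geonear district y_stub x_stub using "Non- treated districts with centroids.dta", ///
    n(district1 y_stub1 x_stub1) ign long within(2000) near(2)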
(ssc) esttab and stcrreg: option equations() invalid
For some reason esttab (an SSC package) will not let me combine these two outputs:
Anyone have an idea of what is going wrong? I tried specifying "equations()" to try to clear out whatever it dislikes, to no avail. Thanks!!
Code:
. stcrreg contrib vx1adultHazRtHat, compete(dosedByNow==2)
  [ . . . ]
Competing-risks regression                      No. of obs      =      5,482
                                                No. of subjects =        539
Failure event:   dosedByNow == 1                No. failed      =        194
Competing event: dosedByNow == 2                No. competing   =         37
                                                No. censored    =        308
                                                Wald chi2(2)    =       5.88
Log pseudolikelihood = -1184.7533               Prob > chi2     =     0.0528

                              (Std. err. adjusted for 539 clusters in subject_id)
----------------------------------------------------------------------------------
                 |               Robust
              _t |        SHR   std. err.      z    P>|z|     [95% conf. interval]
-----------------+----------------------------------------------------------------
         contrib |   1.125542   .0549522     2.42   0.015     1.022831    1.238568
vx1adultHazRtHat |   1.024642    .043759     0.57   0.569      .942367      1.1141
----------------------------------------------------------------------------------

. eststo e1, title("cr")

. stcrreg contrib vx1adultHazRtHat, compete(dosedByNow==2) tvc(contrib vx1adultHazRtHat)
  [ . . . ]
Competing-risks regression                      No. of obs      =      5,482
                                                No. of subjects =        539
Failure event:   dosedByNow == 1                No. failed      =        194
Competing event: dosedByNow == 2                No. competing   =         37
                                                No. censored    =        308
                                                Wald chi2(4)    =       6.33
Log pseudolikelihood = -1184.4078               Prob > chi2     =     0.1759

                              (Std. err. adjusted for 539 clusters in subject_id)
----------------------------------------------------------------------------------
                 |               Robust
              _t |        SHR   std. err.      z    P>|z|     [95% conf. interval]
-----------------+----------------------------------------------------------------
main             |
         contrib |   1.167825   .0900622     2.01   0.044     1.003999    1.358382
vx1adultHazRtHat |   1.058385   .0715551     0.84   0.401     .9270347    1.208347
-----------------+----------------------------------------------------------------
tvc              |
         contrib |   .9991668   .0012241    -0.68   0.496     .9967704    1.001569
vx1adultHazRtHat |   .9991421   .0013809    -0.62   0.535     .9964393    1.001852
----------------------------------------------------------------------------------
Note: Variables in tvc equation interacted with _t.

. eststo e2, title("tvc")

. esttab e1

----------------------------
                      (1)
                       _t
----------------------------
eq1
contrib             0.118*
                   (2.42)

vx1adultHa~t        0.0243
                   (0.57)
----------------------------
N                     5482
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

. esttab e2

----------------------------
                      (1)
                       _t
----------------------------
main
contrib             0.155*
                   (2.01)

vx1adultHa~t        0.0567
                   (0.84)
----------------------------
tvc
contrib          -0.000834
                   (-0.68)

vx1adultHa~t     -0.000858
                   (-0.62)
----------------------------
N                     5482
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

. esttab e1 e2
option equations() invalid
specified equation name already occurs in model 2
r(198);
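From the output, the conflict seems to be that the first model has a single unnamed equation (shown as eq1) while the second has equations named main and tvc. One thing that may be worth trying, hedged and not confirmed against this example: match the models' equations by position rather than by name via the equations() option documented in the estout/esttab help, and place equations side by side with unstack.
Code:
* hedged sketch: match the first equations of the two models by position
esttab e1 e2, equations(1) unstack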
lazy man's research
I have some info culled from international registries of non-US rehab medicine clinical trials
one of the questions of interest is whether a trial included subjects over a certain age (e.g., 65 or 85); note that I do NOT have info on the inclusion/exclusion criteria
for most trials I have the total N, but not for all trials
for some trials I have the mean and SD (though for some I only have the mean) or I have info that can be used to estimate these values (I am using the formulae suggested in Wan, X, et al. (2014), "Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range", BMC Medical Research Methodology, 14: 135)
If I have the N, the mean and the SD AND if I can assume that the ages are approximately normally distributed, then answering the over65 or over85 question is easy. However, I am uncomfortable making this assumption and I am looking for citations that provide guidance for either (1) different distributions (esp skewed ones) and/or (2) truncated distributions but where the truncation point is unknown. If I could find such cites, I could do a sensitivity analysis re: the assumed normal distribution answer
So, does anyone know of any such citations, or have other suggestions?
By the way, I know that I could do a bunch of simulations to get there, but I think this would be more expensive than my clients want to go.
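For the normal-assumption benchmark itself, and for one simple right-skewed alternative, something along these lines might serve as a starting point for a sensitivity analysis. The mean and SD below are made-up numbers, not from any trial, and the lognormal is only one of many possible skewed choices:
Code:
* hedged sketch with made-up inputs: mean age 58, SD 12, threshold 65
local m  = 58
local sd = 12

* (1) under a normal assumption
di as txt "P(age > 65), normal:    " %6.4f 1-normal((65-`m')/`sd')

* (2) under a lognormal with the same mean and SD (a right-skewed alternative)
local s2 = ln(1 + (`sd'/`m')^2)
local mu = ln(`m') - `s2'/2
di as txt "P(age > 65), lognormal: " %6.4f 1-normal((ln(65)-`mu')/sqrt(`s2'))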
Thursday, January 27, 2022
spmap polygon separate graphs
Dear all,
I'm trying to overlay EU NUTS2 and NUTS3 using spmap, but the polygon option doesn't combine the two graphs and is separating them. Could anyone suggest what to do, please? Many thanks.
use "NUTS_RG_20M_2021_3035.dta", clear /// NUTS3
merge 1:1 NUTS3 using "sector_data_by_NUTS3.dta" /// merge with sector level data by NUTS3
keep if _m == 3
drop _m
* Polygon
spmap green_share_emp using "NUTS_RG_20M_2021_3035_shp.dta", id(_ID) fcolor(Blues2) clnumber(4) polygon(data("nuts2coord_Germany.dta") ocolor(black))
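In case it helps, here is a hedged sketch of how the overlay is usually set up; the NUTS2 shapefile name and the generated file names below are assumptions. The dataset passed to polygon(data()) has to be a coordinates file in spmap's _X/_Y format and in the same projection as the NUTS3 layer (EPSG:3035 here), for example one produced by shp2dta:
Code:
* hedged sketch: build a NUTS2 coordinates file and draw its borders on top
* of the NUTS3 choropleth (assumed file names; both layers in EPSG:3035)
shp2dta using "NUTS_RG_20M_2021_3035_NUTS2.shp", database(nuts2_db) coordinates(nuts2_coord) genid(_ID) replace

use "NUTS_RG_20M_2021_3035.dta", clear              // NUTS3 attribute data
merge 1:1 NUTS3 using "sector_data_by_NUTS3.dta", keep(match) nogenerate

spmap green_share_emp using "NUTS_RG_20M_2021_3035_shp.dta", id(_ID) ///
    fcolor(Blues2) clnumber(4) ///
    polygon(data("nuts2_coord.dta") ocolor(black) osize(medthin))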
Obtain which bin and corresponding density each observation in plotted histogram belongs to?
Hello everyone,
Suppose I plot a histogram:
Code:
clear
set obs 10
g z = _n
replace z = 5 if _n > 5
hist z
Given the plotted histogram, I would like to generate two new variables:
- bin, giving the bin which a given observation belongs to.
- density, giving the density of the bin which the observation belongs to.
For this example, the correct values would be:
Code:
g correct_bin = 1 if inrange(_n, 1, 2)
replace correct_bin = 2 if _n == 3
replace correct_bin = 3 if _n >= 4
g correct_density = 0.15 if inrange(_n, 1, 2)
replace correct_density = 0.075 if _n == 3
replace correct_density = 0.525 if _n >= 4
I have tried using the command twoway__histogram_gen to create a solution (steps and code below). However, while my solution works in the above case, it does not seem to work, for example, when:
- Bins aren't just beside each other, that is, for example, bin 1 = [1,2) and bin 2 = [5,6)
- Or even just when the sample size grows and the values of the z variable are continuous; then numerical issues quickly arise
My solution proceeds as follows:
- Use twoway__histogram_gen to find the midpoint of each bin.
- Adjust the midpoints to be the start points of the bins.
- Create new variables x_v which are constant to the start point of bin v.
- Check which interval [x_v, x_{v+1}) each observation belongs to.
- Find the corresponding density of that bin.
Code:
* 1, finding midpoints
twoway__histogram_gen z, gen(y x)
* 2, adjusting midpoints to start points
local adjust = (x[2] - x[1]) / 2
replace x = x - `adjust'
* 3, generating variables constant to startpoints
count if x != .
local N = r(N)
forvalues v = 1/`=`N'+1' {
    g x_`v' = x[`v']
}
* 4, finding bin of each observation
g new_bin = .
forvalues v = 1/`N' {
    replace new_bin = `v' if x_`v' <= z & z < x_`=`v'+1'
}
* 5, finding density of the bin
g new_density = .
forvalues v = 1/`N' {
    replace new_density = y[`v'] if new_bin == `v'
}
Checking that this has given the correct solution:
Code:
assert correct_bin == new_bin
assert correct_density == new_density
Finally, note that by browsing the data we see rounding already becoming a slight problem, since we have x_1 == .99999994 instead of x_1 == 1.
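Regarding that rounding issue, one option (my own suggestion, not part of the approach above) is to compare the densities with a small tolerance rather than testing exact equality:
Code:
* hedged sketch: tolerance-based check to sidestep float rounding
assert correct_bin == new_bin
assert abs(correct_density - new_density) < 1e-6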
Thanks in advance for help!
Simon
Note: I'm not sure that the histogram bins of Stata are indeed of the form [x_v, x_{v+1}) (i.e., closed-open), but my investigations indicate this is true (the final bin having endpoint infinity).
Same date or within 7 days, within two variables
Dear statalist-members,
I need to identify patients that got med1 and med2 on the same day, or within 7 days. I made a variable that identified if a patient got two or more prescriptions on the same date:
duplicates tag patID date, gen (sameday)
But that new variable also flagged patients who received two prescriptions of med1, not both med1 and med2 as I am interested in.
Thank you, Cathrine
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long patID float(date med1 med2) byte sameday
76632 19493 1 . 0
76779 21626 1 . 1
76779 21626 . 1 1
76779 21781 1 . 0
76779 21896 1 . 0
76804 21509 . 1 0
76806 20612 . 1 0
76806 20685 . 1 0
76837 19620 1 . 1
76837 19620 1 . 1
76837 19709 1 . 2
76837 19709 1 . 2
76837 19709 . 1 2
76911 20368 1 . 0
76911 20403 . 1 0
76911 20458 . 1 0
76911 20511 . 1 0
76911 20581 . 1 0
77072 19852 1 . 0
77072 19950 1 . 0
77072 20485 . 1 1
77072 20485 1 . 1
77072 20544 . 1 0
77072 20850 1 . 0
77072 21565 . 1 1
77072 21565 1 . 1
77072 21595 . 1 0
77072 21862 . 1 0
77072 21875 . 1 0
77601 21867 1 . 1
77601 21867 1 . 1
77636 19463 1 . 1
77636 19463 1 . 1
77636 19600 1 . 1
77636 19600 1 . 1
77636 19884 1 . 0
77636 19977 1 . 0
77636 20123 1 . 0
77636 20301 1 . 0
77636 20593 1 . 0
77636 20678 1 . 0
77745 21265 . 1 0
77745 21803 . 1 0
77755 20982 1 . 0
end
format %tdCCYY-NN-DD date | |||
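A possible way to get the flag you are after (a sketch, assuming the variable names from the example above and that "within 7 days" means within 7 days in either direction): pair each patient's med1 dates with the same patient's med2 dates via joinby, and keep patients with at least one pair at most 7 days apart. Note the sketch keeps only the med1 records in memory.
Code:
* save each patient's med2 prescription dates
preserve
keep if med2 == 1
keep patID date
rename date med2_date
duplicates drop
tempfile med2dates
save `med2dates'
restore

* pair med1 records with the same patient's med2 dates
keep if med1 == 1
joinby patID using `med2dates', unmatched(master)
gen byte pair7 = abs(date - med2_date) <= 7 if !missing(med2_date)

* one flag per patient: med1 and med2 within 7 days (same day counts)
bysort patID: egen byte got_both7 = max(pair7)
replace got_both7 = 0 if missing(got_both7)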
Simple percentiles using wide dataset
Hello,
I'm not sure why I am having such trouble creating a variable corresponding to percentiles. I have a dataset that looks like the dataex example below with a person ID (pid) and a number (adheresum). I am trying to assign a percentile to each ID based on the value of adheresum. I tried pctile pct=adheresum but this just ended up with all missing values for all but ID=1. What am I missing here?
Thank you very much!
Sarah
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(pid adheresum)
1 120
2 28
3 44
4 13
5 112
6 145
7 4
8 84
9 68
10 143
11 31
12 4
13 164
14 46
15 136
16 44
17 35
18 15
19 87
20 140
21 157
22 158
23 5
24 162
25 88
26 18
27 93
28 45
29 11
30 90
31 12
32 177
33 81
34 107
35 105
36 148
37 82
38 124
39 49
40 30
41 85
42 8
43 122
44 78
45 101
46 9
47 69
48 154
49 18
50 56
51 128
52 164
53 34
54 131
55 102
56 127
57 108
58 160
59 133
60 127
61 97
62 119
63 112
64 90
65 48
66 166
67 156
68 123
69 112
70 150
71 34
72 153
73 56
74 164
75 81
76 129
77 95
78 122
79 107
80 148
81 69
82 69
83 48
84 50
85 11
86 67
87 70
88 152
89 51
90 77
91 161
92 171
93 12
94 96
95 111
96 151
97 90
98 96
99 160
100 135
end
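For what it is worth, -pctile- computes percentile cutpoints and stores them in the first few observations of the new variable (with the default nquantiles(2), only observation 1 gets a value), which is why everything after pid == 1 comes back missing. If the goal is a percentile for each pid, two possibilities are sketched below; the second formula is one common definition of a percentile rank, and other conventions exist.
Code:
* (1) assign each pid to one of 100 quantile groups
xtile pct_group = adheresum, nquantiles(100)

* (2) compute an empirical percentile rank for each observation
egen rank_adhere = rank(adheresum)
quietly count if !missing(adheresum)
gen pct_rank = 100 * (rank_adhere - 1) / (r(N) - 1)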
Checking Duplications
I am trying to use xtset and I know how to remove any duplications. Is there a way to see if the non-id/time variables also repeat in value? For example, if I have
Could I see if X repeats along with those same observation id and year?
id | year | X |
1 | 1 | 4 |
1 | 1 | 4 |
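One way to check this (a sketch using the variable names from the example): tag duplicates on id and year alone, then on id, year, and X together, and compare the two tags.
Code:
* rows that repeat on id-year
duplicates tag id year, generate(dup_idyear)

* rows that repeat on id-year and X as well
duplicates tag id year X, generate(dup_idyearX)

* id-year duplicates where X does NOT also repeat
list id year X if dup_idyear > 0 & dup_idyearX == 0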
Matching several observations via new variable
Hello everyone,
Currently I'm working with a dataset that contains several waves with various IDs (mothers) and their corresponding children, who also have their own IDs.
Each ID of the mother is matched in one line with the ID of their children.
However, the dataset also contains mothers who have more than one child, so one ID can potentially be matched with several CIDs (as you can see in the example).
Wave | ID | CID |
1 | 21001 | 1001 |
1 | 21001 | 1002 |
To see how many mothers have more than one child (via the CID) in the dataset, I need a new variable that identifies mothers with multiple children.
This command should say (by sense):
"gen children if there is more than one CID hat belongs to the ID"
Hopefully you got my point and can help me figure this out.
Thanks in advance!
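Something along these lines may do it (a sketch using the Wave/ID/CID names from the example): count the distinct CIDs attached to each mother ID across all waves, then flag mothers with more than one.
Code:
* tag the first occurrence of each mother-child pair across waves
bysort ID CID: gen byte first_pair = (_n == 1)

* number of distinct children per mother, and a flag for multiple children
bysort ID: egen n_children = total(first_pair)
gen byte multiple_children = (n_children > 1)
drop first_pair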
Wednesday, January 26, 2022
Getting the repeated-measures bse() error (again)
Hello,
I'm new to the list, so apologies for posting a basic question. Until the latest update of Stata, I've been able to figure out how to do a repeated-measures analysis without getting this error:
could not determine between-subject error term; use bse() option
Now I'm getting it again and can't figure out how to change the syntax. I'm pretty sure I've read every post in this forum on the bseunit/bse error, as well the user manual, the help menus, and whatever I can find online, and am still stuck.
It's an experiment where each person saw 4 types of stimuli (the category variable), and each type had a number of items. The dataset is too large to paste here, so below are the first two participants. There are different numbers of items in each category, but I didn't think that would matter.
Our research question is whether there is an effect of category. We don't need effects of items, but I put them in the model.
I initially thought it would be:
anova score category item, repeated(category)
but that generates the error. Ditto with adding a bse or bseunit.
Thanks in advance for any advice!
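A hedged guess rather than a confirmed fix: with each participant crossed with category, the participant identifier usually needs to appear in the model so that anova can work out the between-subject error term; with only category and item on the right-hand side there is nothing for it to use. Assuming the outcome is score and the subject identifier is a variable called participant (names taken from the post and the listing below), something like this may be worth trying:
Code:
* hedged sketch: include the subject identifier in the model
anova score participant category, repeated(category)

* or name the between-subject error term explicitly
anova score participant category, repeated(category) bse(participant)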
Participant | Category | Item | Difference |
1 | 1 | 1 | 0 |
1 | 1 | 2 | 1 |
1 | 1 | 3 | 2 |
1 | 1 | 4 | 2 |
1 | 1 | 5 | 1 |
1 | 1 | 6 | 2 |
1 | 1 | 7 | 1 |
1 | 2 | 1 | 0 |
1 | 2 | 2 | 3 |
1 | 2 | 3 | 2 |
1 | 2 | 4 | 2 |
1 | 2 | 5 | 1 |
1 | 2 | 6 | 0 |
1 | 2 | 7 | 0 |
1 | 2 | 8 | 2 |
1 | 2 | 9 | 2 |
1 | 3 | 1 | 0 |
1 | 3 | 2 | 2 |
1 | 3 | 3 | 2 |
1 | 3 | 4 | 2 |
1 | 3 | 5 | 1 |
1 | 3 | 6 | 1 |
1 | 3 | 7 | 2 |
1 | 3 | 8 | 2 |
1 | 3 | 9 | 1 |
1 | 3 | 10 | 2 |
1 | 4 | 1 | 2 |
1 | 4 | 2 | 2 |
1 | 4 | 3 | 2 |
1 | 4 | 4 | 1 |
1 | 4 | 5 | 1 |
1 | 4 | 6 | 1 |
1 | 4 | 7 | 1 |
1 | 4 | 8 | 2 |
1 | 4 | 9 | 2 |
1 | 4 | 10 | 2 |
1 | 5 | 1 | 2 |
1 | 5 | 2 | 1 |
1 | 5 | 3 | 1 |
1 | 5 | 4 | 1 |
1 | 5 | 5 | 2 |
1 | 5 | 6 | 2 |
1 | 5 | 7 | 1 |
1 | 5 | 8 | 1 |
1 | 5 | 9 | 2 |
1 | 5 | 10 | 2 |
2 | 1 | 1 | 1 |
2 | 1 | 2 | 3 |
2 | 1 | 3 | 0 |
2 | 1 | 4 | 2 |
2 | 1 | 5 | 1 |
2 | 1 | 6 | 1 |
2 | 1 | 7 | 3 |
2 | 2 | 1 | 3 |
2 | 2 | 2 | 1 |
2 | 2 | 3 | 0 |
2 | 2 | 4 | 1 |
2 | 2 | 5 | 0 |
2 | 2 | 6 | 2 |
2 | 2 | 7 | 1 |
2 | 2 | 8 | 1 |
2 | 2 | 9 | 1 |
2 | 3 | 1 | 2 |
2 | 3 | 2 | 2 |
2 | 3 | 3 | 0 |
2 | 3 | 4 | 2 |
2 | 3 | 5 | 0 |
2 | 3 | 6 | 2 |
2 | 3 | 7 | 2 |
2 | 3 | 8 | 3 |
2 | 3 | 9 | 1 |
2 | 3 | 10 | 1 |
2 | 4 | 1 | 1 |
2 | 4 | 2 | 1 |
2 | 4 | 3 | 1 |
2 | 4 | 4 | 2 |
2 | 4 | 5 | 1 |
2 | 4 | 6 | 1 |
2 | 4 | 7 | 0 |
2 | 4 | 8 | 1 |
2 | 4 | 9 | 2 |
2 | 4 | 10 | 1 |
2 | 5 | 1 | 2 |
2 | 5 | 2 | 2 |
2 | 5 | 3 | 1 |
2 | 5 | 4 | 1 |
2 | 5 | 5 | 2 |
2 | 5 | 6 | 2 |
2 | 5 | 7 | 2 |
2 | 5 | 8 | 1 |
2 | 5 | 9 | 2 |
2 | 5 | 10 | 1 |