Thursday, December 31, 2020

How to get the value of the first occurrence on the first row

I have data in the format below. I want to generate another column (X) that gives me the value of the first occurrence of column A. For example, the new column X should give me the value 15 for ID 1, 32 for ID 2, and 5 for ID 3. How can I get that?
ID | A| Visit
1 . 1
1 15 2
1 14 3
1 . 4
1 18 5
1 . 6
2 . 1
2 . 2
2 32 3
2 . 4
2 23 5
3 . 1
3 5 2


______________________
I want something like this:
ID | A| Visit | X
1 . 1 15
1 15 2 15
1 14 3 15
1 . 4 15
1 18 5 15
1 . 6 15
2 . 1 32
2 . 2 32
2 32 3 32
2 . 4 32
2 23 5 32
3 . 1 5
3 5 2 5
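
For reference, a minimal sketch of one way this might be done (assuming A is numeric, with missing values shown as "."): tag the first non-missing A within each ID and spread it to all rows of that ID.
Code:
* sum() used with -by ...: gen- is a running sum, so the tag picks out the
* first non-missing A within each ID (ordered by Visit)
bysort ID (Visit): gen work = A if !missing(A) & sum(!missing(A)) == 1
bysort ID: egen X = max(work)
drop work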

Invalid lval

Hi,

I am new to Mata, and I'm replicating some old code that uses it. Unfortunately, I'm getting the error `invalid lval` (r(3000)) after the assignments to x, x1, estvcov and estmean (lines 3, 4, 5, 6, 7, 12, 14, 15 of the code below). I understand this error appears when the element being assigned to is not an lval (https://www.stata.com/manuals13/m-2op_assignment.pdf). However, all of these items should be matrices, so I do not understand why the error appears. (Note that simvalues1 and simvalues2 are also matrices.)

Would highly appreciate any help!
Thanks.


mata
function pctChangeCI(x, x1, estmean, estvcov) {
// x = (st_matrix("simvalues1"), 1)
// x1 = (st_matrix("simvalues2"), 1)

//x = (1, st_matrix("simvalues1"))
//x1 = (1, st_matrix("simvalues2"))


//estvcov = ( 0.0321425679334, -0.0125258708837030, -0.01224257550358, -0.020567673273258, -0.00081447962364744, 0.001283175119054, -0.00037078276968431) \ (-0.0125258708837, 0.0247147324992786, 0.00572957906882, 0.013578287152935, -0.00021238922227112, -0.000095793687137, 0.00000054591255634) \ (-0.0122425755036, 0.0057295790688211, 0.02474244266765, 0.000709502894560, 0.00038112327914381, -0.000473762169358, 0.00003959881093361) \ (-0.0205676732733, 0.0135782871529354, 0.00070950289456, 0.046983991203950, 0.00015659148543411, 0.000004182942966, -0.00007048925426224) \ (-0.0008144796236, -0.0002123892222711, 0.00038112327914, 0.000156591485434, 0.00010085537196102, -0.000047774418586, 0.00000001528810965)\ ( 0.0012831751191, -0.0000957936871374, -0.00047376216936, 0.000004182942966, -0.00004777441858647, 0.000163152376626, -0.00000408242456036)\ (-0.0003707827697, 0.0000005459125563, 0.00003959881093, -0.000070489254262, 0.00000001528810965, -0.000004082424560, 0.00003239995503067)

//estmean = (-2.53549625658, 2.43252537821, 4.12186913122, 1.05335147161, 0.04816370328, -0.06482512498, -0.06319703473)


conflevel = .95
//estmean = st_matrix("e(b)")
//estvcov = st_matrix("e(V)")
probvec = (.5, (1-conflevel)/2, 1-(1-conflevel)/2)

g_theta = (invlogit(rowsum(x1:*estmean)) - invlogit(rowsum(x:*estmean)))/invlogit(rowsum(x:*estmean))

dg_theta = (invlogit(rowsum(x:*estmean)) * (x1 * logisticden(rowsum(x1:*estmean)) - (x * logisticden(rowsum(x:*estmean)))) - (invlogit(rowsum(x1:*estmean)) - invlogit(rowsum(x:*estmean))) * x * logisticden(rowsum(x:*estmean))) / (invlogit(rowsum(x:*estmean))^2)

g_theta_sd = sqrt(dg_theta * estvcov * dg_theta')

probvec = (invnormal(probvec):*g_theta_sd:+g_theta):*100
return(probvec)

}
end

ivreghdfe: how to first-stage regression predicted values

Hello, I'm using ivreghdfe, and I'm interested in obtaining the predicted values of the endogenous variable from the first-stage regression run by ivreghdfe.

for example,

use http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta, clear
ivreghdfe lwage exper expersq (educ=age kidslt6 kidsge6), first

yields the first-stage regression of educ on all exogenous variables, from which it makes a prediction of educ (call it educ_hat) that is then used in the second-stage regression.

I'm curious whether it's possible to create a variable educ_hat containing these first-stage predicted values.
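
For reference, a minimal sketch of one possible workaround (not an ivreghdfe post-estimation feature): with nothing absorbed, the first stage is just OLS of educ on all exogenous variables, so it can be rerun by hand on the IV estimation sample and predicted from.
Code:
use http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta, clear
* first stage by hand, restricted to the sample the IV regression will use
regress educ exper expersq age kidslt6 kidsge6 if !missing(lwage)
predict educ_hat, xb
* original estimation, for comparison with the reported first stage
ivreghdfe lwage exper expersq (educ=age kidslt6 kidsge6), first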

i'd greatly appreciate your help, and happy new year!!

best,


john

multiple imputation error after MI Estimate logistic regression

Hello:
I am working on multiple imputation (MI).
Everything works up until mi estimate; see the error below.
I have been through the Stata manual and viewed the videos, and I believe I am reproducing exactly what is recommended, but with no luck.

I appreciate any advice/help.

Code:
use opioid_temp, clear
mi set mlong
mi register imputed conf5_refer1_rc ddppq_adequacy_totrev ddppq_legitmacy_totrev ddppq_support_totrev ddppq_esteem_totrev ///
ddppq_satisfaction_totrev /*ddppq_tot_rev*/gender_rc yearssincegrad ocs_or_fell perc_msk perc_takeopioid ///
/*oft_curropioid oft_pastopioid oft_goalopioid oft_redopioid oft_histmisuse*/ ptrole_opiodi_rc hrs_trainmisuse_rc
mi impute chained (regress) ddppq_adequacy_totrev ddppq_legitmacy_totrev ddppq_support_totrev ddppq_esteem_totrev ///
ddppq_satisfaction_totrev /*ddppq_tot_rev*/ yearssincegrad perc_msk perc_takeopioid, add(10) rseed(888)
mi impute chained (logit) conf5_refer1_rc ptrole_opiodi_rc hrs_trainmisuse_rc gender_rc, add(10)
mi xeq 0 1 10: sum ddppq_adequacy_totrev ddppq_legitmacy_totrev ddppq_support_totrev ddppq_esteem_totrev ///
ddppq_satisfaction_totrev yearssincegrad perc_msk perc_takeopioid conf5_refer1_rc ptrole_opiodi_rc hrs_trainmisuse_rc gender_rc
mi estimate : logistic conf5_refer1_rc ddppq_adequacy_totrev ddppq_legitmacy_totrev ddppq_support_totrev ddppq_esteem_totrev ddppq_satisfaction_totrev ///
gender_rc yearssincegrad ocs_or_fell perc_msk perc_takeopioid ptrole_opiodi_rc hrs_trainmisuse_rc

Error:
​​​​​​Imputations (20):
.........10x
estimation sample varies between m=1 and m=11; click here for details
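
A hedged guess at the cause (not a verified diagnosis): the continuous and binary variables are imputed in two separate -mi impute chained- runs with add(10) each, so in imputations m=1-10 the binary variables are still missing and in m=11-20 the continuous ones are still missing, which would make the estimation sample differ across m. A sketch of one possible remedy is to impute everything in a single call:
Code:
mi impute chained ///
    (regress) ddppq_adequacy_totrev ddppq_legitmacy_totrev ddppq_support_totrev ///
              ddppq_esteem_totrev ddppq_satisfaction_totrev yearssincegrad ///
              perc_msk perc_takeopioid ///
    (logit) conf5_refer1_rc ptrole_opiodi_rc hrs_trainmisuse_rc gender_rc ///
    , add(10) rseed(888)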


sample data
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte conf5_refer1_rc float(ddppq_adequacy_totrev ddppq_legitmacy_totrev ddppq_support_totrev ddppq_esteem_totrev ddppq_satisfaction_totrev) byte gender_rc float(yearssincegrad ocs_or_fell perc_msk perc_takeopioid)
0 27 13  4 19 19 0  3 1 5 3
0 25 11 16 21 17 0  8 1 3 2
1 23  9 13 16 14 1 30 1 5 2
0 24  8  7 15 13 0  6 1 5 2
0 19  5  9 14 15 1  5 0 5 1
1 22  9  7 11 12 1  5 1 5 2
1 25 11 13 21 17 1 19 1 5 2
0 35 13  4 17 17 0 23 1 5 1
1 35 12 14 21 17 0 21 1 1 1
0 22  7  5 13 15 0 41 0 4 1
1 38 10 16 25 20 1 35 0 5 3
1 26 11 16 17 16 1 31 1 2 2
0 16 11  6 17 16 0  4 0 4 4
1 28 11 15 21 14 0 29 0 1 1
1 31 11 12 21 17 0 13 0 1 1
0 30 13 17 18 22 0 15 1 5 3
1 31  9 13 17 20 0  . 0 5 3
1 37 13 11 18 20 0 18 1 1 2
1 15 11  6 15 18 1 37 1 5 3
0 36 11 16 17 11 1  1 0 5 3
1 30 11 19 16 19 1  1 0 5 3
1 35 11 11 21 22 0 53 0 5 1
0 16 10 15 11 13 1  5 1 5 2
1 30 10 19 18 20 1  3 0 2 2
0 25 11  6 23 19 1  4 0 4 4
0 10  9  4 16 13 1 12 0 1 1
0 11 11 12 17 18 1  1 0 5 2
0 31 11 16 21 16 1 25 1 5 2
1 29 13 16 22 14 1  3 1 5 1
1 15 11  . 18 18 1 17 1 1 1
. 22 11  .  .  . 1  3 0 5 2
1 33 11 16 25 19 1 36 0 1 1
1 32 10  9 14 14 0  3 0 5 3
0 29 10 14 18 10 0  1 0 5 3
.  .  .  .  .  . 0 17 0 5 1
0 36 13 19 16 16 0  2 1 5 2
1 26 10 12 16 15 0  7 1 5 2
1 38 13  3 23 23 1 25 0 1 1
1 28 11  7 18 19 1  2 0 4 2
1 32  9 10 20 18 1 13 0 2 4
0 25  7 13 16 25 1  3 0 5 3
0 29  8 12  . 12 1 17 0 2 5
0 12  9  3 22 15 1 20 1 5 1
1 36 11 10 23 20 1 21 0 5 4
0 29 13 10 21 20 1  . 0 5 2
1 40 13 10 23 25 0  3 0 2 4
. 36 13  .  .  . . 38 1 5 1
1 30 11 16 16 16 1  8 0 5 3
1 36 12 16 20 18 1 15 1 1 1
0 22 11  7 19 19 0 20 1 5 3
0 19  8 13 11 11 0  9 0 5 3
1 34  9 13 22 18 1 40 0 5 .
1 28 10 16 17 17 1 26 0 1 1
1 37 11 16 21 21 0 30 0 4 3
1 31 13 19 17 22 1 10 1 5 2
1 33 13 16 21 17 1 42 1 5 2
1 28  9 13 17 17 0  6 0 3 3
1 41 13 18 22 21 1 15 0 1 2
1 16 11 16 22 18 1  1 0 5 2
1 26 13 19 22 11 1 29 0 1 2
0 19 10 11 14 17 1  1 0 5 4
1 16  3 13  8 11 1  1 0 5 3
.  .  .  .  .  . 0 15 1 5 2
1 17 11 19 19 15 0 15 1 5 2
0 28  9 16 18 16 0 16 1 5 3
1 35 13 16 16 17 0 37 0 5 2
0 20 11 14 14 16 1  4 1 2 3
0 25 11 13 18 17 1  3 0 3 4
0 33  8 10 21 19 0  2 0 5 2
1 35 11 13 20 17 1 24 0 5 3
1  .  7 18 18 14 1 20 1 5 1
1 36 12 18 21 22 1 10 0 1 3
1 30 11 10 22 23 0 17 1 1 2
0 21 10  7 18 12 1  4 0 5 3
1 22 13 12 15 14 0  4 1 1 3
1 27  9  8 23 11 1  4 1 4 1
.  .  .  .  .  . 0  8 1 5 1
1 34 11 16 14 21 0  2 0 5 3
0 19 12  8 12  9 1 10 0 5 4
0 22  9  7 13 11 0  7 1 5 3
0 34  9 13 22 21 1  7 0 1 1
0 20  9 16 16 14 1  1 0 5 3
. 23  7 10 14 14 0  2 1 5 4
. 26  9  .  .  . 0 22 1 5 3
0 15 11  7 21 22 0 11 1 1 3
1 26 11 16 17 16 1  3 0 5 2
1 39 12 16 20 18 0 26 1 2 2
0 33  9 11 25 14 1  1 0 5 3
1 28 12 13 11 22 0 16 1 5 2
1 22  4 13 22 16 1 19 1 1 3
1 32 10 16 16 15 1 28 0 3 4
1 30 11  7 11 15 1 37 1 5 3
1 18 11 15 22 15 0 31 0 5 2
0 28 11 16 17 17 1  5 0 5 4
1 12 11 16 17 13 0  2 0 5 4
.  .  .  .  .  . 0 10 1 5 1
0 29 13 13 20 17 0  6 1 5 3
1 34 13 19 21 21 1 31 0 5 1
1 33  9 19 20 17 0 12 1 5 3
1 36 10 17 24 21 1 28 1 5 3
end
label values conf5_refer1_rc confreflab1
label def confreflab1 0 "0 1-4 less confident", modify
label def confreflab1 1 "1 5-7 more confident", modify
label values gender_rc gendlab
label def gendlab 0 "0 male and other", modify
label def gendlab 1 "1 female", modify
label values perc_msk perc_msk_
label def perc_msk_ 1 "0 - 20%", modify
label def perc_msk_ 2 "21- 40%", modify
label def perc_msk_ 3 "41 - 60%", modify
label def perc_msk_ 4 "61 - 80%", modify
label def perc_msk_ 5 "81 - 100%", modify
label values perc_takeopioid perc_takeopioid_
label def perc_takeopioid_ 1 "0 - 20%", modify
label def perc_takeopioid_ 2 "21- 40%", modify
label def perc_takeopioid_ 3 "41 - 60%", modify
label def perc_takeopioid_ 4 "61 - 80%", modify
label def perc_takeopioid_ 5 "81 - 100%", modify

Create for each time period the median of the 350 biggest values

Hello everyone,

I am currently working on an issue regarding the median of the 350 highest values per period. Basically, I have the market equity (variable "me") for each year (variable "monthly_date") for multiple companies (variable "company"). For each year, I want Stata to group all companies, take the 350 companies with the highest market value, and calculate the median of those 350 values.

My data looks like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int company float(monthly_date me)
 2 677   73280
 2 689  109610
 2 701  244600
 2 713  563190
 3 641   36570
 3 653   44750
 3 665  143170
 3 677   54250
 3 689   18690
 5 605   76750
 5 617   81250
 5 629   81000
 5 641  112480
 6 425   18190
 6 473   27920
 6 485   28800
 6 497   22210
 6 509   24310
 6 521   16050
 6 533   17640
 6 545   11290
 6 557    5120
 6 569    5460
 6 581    3220
 6 593    2930
 6 605    4780
 6 617    1110
 7 581  157430
 7 593  240560
 7 605  440400
 7 617  757340
 7 629  827040
 7 641  904180
 7 653  761700
 7 665 1041450
 7 677 1560010
 7 689 1990410
 7 701 2735150
 7 713 3030950
 8 425   77470
 8 437   92210
 8 449   80130
 8 461  174640
 8 473  191140
 8 485  825560
 8 497  892010
 8 509  340390
 8 521   95670
 8 533  191290
 8 545  287880
 8 557  953860
 8 569 1232120
 8 581  949150
 8 593  986490
 8 605 1426260
 8 617 2553610
 8 629 2979150
 8 641 4589000
 8 653 5965620
 8 665 5379720
 8 677 3682250
 8 689 3980100
 9 689  102530
 9 701   19490
10 641  393680
10 653  843550
10 665 1238050
10 677 1849900
10 689 1221640
10 701  514660
10 713  727080
11 581    3160
11 593    6980
11 605   11150
11 617    7070
11 629    7680
11 641    7640
11 653    7930
11 665   11710
11 677   15030
11 689   11860
11 701   23300
11 713   35440
12 545     470
12 557    1440
12 569    5680
12 581    3830
12 593   10080
12 605   18070
12 617   31590
12 629   52380
12 641  111210
12 653  104650
12 665  121710
12 677  253560
12 689  392530
12 701  680960
12 713  189260
13 569   95150
13 581   48910
end
format %tm monthly_date
I was thinking of using the command rowsort, but I am not sure how to use it when a time variable needs to be taken into account.

After collecting the data I would simply run this command:
Code:
bys monthly_date: egen size = pctile(me), p(50)
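
For reference, a sketch of one way this might be done without rowsort (assuming -me- is never missing, so ranking within a month is straightforward):
Code:
* rank companies within each month from largest to smallest market equity,
* then take the median of the top 350 within that month
bysort monthly_date (me): gen rank = _N - _n + 1
bysort monthly_date: egen size = median(cond(rank <= 350, me, .))
drop rank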

Happy New Year to everyone and stay healthy

Create dummy variable to show change in certain timeframe of paneldata

Hi,

I am working with panel data and want to create a dummy variable that shows whether there is a change in the variable -occ- (occupation) for an individual during the three months my data contain. I have occupation (occ) in the form of codes for different occupations. The new variable should be 1 if the occupation code changes between September, October and November for an individual (i.e. if the individual changed occupations in that timeframe).

I am happy for any tips,
thanks in advance!
Sophie


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double cpsidp int occ byte month
20190800062501 5710  9
20190800062501 5710 10
20190800062501 5710 11
20190800062502 2825  9
20190800062502 2825 10
20190800062502 2825 11
20190800093601 5240  9
20190800093601 5240 10
20190800093601 5240 11
20190800093602 5000  9
20190800093602 5000 10
20190800093602 5000 11
20190800129701  440  9
20190800129701  440 10
20190800129701  440 11
20190800129702 6355  9
20190800129702 6355 10
20190800129702  440 11
20190800134101 8990  9
20190800134101 8990 10
20190800134101 8990 11
20190800134102 5410  9
20190800134102 5410 10
20190800134102 5410 11
20190800139101 4220  9
20190800139101 4220 10
20190800139101 4220 11
20190800139102 4920  9
20190800139102 4920 10
20190800139102 4920 11
20190800208401 4710  9
20190800208401 4710 10
20190800208401 4710 11
20190800208402 2014  9
20190800208402 2014 10
20190800208402 2014 11
20190800278201  910  9
20190800278201  910 10
20190800278201  910 11
20190800278202 8740  9
20190800278202 8740 10
20190800278202 8740 11
20190800295701 2310  9
20190800295701 2310 10
20190800295701 2310 11
20190800295702 3870  9
20190800295702 3870 10
20190800295702 3870 11
20190800351201 5110  9
20190800351201 5110 10
20190800351201 5110 11
20190800351202 9130  9
20190800351202 9130 10
20190800351202 9130 11
20190800363601 7330  9
20190800363601 7330 10
20190800363601 7330 11
20190800363602 7700  9
20190800363602 7700 10
20190800363602  440 11
20190800364001 3160  9
20190800364001 3160 10
20190800364001 3160 11
20190800364002  440  9
20190800364002  440 10
20190800364002  440 11
20190800368201 3230  9
20190800368201 3230 10
20190800368201 3230 11
20190800368202 2100  9
20190800368202 2100 10
20190800368202 2100 11
20190800412801   20  9
20190800412801   20 10
20190800412801   20 11
20190800412802  800  9
20190800412802  800 10
20190800412802  800 11
20190800448901 3160  9
20190800448901 3160 10
20190800448901 3160 11
20190800448902 5000  9
20190800448902 5000 10
20190800448902 5000 11
20190800449801 5240  9
20190800449801 5240 10
20190800449801 5240 11
20190800449802 5530  9
20190800449802 5530 10
20190800449802 5530 11
20190800453101 1821  9
20190800453101 1821 10
20190800453101 1821 11
20190800453102 3710  9
20190800453102 3710 10
20190800453102 3710 11
20190800465101  440  9
20190800465101  440 10
20190800465101  440 11
20190800465102 6520  9
end
label values month MONTH
label def MONTH 9 "september", modify
label def MONTH 10 "october", modify
label def MONTH 11 "november", modify
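
A minimal sketch of one way to build such a dummy, using the variables in the data excerpt above (assuming one observation per person per month):
Code:
* flag individuals whose occupation code is not constant across the months
bysort cpsidp (month): gen byte diff = occ != occ[1]
bysort cpsidp: egen byte occ_changed = max(diff)
drop diff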

Graph with bold ylabels

Hi all,

I am trying to get bold y-axis labels in a time-series graph. I only had success with xlabel(); example:

Code:
use "http://www.princeton.edu/~otorres/Stata/date.dta", clear

gen date4=_n
gen normal= rnormal(5000, 400)

replace date2=  "{bf:" + date2 + "}"

ssc install labutil
labmask date4, values(date2)

line  normal date4, xlab(1(500)4500, valuelabel angle(45))  sort

I tried to use the same steps to get bold y labels, but I failed:

Code:
gen str_normal= string(normal)
replace str_normal=  "{bf:" + str_normal + "}"
labmask normal, values(str_normal)

may not label non-integers
r(198);
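
As a possible alternative (a sketch, not a tested solution): ylabel() accepts value-"text" pairs, and {bf:} works inside them, so bold y labels can be written directly without labmask (the tick values below are just examples):
Code:
line normal date4, sort xlab(1(500)4500, valuelabel angle(45)) ///
    ylab(4000 "{bf:4000}" 5000 "{bf:5000}" 6000 "{bf:6000}")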
Thanks in advance

Regards

Dropping Observations of Variable according to frequency of occurence

Hi,

I am working with the following panel data and want to drop all observations for household IDs (variable cpsid, the household ID of the panel) that appear fewer than six times. I previously filtered the data so that the sample contains only heterosexual married couples that took the panel in September, October and November. Now I want to keep only the households for which I have both parents, i.e. for which the household ID appears six times, twice in each month.

I hope this explanation helps understand what I am trying to do, if not, feel free to let me know what is unclear.
Thanks in advance!
Sophie

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double cpsid int year byte month
20190800045200 2020  9
20190800045200 2020 10
20190800045200 2020 11
20190800062500 2020  9
20190800062500 2020 10
20190800062500 2020 11
20190800062500 2020  9
20190800062500 2020 10
20190800062500 2020 11
20190800063700 2020  9
20190800063700 2020 10
20190800063700 2020 11
20190800090700 2020  9
20190800090700 2020 10
20190800090700 2020 11
20190800093600 2020  9
20190800093600 2020 10
20190800093600 2020 11
20190800093600 2020  9
20190800093600 2020 10
20190800093600 2020 11
20190800094600 2020  9
20190800094600 2020 10
20190800094600 2020 11
20190800106300 2020  9
20190800106300 2020 10
20190800106300 2020 11
20190800129700 2020  9
20190800129700 2020 10
20190800129700 2020 11
20190800129700 2020  9
20190800129700 2020 10
20190800129700 2020 11
20190800134100 2020  9
20190800134100 2020 10
20190800134100 2020 11
20190800134100 2020  9
20190800134100 2020 10
20190800134100 2020 11
20190800139100 2020  9
20190800139100 2020 10
20190800139100 2020 11
20190800139100 2020  9
20190800139100 2020 10
20190800139100 2020 11
20190800140400 2020  9
20190800140400 2020 10
20190800140400 2020 11
20190800208400 2020  9
20190800208400 2020 10
20190800208400 2020 11
20190800208400 2020  9
20190800208400 2020 10
20190800208400 2020 11
20190800250100 2020  9
20190800250100 2020 10
20190800250100 2020 11
20190800278200 2020  9
20190800278200 2020 10
20190800278200 2020 11
20190800278200 2020  9
20190800278200 2020 10
20190800278200 2020 11
20190800282000 2020  9
20190800282000 2020 10
20190800282000 2020 11
20190800285400 2020  9
20190800285400 2020 10
20190800285400 2020 11
20190800295700 2020  9
20190800295700 2020 10
20190800295700 2020 11
20190800295700 2020  9
20190800295700 2020 10
20190800295700 2020 11
20190800299400 2020  9
20190800299400 2020 10
20190800299400 2020 11
20190800321300 2020  9
20190800321300 2020 10
20190800321300 2020 11
20190800337700 2020  9
20190800337700 2020 10
20190800337700 2020 11
20190800351200 2020  9
20190800351200 2020 10
20190800351200 2020 11
20190800351200 2020  9
20190800351200 2020 10
20190800351200 2020 11
20190800363600 2020  9
20190800363600 2020 10
20190800363600 2020 11
20190800363600 2020  9
20190800363600 2020 10
20190800363600 2020 11
20190800364000 2020  9
20190800364000 2020 10
20190800364000 2020 11
20190800364000 2020  9
end
label values month MONTH
label def MONTH 9 "september", modify
label def MONTH 10 "october", modify
label def MONTH 11 "november", modify
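
A minimal sketch of one way this might be done, using the household identifier from the data excerpt above (assuming exactly six observations identify a complete household):
Code:
* count observations per household and keep only the complete ones
bysort cpsid: gen nobs = _N
keep if nobs == 6
drop nobs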

Parallel trend test for DID model

Dear friends, why are the test results using ttable and pstest not consistent?


Code:
pstest $xlist Y  , t(treated)
The result is:



                        Mean              t-test       V(T)/
Variable         Treated   Control   %bias     t    p>|t|   V(C)
------------------------------------------------------------------
x1                .42697    .38202     9.0   0.61   0.544      .
x2                30.427    31.517   -13.1  -0.87   0.387   0.84
y                -.17585   -.39273    27.7   1.73   0.085   0.60*

* if variance ratio outside [0.66; 1.52]


Ps R2   LR chi2   p>chi2   MeanBias   MedBias      B      R   %Var
--------------------------------------------------------------------
0.030      7.20    0.616       11.1      10.8   40.3*   0.51     17

* if B>25%, R outside [0.5; 2]

If I use

Code:
ttest x1 Y, by(treated)  
ttest x2 Y, by(treated)
The result changes:


Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |   1,368    32.70029     .192066    7.103845    32.32352    33.07707
       1 |      64    29.29688    .8038473    6.430778    27.69051    30.90324
---------+--------------------------------------------------------------------
combined |   1,432    32.54818    .1878338    7.107966    32.17973    32.91664
---------+--------------------------------------------------------------------
    diff |            3.403417    .9048948                1.628354    5.178481
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   3.7611
Ho: diff = 0                                     degrees of freedom =     1430

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9999         Pr(|T| > |t|) = 0.0002          Pr(T > t) = 0.0001
Or simply,
Two-sample t test with equal variances
------------------------------------------------------------------
Variables    G1(0)      Mean1    G2(1)      Mean2     MeanDiff
------------------------------------------------------------------
x1            2631     34.269       89     30.427     3.843***
x2            2631   1248.832       89    989.303   259.529***
------------------------------------------------------------------

The latter test shows that there was no parallel trend.

Thanks a lot.

Dropping duplicate observations conditioned on another variable

Hi,

I am using Stata 16.1 and have the following (general) issue. I want to drop duplicate observations of one variable (educ) from my dataset, but conditioned on another variable (year).
I've found an online dataset, so that it may be easier to talk about. First up, I apologize that I am new to Stata.

My problem: I want to drop all duplicates of educ within each corresponding year (72 & 74). This means I do not want a value of educ to enter for year 72 if that value has already been observed.
Here is what I've tried to do:

use http://fmwww.bc.edu/ec-p/data/wooldridge/fertil1
keep if year == 72 | year == 74
drop if educ == educ[_n-1] & (year == 72 | year == 74)

The problem:
I am not dropping all duplicates of educ

Any suggestions would be appreciated
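
For reference, a minimal sketch of one possibility: -duplicates drop- with a varlist keeps one observation per combination of the listed variables, which avoids relying on duplicates being adjacent.
Code:
use http://fmwww.bc.edu/ec-p/data/wooldridge/fertil1, clear
keep if year == 72 | year == 74
* keep one observation per (year, educ) combination; -force- acknowledges
* that the other variables may differ across the dropped observations
duplicates drop year educ, force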

Numerical format in regression tables using esttab/estout

Dear Statalist,

I am using esttab (wrapper for estout) from SSC in Stata 16.1.

I am trying to export regression tables in which there are various models (columns) with different dependent variables (but the same independent variables). Is it possible to set different numerical display formats for the point estimate of the same coefficient in the various models?

Here is a simple example that illustrates my question: Assume that I try to assess if the car type (domestic/foreign) affects the price and the repair record of a car:

Code:
sysuse auto, clear
qui eststo: reg price foreign
qui eststo: reg rep78 foreign
 
esttab, b(%9.3f) se(%9.3f) nocons title("(1) basic output using esttab")          
eststo clear
I uploaded the output (1) as well as the desired output (2) as attachment.

I am aware that you can set different numerical display formats for different point estimates (b(fmt)). Is it also possible to change the display format for the same variable but another model? As in my example: three decimal places in model (1) and only one decimal place in model (2).

Or are there any other user-written commands (e.g., outreg) that may solve this problem?

Thanks for any advice!
Patrick


[attachment: actual output (1) and desired output (2)]

Static Model with xtabond2

Dear Users,

I have the following equation for panel data

xtreg zer esg cap turn tdrdta prbv betal i.year, fe vce(cluster id)

However, I am concerned about a possible endogeneity problem and would like to use a GMM estimator for my model. Note that I do not have any instruments.

However, as far as I can see, the xtabond2 command is used for dynamic models rather than static ones.

Is there a way that I can convert my code above to an xtabond2 specification without using the lag of the dependent variable?

Updating input output models using RAS method

Happy new year!
Is there any tool in Stata for updating input-output (IO) tables using the RAS method?

Find unobserved values

Hey guys,

I am currently trying to figure out how to find unobserved values. I have different sic codes (variable "sic") with corresponding values for sales (variable "sale"), and I want to look at a time period of 26 years, namely from 1984 to 2010.
I want to know whether each of the sic codes has sales data within those years. I know, in fact, that this is not the case for each sic code.
Do you know how I can find out which of the sic codes does not have sales data for all years (e.g. lacks the years 1989 and 2008 and thus has only 24 values)?
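
For reference, a rough sketch of one way this might be done (the variable names sic, sale and year are assumptions):
Code:
* count the distinct years with non-missing sale per sic code and list the
* codes that fall short of the full 1984-2010 span
preserve
keep if inrange(year, 1984, 2010) & !missing(sale)
bysort sic year: keep if _n == 1
bysort sic: gen nyears = _N
bysort sic: keep if _n == 1
list sic nyears if nyears < 27, clean   // 1984-2010 inclusive has 27 calendar years
restore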

THANK YOU SO MUCH IN ADVANCE!!!

choropleth map stretched out

Hi everyone, I'm using Stata 16 to make a choropleth map of Connecticut.
Reading the posts on here, I tried using geo2xy to correct the problem, but I don't know what I am doing wrong, because the map is still stretched out.

The shapefile came from here: http://magic.lib.uconn.edu/connecticut_data.html

I've tried to follow this post: https://www.statalist.org/forums/for...-geo2xy-on-ssc

Which worked, but I don't know how to apply it to my example. When I do try, I keep getting the stretched out map as a result.

NEVERMIND, I WAS ABLE TO FIGURE THIS OUT. Turns out, I need to save the ct_lat_long file after using geo2xy to alter the coordinates.

Code:
cd "C:spatial files\townct_37800_0000_2010_s100_census_1_shp\townct_37800_0000_2010_s100_census_1_shp\wgs84"

spshape2dta townct_37800_0000_2010_s100_census_1_shp_wgs84.shp, saving(cttowns) replace

 use "C:spatial files\townct_37800_0000_2010_s100_census_1_shp\townct_37800_0000_2010_s100_census_1_shp\wgs84\cttowns.dta"


sort NAME10
ren NAME10 townsct169str
replace townsct169str = proper(trim(townsct169str))

save "cttowns.dta", replace


geo2xy _CY _CX, gen(latitude_y longitude_x) project(mercator)
save "ct_towns_lat_long.dta", replace

save "cttowns.dta", replace

keep townsct169str _ID _CX _CY
save ct_towns_lat_long.dta
use "ct_towns_lat_long.dta",clear
sort townsct169str
save "ct_towns_lat_long.dta", replace

import excel "C:CDP to towns, victimnumber.xls", sheet("Sheet1") firstrow clear

sort townsct169str

merge townsct169str using ct_towns_lat_long.dta

drop if _ID==.

grmap VictimNumber

spset, modify coordsys(latlong)

***
use "cttowns_shp.dta", clear
geo2xy _Y _X, replace
save "cttowns_shp_xy.dta", replace

use "cttowns.dta", clear
spset, modify shpfile(cttowns_shp_xy)

grmap
***

import excel "C:CDP to towns, victimnumber.xls", sheet("Sheet1") firstrow clear

sort townsct169str

use "ct_towns_lat_long.dta",clear
sort townsct169str
save "ct_towns_lat_long.dta", replace

merge townsct169str using ct_towns_lat_long.dta

drop if _ID==.

grmap VictimNumber

use "Victim201719file.dta", clear
grmap VictimNumber

How to solve auto-correlation using unbalanced panel with gaps

Hi all,
I am dealing with unbalanced panel data with gaps; it is a survey of manufacturing companies between 1990 and 2012. I am using xtreg with fe and vce(cluster id) in Stata 16.1.
Code:
xtreg growth innov exp rec lnage lnpertot size profg i.naceclio i.year, vce (cluster id) fe
where growth is log(sales_t / sales_t-1), and innov, exp and rec are independent variables regarding innovation and market dynamism; all of these IVs are dummies. The other variables are controls.

My concern is that I have found serial correlation using xtserial:
[xtserial output omitted]

And when I run the Portmanteau test -xtistest-, it confirms the autocorrelation and reports that the panel is unbalanced with gaps:
[xtistest output omitted]
I have two questions:
  1. Do you have any suggestions for solving these serial correlation problems?
  2. Can you provide me with information on the meaning of this "gaps" problem?
Thank you in advance for your help.

PS: This is my first time on the Stata forum; if I am not posting correctly, I would appreciate your comments so that I can improve next time.

Best Regards,

Alexandra.

Wednesday, December 30, 2020

Finding minimum value out of 5 different variables for each row

I would like to create a new variable that equals the minimum value out of 5 different variables for each row.

Example: For each "ID" would like to find the minimum value out of v1,v2,v3,v4,v5 and create a new variable "v6" using that value

ID v1 v2 v3 v4 v5
1 22 25 28 30 11
2 15 28 14 19 29

Update: Realized this works, sorry for the wasted post!

egen v6 = rowmin(v1 v2 v3 v4 v5)

How would I create this graph?

I am very rusty with Stata and need some help creating the graph shown in the image. What would be the command to create a bell-curve graph like this, one that represents both the t-statistic and the p-value? Sorry if this is extremely simple; I have looked for this everywhere and cannot seem to find it. Thank you.
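
A sketch of one way such a figure might be drawn (all values below are placeholders, not results from any actual test):
Code:
* draw a standard normal density and shade the area beyond an illustrative
* t-statistic of 1.96
twoway (function y = normalden(x), range(-4 4)) ///
       (function y = normalden(x), range(1.96 4) recast(area)) ///
       , xline(1.96) legend(off) xtitle("t statistic") ytitle("Density")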



Predicting probability of a multivalued endogenous treatment

I am trying to figure out how to predict the probability of participating in different treatments (a multivalued treatment). Since the entreat() equation does not allow the mlogit option, I am not sure how to do it. In the example below, I have created 'program1', a multivalued treatment. But the prediction is not for the two different types of programs (program1==1, program1==2). Any suggestions for graphing the probabilities of participation for the two different treatments?
Thank you so much for your kind help.
Sincerely,
Manish
Code:
use https://www.stata-press.com/data/r16/class10,clear
clonevar program1= program 
replace program1=2 if program==1 & _n <1000
eprobit graduate income i.roommate, endogenous(hsgpa = income i.hscomp) entreat(program1 = i.campus i.scholar income, pocorrelation) vce(robust)
preserve 
replace campus=1
replace scholar=1
predict p1 , pr equation(program1)
twoway line p1 income, sort

Dealing with id problems in panel data


Hello, I am using person-firm-level panel data for an analysis of wage differentials. I found that my dataset has some duplicated identifiers for different firms, e.g. the same identity number for two firms in two different regions in the same year. I want to:
1) identify these firm ids
2) drop these firm ids

input double cnpjcei long município float ano
459400009379 110001 2010
500088456682 110002 2018
500072511785 110002 2015
500072511785 110002 2018
512209712772 110002 2013
260040022389 110002 2010
500072511785 110002 2017
260040022389 110002 2010
260040022389 110002 2013
260040022389 110002 2013
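
A minimal sketch of one way to do both steps, using the variables from the excerpt above:
Code:
* count the distinct municipalities per firm id and year
bysort cnpjcei ano município: gen byte first = _n == 1
bysort cnpjcei ano: egen nmun = total(first)
* 1) identify the problematic firm ids
list cnpjcei ano município if nmun > 1
* 2) drop those firm ids entirely
bysort cnpjcei: egen byte bad = max(nmun > 1)
drop if bad
drop first nmun bad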




spmap

Dear All,
I am using Stata 14 and trying to create maps of Bangladesh using the package spmap. I want two maps on the same scale, but when I do this Stata picks a different scale for each map. How can I fix this?
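
A sketch of one possibility: fixing the class breaks by hand with clmethod(custom) forces both maps onto the same scale (the variable names, basemap file and break values below are placeholders):
Code:
spmap var2015 using coord, id(id) clmethod(custom) clbreaks(0 10 20 30 40)
spmap var2019 using coord, id(id) clmethod(custom) clbreaks(0 10 20 30 40)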

Quantile Regression

Code:
reg price  treatment i.state i.year income rural [fw=round(weight)] , cluster(state)
Hi

I ran the above regression but now I wish to look at the impact on the 90th percentile of the dependent variable instead of the mean.
Is the code below fine? Also how do I cluster the standard errors at the state level in the following quantile regression.

Code:
qreg price  treatment i.state i.year income rural [fw=round(weight)] , quantile(0.9)

regression coefplot with 3 variables

hello
Hello,
I want to make a regression coefplot for 3 variables: a standardised AIDS figure (z_aids, the number of known people who have died from AIDS), and I want to see the relationship between this and gender (variable 'gender') and how often an individual has gone without medical care (variable 'q8c', which has categorical answers such as 'many times', 'don't know', etc.). I tried to run a regression using the command 'reg z_aids gender ibn.q8c', but Stata said that factor variables may not contain negative values. I don't know what this means, and I'm fairly certain that even attempting to put these variables into a regression coefplot won't work, so does anyone have advice on what to do next, or on what to display the data as instead? Thanks!
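
A hedged sketch of one possible first step: factor variables must be non-negative integers, so the negative codes of q8c (often things like "don't know") would need to be recoded before using ibn.q8c (the recode values below are placeholders):
Code:
recode q8c (-1 = 97) (-2 = 98)      // placeholder codes for the negative categories
reg z_aids gender ibn.q8c
coefplot, drop(_cons)               // coefplot is the user-written command from SSC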


【imputation】how to fix r(459) in mi impute chained

Hello!
I recently encountered a problem in using MI module in Stata. I want to impute variable wage by "mi impute chained" command. But it return error message "r(459)".
Here are my sample dataset and code.

Dataset: https://raw.githubusercontent.com/Gi...ter/sample.dta.
(I'm so sorry that the maximum sample exported from my original data by dataex does not support multiple imputation analysis.)

Code:
use sample.dta,clear
mi set mlong
mi register regular gender h_group2 h_group3 h_group4 h_group5 num_child num_elderly pcgdp child_elder percent_farmer population service_sector age26_30 age31_35 age36_40 age41_45 age46_50 age51_55 Liaoning Heilongjiang Shandong Henan Hubei Hunan Guangxi Guizhou year1997 year2000 year2004 year2006 year2009 year2011 year2015 times
mi impute chained (intreg, ll(log_wage_1_ll) ul(log_wage_1_ul) conditional(if employed_last1 == 1)) log_wage = gender h_group2 h_group3 h_group4 h_group5 num_child num_elderly pcgdp child_elder percent_farmer population service_sector age26_30 age31_35 age36_40 age41_45 age46_50 age51_55 Liaoning Heilongjiang Shandong Henan Hubei Hunan Guangxi Guizhou year1997 year2000 year2004 year2006 year2009 year2011 year2015 times, add(1) rseed(12121) chaindots
Code:
log_wage_1 is not consistent in the observed data with the variables log_wage_1_ll and log_wage_1_ul containing the lower and upper interval-censoring
  limits
 -- above applies to specification (intreg , ll(log_wage_1_ll) ul(log_wage_1_ul) conditional(if employed_last1 == 1)) log_wage_1 = gender h_group2 h_group3
    h_group4 h_group5 num_child num_elderly pcgdp child_elder percent_farmer population service_sector age26_30 age31_35 age36_40 age41_45 age46_50
    age51_55 Liaoning Heilongjiang Shandong Henan Hubei Hunan Guangxi Guizhou year1997 year2000 year2004 year2006 year2009 year2011 year2015 times

r(459);
If I rewrite the code as follows, it can work.
Code:
mi impute intreg log_wage gender h_group2 h_group3 h_group4 h_group5 num_child num_elderly pcgdp child_elder percent_farmer population service_sector age26_30 age31_35 age36_40 age41_45 age46_50 age51_55 Liaoning Heilongjiang Shandong Henan Hubei Hunan Guangxi Guizhou year1997 year2000 year2004 year2006 year2009 year2011 year2015 times,add(1)rseed(12121) ll(log_wage_1_ll) ul(log_wage_1_ul) cond(if employed_last1 == 1)
I must use the "mi impute chained" command because I have some other variables that need to be imputed. Can anyone help me fix this error? Thank you!


apc model

Hi,
I want to estimate the following model: log_income = B*age, with year and cohort fixed effects. I'm trying to use the command apc, but the output gives an estimate of beta for each value of the variable age, whereas I want a single beta that estimates the effect of age. The code I'm using is:
apc log_income, age(age) period(year) cohort(yrbrn)
where year is the year of the survey and yrbrn is the year of birth. How can I solve my problem?

Estimating Intergenerational Correlation (IGC) with nlcom

Dear Statalisters,

I am trying to estimate the Intergenerational Correlation (IGC) in education using nlcom, but cannot figure out how to do it.


\[ IGC: \quad \rho_{IG} = \beta_{1} \cdot \frac{\sigma_{parent}}{\sigma_{child}} \]



where \(\rho_{IG}\) = Intergenerational Correlation (IGC),

\(\beta_{1}\) = intergenerational regression coefficient (IGRC), coming from estimating

\[ Edu_{i,g} = \beta_{0}+ \beta_{1}Edu_{i,parent}+\beta_{2}Y_{i,grandparent} + \epsilon_{ig} \]

and \(\sigma\) = the standard deviation of the parent's/child's education.




Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(gen3_edu gen2h_edu gen1_edu)
 9  0  0
15 12  0
12 10 10
 0  8  0
 0  0  0
 8  0  0
 9  9  0
 9  8  0
10  0  0
15 10 10
 9  8  0
 0  0  0
15  0  0
11  8  0
11  0  0
16 10 10
 9  9  9
15  9  9
16  5  5
 8  0  0
 0  0  0
12 12  0
11  0  0
14  0  0
 5  5  5
 8  0  0
12  9  0
12 10  .
 8  0  0
 9  0  0
15  9  0
15  9  9
10 10 10
 9  9  9
10 10 10
 0  0  0
 9  3  0
15  0  0
 9  0  0
13  8  8
 9  0  0
11 12  0
14 10  5
16 10 10
12 10 12
 5  5  5
 7  0  0
15  0  0
 9  5  0
11  0  0
 0  0  5
16  0  0
12  0  8
 9  0  0
 9  9  9
 0  6  6
16  9 12
 9  0  0
15  0  0
 8  7  0
 0  8  0
10  8  0
 8 10 10
 4  0  0
15  0  0
12 10 11
 5  7  7
 2  0  0
 0  8  0
 9  0  0
 9  8  0
 7  0  0
11  0  0
16  7  0
10 10  0
 9  0  0
14  0  0
11  9  0
16  0  0
15  8  0
 8  9  0
12  9  0
 9 12  0
 0  0  0
16 12 12
16  9 10
 9 10  0
12 10  0
10  8  0
 7  0  0
 9  8  0
11  8  0
12  9  0
12  0  0
16  9  0
 9  7  0
 0  0  0
 0  0  0
 3  0  0
14  0  .
end
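
A minimal sketch of one way the ratio might be fed to nlcom, using the variable names from the excerpt above (gen3_edu = child, gen2h_edu = parent); note that the standard deviations are computed on the estimation sample and then treated as fixed constants inside nlcom:
Code:
regress gen3_edu gen2h_edu gen1_edu
quietly summarize gen2h_edu if e(sample)
local sd_parent = r(sd)
quietly summarize gen3_edu if e(sample)
local sd_child = r(sd)
nlcom (IGC: _b[gen2h_edu] * `sd_parent' / `sd_child')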

spmap options

Hello,

I used -spmap- to map US employment and this is the code I used:

Code:
spmap cbp_emp_sum using uscoord if NAME!="Alaska" & NAME!="Hawaii", id(id) fcolor(Blues) legend(position(8))
However, the map is 'long' and I would like to have it nicely displayed. Is there an option that I should add?

Also, is there a way to display Alaska and Hawaii in the bottom left ?

Thank you in advance.

New version of xcontract (AGAIN) on SSC

Thanks once again to Kit Baum, a new version of the xcontract package (superseding yesterday's version) is now available for download from SSC. In Stata, use the ssc command to do this, or adoupdate if you already have an old version of xcontract.

The xcontract package is described as below on my website. The new version no longer uses file processing (except if the user specifies a saving() option). Previously, file processing was used if the zero option was specified. It has now been replaced by frame processing.

I would like to thank Jeremy Freese of Stanford University for writing the frameappend command, from which I have adapted some of the code for xcontract.

Best wishes

Roger

--------------------------------------------------------------------------------------
package xcontract from http://www.rogernewsonresources.org.uk/stata16
--------------------------------------------------------------------------------------

TITLE
xcontract: Create dataset of variable combinations with frequencies and percents

DESCRIPTION/AUTHOR(S)
xcontract is an extended version of contract. It creates an output
data set with 1 observation per combination of values of the
variables in varlist and data on the frequencies and percents of
those combinations of values in the existing data set, and,
optionally, the cumulative frequencies and percents of those
combinations. If the by() option is used, then the output data set
has one observation per combination of values of the varlist
variables per by-group, and percents are calculated within each
by-group. The output data set created by xcontract may be listed to
the Stata log, or saved to a data frame, or saved to a disk file, or
written to the memory (overwriting any pre-existing data set).

Author: Roger Newson
Distribution-Date: 30december2020
Stata-Version: 16

INSTALLATION FILES
xcontract.ado
xcontract.sthlp
--------------------------------------------------------------------------------------


Overlapping values on y axis in Stata graph



Hi, is there a way to avoid overlapping numbers on the y axis (as in the attached graph)?

Outreg2 to Excel R(198) error diagnosis

I am trying to run the code below in Stata 16.1. Stata is fully updated, and I've uninstalled and re-installed the outreg2 command, restarted Stata, and restarted my laptop, to no avail.
I was previously able to run this code successfully on the same laptop a few days ago. I've made no change to the code below, or to the earlier part of the do-file that tells Stata where to save outputs. Note there are "" around the output file path and the file name in the problem code, because my output file path has a space that I cannot remove (Emily XPS13).

Output file path: global outputs "C:\Users\Emily XPS13\Desktop\HESAanalysis\AlmudenaHESAOutputs"

Problem code: reg lsalary16 BLACKFEMALE BLACKMALE ASIANFEMALE ASIANMALE OTHERFEMALE OTHERMALE WHITEFEMALE, robust
outreg2 using "$outputs/TableEthnicGender.xls", replace dec(3) keep(BLACKFEMALE BLACKMALE ASIANFEMALE ASIANMALE OTHERFEMALE OTHERMALE WHITEFEMALE) label addtext(Controls, NO, University Dummies, NO, Year Dummies, NO)

When I try to run the problem code today, every time I get an error reading:
invalid 'XPS13'
r(198);


The regression itself runs fine in Stata if I just run the reg line of code. If I run the whole thing, I get the above error. However, I can see, by looking at the folder where my outputs should go, that Stata is generating a txt file there with the results information, but it fails to translate that into an Excel file; it gives me the r(198) error instead.

I also tried making the file a .xlsx file instead, just in case, but I get the same error. Any further troubleshooting suggestions? I feel a bit crazy, since this worked a few days ago!


Median, Tercile and Quartile

Hi, I want to create dummy variables based on the median. Sometimes I do that based on quartiles or terciles. I do the following:

1. This will create a yearly median variable and median dummy

egen ana= xtile(num), n(2) by(fyear)
gen num_dummy = cond(missing(ana), ., (ana>1))

2. This will create a yearly tercile variable and a dummy based on top tercile
egen ana= xtile(num), n(3) by(fyear)
gen num_dummy = cond(missing(ana), ., (ana>2))

3. This will create a yearly quartile variable and a dummy based on top quartile
egen ana= xtile(num), n(4) by(fyear)
gen num_dummy = cond(missing(ana), ., (ana>3))

Though I checked that these codes give me the correct answer even when I calculate the same thing with slightly different code, I want to make sure from the experts that I am doing it correctly.
So, experts, could you please tell me whether I am doing things correctly or not?

Use the lastvar local to read v1 v2 v3 variables from insheet with numeric names

I am reading in GIS data, provided by a soil scientist on my research team, that is exported to csv files with numeric names (years only). I have created code to read in a list of csv files from different regions and rename the variables from the v1, v2, v3, ..., vn format to the format varname_year (here SPI_year), using the variable labels. The problem is that I'd like to set this up to run a loop from the first variable, v2, to the last, where the last variable name changes depending on the data I'm importing. The line -loc lastvar: word `c(k)' of `r(varlist)'- does not pick up the last variable, since it is not a valid name. Is there a way for Stata to recognize the last variable "vn" in this list? My code is as follows:

***** This does not work. Error: nothing found where name expected
cd "$mypath"
foreach region in region1 region2 {
    insheet using "$mypath\SPI_`region'.csv", clear
    qui des
    loc lastvar: word `c(k)' of `r(varlist)'
    foreach v of varlist v2-`lastvar' {
        local x : variable label `v'
        rename `v' SPI_`x'
    }
}

***** This works, but I added the name of the last variable by hand
cd "$mypath"
foreach region in region1 region2 {
    insheet using "$mypath\SPI_`region'.csv", clear
    foreach v of varlist v2-v469 {
        local x : variable label `v'
        rename `v' SPI_`x'
    }
}

My data look like this, where lotid is the unique identifier:
lotid v2 v3 v4 v5 v6 v7 v8
13008 1.7 2.34 0.75 0.80 1.23 3.78 0.85
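
A sketch of one possibility: -ds- (unlike -describe- without its varlist option) leaves r(varlist) behind in dataset order, so the last variable name can be picked up from it without typing it by hand:
Code:
cd "$mypath"
foreach region in region1 region2 {
    insheet using "$mypath\SPI_`region'.csv", clear
    qui ds
    loc lastvar : word `c(k)' of `r(varlist)'
    foreach v of varlist v2-`lastvar' {
        local x : variable label `v'
        rename `v' SPI_`x'
    }
}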

Generate weighted median variable by other variables

Hello,

I have a dataset which lists the year, state, age group, income, and survey weight of individuals surveyed. Agegroup takes on 0 if the individual is 16-19 years old, 1 if 20-24, 2 if 25-29, etc.

I would like to create a variable that has the weighted median income of individuals by year, state, and agegroup. So for example, I would like the weighted median income of all individuals in 2015, in Alabama, in agegroup 0 (ie 16-19 yo).

I've read some simpler examples of weighted medians, such as this one using levelsof and foreach, but I'm unsure how to do so by the year, state and agegroup specifications.

Thank you in advance!
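
A minimal sketch of one way this might be done (the variable names year, state, agegroup, income and weight are assumptions): compute the weighted medians with -collapse- on a copy of the data and merge them back.
Code:
preserve
collapse (p50) med_income = income [pw = weight], by(year state agegroup)
tempfile medians
save `medians'
restore
merge m:1 year state agegroup using `medians', nogenerate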






Postestimation test for cross-sectional time series FGLS regression

Hi

I'm conducting a study on the determinants of bank profitability in my country. I have data from 16 of the 18 banks in total (N=16), spanning 40 time periods (T=40). Theory pointed me to 'xtgls' as opposed to 'xtreg'.

After running a pre-FGLS regression I found that autocorrelation, heteroskedasticity and cross-sectional dependence were present when I tested for them by using 'xtserial', 'xttest3' and 'xttest2' respectively.

I then ran two further FGLS regressions with all three problems specified and got the results below. Model 2 is simply a modification of model 1, as it has squared terms of all the continuous variables.

Model 1:

Cross-sectional time-series FGLS regression

Coefficients: generalized least squares
Panels: heteroskedastic with cross-sectional correlation
Correlation: common AR(1) coefficient for all panels (0.8231)

Estimated covariances = 136 Number of obs = 640
Estimated autocorrelations = 1 Number of groups = 16
Estimated coefficients = 6 Time periods = 40
Wald chi2(5) = 630.93
Prob > chi2 = 0.0000

------------------------------------------------------------------------------
ROA | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LNTA | 3.017828 .1259643 23.96 0.000 2.770943 3.264714
CAPR | -.0089637 .0089082 -1.01 0.314 -.0264235 .0084962
LIQR | .0030467 .0016026 1.90 0.057 -.0000944 .0061878
INFL | -.0097423 .0312732 -0.31 0.755 -.0710367 .051552
GDPG | -.0421484 .126283 -0.33 0.739 -.2896584 .2053617
_cons | -41.75591 2.267979 -18.41 0.000 -46.20107 -37.31075
------------------------------------------------------------------------------

Model 2:

Cross-sectional time-series FGLS regression

Coefficients: generalized least squares
Panels: heteroskedastic with cross-sectional correlation
Correlation: common AR(1) coefficient for all panels (0.8098)

Estimated covariances = 136 Number of obs = 640
Estimated autocorrelations = 1 Number of groups = 16
Estimated coefficients = 11 Time periods = 40
Wald chi2(10) = 645.23
Prob > chi2 = 0.0000

------------------------------------------------------------------------------
ROA | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LNTA | 28.27193 1.988883 14.21 0.000 24.37379 32.17007
CAPR | .0968506 .0172677 5.61 0.000 .0630064 .1306947
LIQR | .020302 .0031623 6.42 0.000 .014104 .0265001
INFL | -.2763099 .1310412 -2.11 0.035 -.5331459 -.0194738
GDPG | 1.110413 .3644671 3.05 0.002 .3960706 1.824756
LNTA2 | -.9128306 .067772 -13.47 0.000 -1.045661 -.78
CAPR2 | -.0013482 .0003059 -4.41 0.000 -.0019478 -.0007487
LIQR2 | -.0000548 9.47e-06 -5.78 0.000 -.0000734 -.0000362
INFL2 | .0081777 .0044353 1.84 0.065 -.0005154 .0168707
GDPG2 | -.1538821 .0366569 -4.20 0.000 -.2257284 -.0820358
_cons | -216.4275 14.58703 -14.84 0.000 -245.0175 -187.8374
------------------------------------------------------------------------------

Question: is there a way to test which of the models fits the data better, seeing that AIC and the likelihood-ratio test can't be used?

PS: I'm using an older version of Stata (14.2).

Any help is highly appreciated.

Different y-axis range on xtline plots

I'm trying to construct some xtline plots using 10 different ids. The range of values of y for id #s 1-9 is 0-100, but for id #10 the range of values for y is 0-2000. This causes xtline to set the y axis range from 0 to 2000, making the plots for id #s 1-9 effectively meaningless because the y axis' range is so large compared to the actual range of values.

Is there a way to alter the y axis on individual xtline plots? The only solutions I've found so far are to simply exclude the problematic id using an if condition, or to manually create individual graphs for each id using twoway line, but I'm sure there's a better way to deal with this issue.

If it helps, here's a sample dataset and two graphs to illustrate the problem:

Code:
clear
set obs 10
gen id = _n
expand 20
bys id: egen time = seq(),f(1) t(20)
xtset id time
gen var1 = runiformint(0,100) if id !=10
replace var1 = runiformint(0,2000) if id == 10
xtline var1
xtline var1 if id !=10
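
One thing that may be worth trying (a sketch, assuming xtline passes byopts() through to graph's by() option): yrescale lets each panel choose its own y-axis range.
Code:
xtline var1, byopts(yrescale)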
Any help is much appreciated.

heteroskedasticity in logistic regression model

Hi,

I have cross sectional data and am using logistic regression. My question is how do I check my data for heteroskedasticity and in case it is present, then how to deal with it.

I have come across a lot of information using linear regression along with the Breusch-Pagan Test (using command "hettest") or White’s Test (using command "imtest") for testing for heteroskedasticity. And heteroskedasticity is dealt with by computation of "Robust Standard Errors". However, there is less information on this issue in case of logistic regression.

I'll be grateful, if anyone could please help me.

Regards,
Juhee

change the color of bars -BOXPLOT

Hi to everybody,
I have done a "box plot by group with data points" for the first time.

By default, Stata gives me all the boxes in the same colour. Does anyone know how to change the colour of the boxes? I need 9 different colours (one for Pups_a, one for Pups_b, etc.).
Below is the syntax I wrote:


twoway rbar lqt med Contesto_num, barw(.5) fcolor(gs12) lcolor(black) || ///
rbar med uqt Contesto_num, barw(.5) fcolor(gs12) lcolor(black) || ///
scatter Complex Contesto_num, graphregion(fcolor(gs15)) mcolor(black) msymbol(Oh) ///
legend(off) xlabel( 1 ".Pups_a" 2 ".Pups_b" 3 ".Pups_c" 4 ".aleF_a" 5 ".aleF_b" 6 ".aleF_c" 7 "FF_a" 8 "FF_b" 9 "FF_a" ) ///
ytitle(Complex score)
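
A sketch of the general idea, shown for the first two groups only: draw the rbar layers once per group, each with its own fcolor(), restricting with -if- (the colours below are placeholders):
Code:
twoway (rbar lqt med Contesto_num if Contesto_num == 1, barw(.5) fcolor(red) lcolor(black)) ///
       (rbar med uqt Contesto_num if Contesto_num == 1, barw(.5) fcolor(red) lcolor(black)) ///
       (rbar lqt med Contesto_num if Contesto_num == 2, barw(.5) fcolor(blue) lcolor(black)) ///
       (rbar med uqt Contesto_num if Contesto_num == 2, barw(.5) fcolor(blue) lcolor(black)) ///
       (scatter Complex Contesto_num, mcolor(black) msymbol(Oh)), ///
       legend(off) graphregion(fcolor(gs15)) ytitle(Complex score)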


Thanks to everybody

Equivalent for noisily option in fuzzy package?

Hello Statalist,

Does anyone know how to get Stata to print all calculations occurring due to commands from the 'fuzzy' package by Longest & Vasey (2008)? As I've mentioned in a separate post, I'm losing cases through the tabulate bestfit command in the fuzzy package, and I don't understand why.

To figure out what's happening, I'd like to get Stata to print all the underlying calculations. I've done this before using the 'noisily' option (for other commands in other packages), but that's not affecting the output here at all. See below:

Code:
. fuzzy A S O F T I

. tabulate bestfit, sort

    bestfit |      Freq.     Percent        Cum.
------------+-----------------------------------
      sOFTI |          4       16.67       16.67
      SOFTI |          2        8.33       25.00
      SoFTI |          2        8.33       33.33
      sOfti |          2        8.33       41.67
      soFTi |          2        8.33       50.00
      sofTI |          2        8.33       58.33
      sofTi |          2        8.33       66.67
      softI |          2        8.33       75.00
      SOFtI |          1        4.17       79.17
      SOfTI |          1        4.17       83.33
      SofTi |          1        4.17       87.50
      sOFti |          1        4.17       91.67
      sOfTi |          1        4.17       95.83
      soFTI |          1        4.17      100.00
------------+-----------------------------------
      Total |         24      100.00

. noisily: fuzzy A S O F T I

. noisily: tabulate bestfit, sort

    bestfit |      Freq.     Percent        Cum.
------------+-----------------------------------
      sOFTI |          4       16.67       16.67
      SOFTI |          2        8.33       25.00
      SoFTI |          2        8.33       33.33
      sOfti |          2        8.33       41.67
      soFTi |          2        8.33       50.00
      sofTI |          2        8.33       58.33
      sofTi |          2        8.33       66.67
      softI |          2        8.33       75.00
      SOFtI |          1        4.17       79.17
      SOfTI |          1        4.17       83.33
      SofTi |          1        4.17       87.50
      sOFti |          1        4.17       91.67
      sOfTi |          1        4.17       95.83
      soFTI |          1        4.17      100.00
------------+-----------------------------------
      Total |         24      100.00
Does anybody have any advice on how I can get Stata to tell me what's going on behind the scenes here?
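
A hedged note: -noisily- only undoes a -quietly- issued higher up, so it has no effect on a command that simply does not display intermediate results. One way to see the lines a program actually executes is -set trace-, sketched below:
Code:
set tracedepth 2
set trace on
fuzzy A S O F T I
set trace off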

Mixed effects model on eyes observations

Hi Statalist team! I am glad to participate here! I would like to ask for some help with a beginner's problem that I am facing. I ran a cross-sectional study of a score and several variables, examining both eyes of each individual. My dataset is:
+---------------------------------------------------------------+
| id eye total center peri sex age surgery D |
|---------------------------------------------------------------|
1. | 1 0 74 17.43 56.5 0 21 1 2 |
2. | 1 1 71 14.04 57.38 0 21 1 2 |
3. | 2 0 82 14.04 68.23 1 22 1 1 |
4. | 2 1 88 15.93 71.86 1 22 1 1 |
5. | 3 0 78 14.87 64.56 0 23 1 2.5 |
|---------------------------------------------------------------|
6. | 3 1 76 9.98 66.33 0 23 1 2.5 |
7. | 4 0 82 13.37 68.23 0 46 1 6 |
8. | 4 1 79 15.93 63.09 0 46 1 6 |
9. | 5 0 73 9.98 63.1 1 22 1 3.5 |
10. | 5 1 68 11.12 56.47 1 22 1 3.5 |
|---------------------------------------------------------------|
11. | 6 0 67 9.98 57.03 0 21 1 2.5 |
12. | 6 1 65 15.93 48.62 0 21 1 2.5 |
13. | 7 0 60 9.98 49.82 0 21 1 2 |
14. | 7 1 68 11.12 56.79 0 21 1 1 |
15. | 8 0 65 13.37 51.83 0 39 1 2.5 |


What is the most appropriate way to set up my model? What is the proper command for a mixed-effects linear regression model?
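
A minimal sketch of one common starting point (assuming -total- is the outcome of interest): a two-level model with eyes nested within individuals and a random intercept per person.
Code:
mixed total i.sex age i.surgery D || id: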
Thank you!!

Converting annual data to semiannual

Hi everyone, I have to convert a set of annual observations on debt and deficit GDP ratios into semiannual ones. Can someone help me? It's important. I leave here an extract of my dataset (data for Belgium only):

location year debt_gdp pd_gdp
BEL 1996 129 -4.002527
BEL 1997 124.3 -2.151257
BEL 1998 119.2 -1.024874
BEL 1999 115.4 -.6483495
BEL 2000 109.6 -.0792975
BEL 2001 108.2 .2328486
BEL 2002 105.4 -.0436953
BEL 2003 101.7 -1.864152
BEL 2004 97.2 -.2386971
BEL 2005 95.1 -2.714993
BEL 2006 91.5 .2395806
BEL 2007 87.3 .0664399
BEL 2008 93.2 -1.095345
BEL 2009 100.2 -5.430989
BEL 2010 100.3 -4.087706
BEL 2011 103.5 -4.330078
BEL 2012 104.8 -4.318382
BEL 2013 105.5 -3.129378
BEL 2014 107 -3.055682
BEL 2015 105.2 -2.413671
BEL 2016 105 -2.363113
BEL 2017 102 -.6864632
BEL 2018 99.8 -.7937546
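
A sketch of one possibility (whether linear interpolation is sensible for these series is a substantive call): treat each annual figure as the second-half value and interpolate the first halves.
Code:
expand 2
bysort location year: gen half = _n
gen semi = yh(year, half)
format semi %th
sort location semi
foreach v in debt_gdp pd_gdp {
    replace `v' = . if half == 1
    by location: ipolate `v' semi, gen(`v'_semi)   // the very first half-year stays missing without -epolate-
}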

Thank you in advance

interaction effect in instrument variable probit model

Dear All, my query is related to the interaction effect in an instrumental-variable probit model. When I run the ivprobit model with my main variables and their interaction (in addition to other control variables), the coefficients of the main variables come out significant, but the coefficient of the interaction is insignificant. When I instead run ivprobit with only the interaction variable (not the main-effect variables), the interaction coefficient comes out significant. Kindly guide me on what I should do in this regard, i.e. should I report the regression result with the main variables and the interaction effect, or the result with the interaction effect only?
Looking forward to suggestions

Thanks

How to group connected dates?

I have a panel containing daily data that are not all continuous. Within an ID, if dates are consecutive, or if the gap between two dates is less than 30 days, they should be defined as one group. I would like to create a variable that distinguishes these groups: for example, a variable equal to 1 for the first group of continuous dates, 2 for the second group of continuous dates, etc.

In short, the goal is to create "indi3".
I was thinking of creating "indi1" to identify the occurrence of the first non-continuous date, and "indi2" defined as follows:

gen indi2=0
bysort id: replace indi2=indi1 if _n==1
bysort id: replace indi2=indi2[_n]+indi1[_n-1] if indi2!=1

The goal could be achieved if the last line of the commands above could be looped, but I am not sure how to do that.
Otherwise, I would like to know whether there is any other method to achieve this goal.

Thank you
id date indi1 indi2 indi3 (this is the goal)
1 24mar2018 1 1 1
1 25mar2018 1 1
1 26mar2018 1
1 27mar2018 1
1 28mar2018 1
1 29mar2018 1
1 20sep2019 1 2
1 21sep2019 1 2
1 22sep2019 2
1 23sep2019 2
1 24sep2019 2
1 25sep2019 2
1 26sep2019 2
1 27sep2019 2
2 04dec2018 1 1 1
2 05dec2018 1 1
2 06dec2018 1
2 07dec2018 1
2 09mar2019 1 2
2 10mar2019 1 2
2 11mar2019 2
2 12mar2019 2
2 29jun2019 1 3
2 30jun2019 1 3
2 01jul2019 3
2 02jul2019 3
2 09oct2019 1 4
2 10oct2019 1 4
2 11oct2019 4
2 12oct2019 4
2 13oct2019 4
2 14oct2019 4
2 15oct2019 4
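
A minimal sketch of one way indi3 might be built directly (assuming -date- is a numeric Stata date): start a new group whenever the gap from the previous date within the id is 30 days or more, then number the groups with a running sum.
Code:
bysort id (date): gen byte newgrp = _n == 1 | (date - date[_n-1] >= 30)
bysort id (date): gen indi3 = sum(newgrp)
drop newgrp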

datetime of march 31, 2012

Dear All, I have this data set
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(patient_id appo_date) str4 discipline str1 attended_ind
 335 18652 "NURA" "Y"
 335 19041 "DMRS" "Y"
 335 19376 "NURA" "N"
 347 18401 "NURA" "Y"
 347 19128 "NURA" "Y"
 347 19156 "DMRS" "Y"
 393 18976 "DMRS" "Y"
 393 18998 "NURA" "Y"
 395 18556 "NURA" "Y"
 395 19262 "NURA" "Y"
 395 19277 "DMRS" "Y"
 395 19438 "SCCC" "Y"
 802 18674 "DMRS" "N"
 802 19192 "DMRS" "Y"
 802 19220 "NURA" "Y"
 872 19397 "DMRS" "Y"
 907 18218 "NURA" "Y"
 907 18259 "NURI" "Y"
 907 18259 "ACDI" "Y"
 907 18287 "NURI" "Y"
 907 18287 "ACDI" "Y"
 907 18294 "ACDI" "Y"
 907 18294 "NURI" "Y"
 907 18315 "NURI" "Y"
 907 18315 "ACDI" "Y"
 907 18364 "ACDI" "Y"
 907 18364 "NURI" "Y"
 907 18420 "ACDI" "Y"
 907 18420 "NURI" "Y"
 907 18448 "NURI" "Y"
 907 18448 "ACDI" "Y"
 907 18504 "ACDI" "Y"
 907 18504 "NURI" "Y"
 907 18560 "ACDI" "Y"
 907 18560 "NURI" "Y"
 907 18604 "NURA" "Y"
 907 18644 "NURI" "Y"
 907 18644 "ACDI" "Y"
 907 19304 "NURA" "Y"
 907 19314 "DMRS" "Y"
 934 19198 "DMRS" "N"
 934 19207 "DMRS" "Y"
 934 19241 "NURA" "N"
1000 18376 "NURA" "N"
1000 18411 "NURA" "Y"
1000 18998 "NURI" "Y"
1000 19082 "NURI" "Y"
1000 19129 "NURA" "Y"
1000 19132 "DMRS" "Y"
1031 18975 "NURA" "Y"
1031 18975 "DMRS" "Y"
1031 18977 "NURI" "Y"
1071 18716 "NURA" "Y"
1071 19418 "NURA" "Y"
1071 19451 "DMRS" "Y"
1240 19068 "DMRS" "Y"
1240 19076 "NURA" "Y"
1240 19214 "PHYT" "Y"
1240 19431 "DMRS" "Y"
1240 19431 "PHYT" "Y"
1240 19460 "NURA" "Y"
1240 19463 "PODI" "Y"
1270 18873 "NURI" "Y"
1270 18900 "NURA" "N"
1270 18908 "DMRS" "N"
1270 18912 "NURI" "N"
1270 19339 "DMRS" "Y"
1316 18967 "DMRS" "Y"
1316 18976 "DMRS" "Y"
1316 19009 "NURA" "N"
1316 19044 "NURA" "Y"
1316 19044 "NURI" "Y"
1316 19141 "DMRS" "Y"
1316 19386 "DMRS" "Y"
1511 18305 "NURA" "Y"
1511 18410 "DIET" "Y"
1511 18438 "DIET" "N"
1511 18982 "NURI" "Y"
1511 19033 "NURA" "N"
1511 19065 "DMRS" "Y"
1511 19096 "NURA" "Y"
1511 19437 "DMRS" "Y"
1799 18189 "NURA" "N"
1799 18305 "NURA" "N"
2027 19129 "NURA" "Y"
2027 19129 "DMRS" "Y"
2320 19055 "DMRS" "Y"
2556 18212 "NURA" "Y"
2556 18592 "NURA" "Y"
2556 19292 "NURA" "Y"
2556 19307 "DMRS" "Y"
2561 18728 "DMRS" "Y"
2561 18758 "NURA" "Y"
2561 18952 "DMRS" "Y"
2561 18956 "DIET" "Y"
2561 19061 "DIET" "Y"
2561 19187 "DIET" "Y"
2561 19320 "DIET" "Y"
2561 19481 "DIET" "N"
2606 18246 "NURA" "Y"
end
format %tdD_m_Y appo_date
I have done the following,
Code:
label define a 1 "Y" 0 "N"
encode attended_ind, gen(a)
collapse (sum) a if appo_date <= 19082 & discipline == "NURA", by(patient_id)
drop if a == 0
I recall that there is a formal way to write something like `=mar312012' (???) instead of the number 19082. Any suggestions? And where can I find more information about the corresponding function? Thanks.
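
A hedged pointer to what may be the half-remembered syntax: the td() pseudofunction turns a date literal into its daily-date value, so the magic number can be avoided (whether the literal below reproduces 19082 exactly is worth verifying first):
Code:
display %td 19082                     // show which calendar date 19082 corresponds to
display td(31mar2012)                 // daily-date value of the literal 31mar2012
collapse (sum) a if appo_date <= td(31mar2012) & discipline == "NURA", by(patient_id)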

Tuesday, December 29, 2020

Modifying dataset for relative time model



P.S. To express notation correctly, I uploaded the screenshot of my post.
Sample dataset is attached.

Simulating the Dickey Fuller distribution in Stata

Dear all,

Hope you are all well.

Can I ask whether there is a way to simulate the Dickey-Fuller distribution in Stata?

Best
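
One possible route, offered as a sketch rather than a canonical answer: approximate the distribution by Monte Carlo, simulating random walks under the null and collecting the Dickey-Fuller t-statistic from each replication (the program name dfsim and the settings T = 100 and 2,000 replications are arbitrary choices).
Code:
clear all
set seed 12345
capture program drop dfsim
program define dfsim, rclass
    clear
    set obs 100
    gen t = _n
    tsset t
    gen e = rnormal()
    gen y = sum(e)                       // random walk under the null
    regress D.y L.y                      // DF regression (with a constant)
    return scalar tau = _b[L.y]/_se[L.y]
end
simulate tau = r(tau), reps(2000) nodots: dfsim
summarize tau, detail                    // percentiles approximate the DF critical values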

stuck with nested loops

I am trying to select 10 controls per case (using risk-set sampling), matched by general practice. I used a loop for that, as shown below, but I need another loop within it to ensure that all controls fulfil the eligibility criteria. So, after running the -sttocc- command, I am struggling to find the syntax for fitting in another loop to REMOVE all non-eligible controls while repeating the iterations UNTIL 10 controls are selected per case. The eligibility criteria are that age at the index date should not be less than 15 years and that the transfer out of the practice should not be before the index date.


These are the commands I have so far:

set more off
use cohort, clear
set seed 404507 //a random number, to keep for replication

forval i = 1/777 {    // 777 is the maximum practice id
    use cohort, clear
    keep if pracid == `i'
    qui tab suicide
    if r(r) == 2 {
        sttocc, n(10)    /* NB: I have inserted a loop here, as I will demonstrate below */
        save temp/matched`i' //, replace
    }
}

use temp/matched1, clear
forval i = 2/777 {
    capture append using temp/matched`i'
    save matched, replace
}
I tried to fit in this loop to rule out non-eligible controls and to keep the loop going until there are 10 controls per case, but I keep getting "variable clusters already defined". I tried many other ways around it, but none seems to work. I would truly appreciate your help.

forval d = 1/11 {    /* repeat the iterations 11 times to get a total of 1 case + 10 controls = 11 */
    by _set: generate clusters = _n    /* total number of cases and controls in each set */
    keep if clusters == `d'
    format index dobirth transferoutdate %d
    keep if (index - dobirth) >= 5475    /* age in days */
    keep if transferoutdate > index
}
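
For what it is worth, a minimal sketch (assuming the -sttocc- output, with its _set variable, is in memory): the "variable clusters already defined" error arises because -generate- runs on every pass of the loop; creating the counter once and applying the eligibility filters without a loop might look like this, though whether it preserves the intended resampling logic is a separate question.

capture drop clusters
bysort _set: gen clusters = _n                 // position within each matched set
format index dobirth transferoutdate %td
keep if (index - dobirth) >= 5475              // at least 15 years old at the index date
keep if transferoutdate > index                // not transferred out before the index date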

Coarsened Exact Matching in unbalanced panel data

I am running Coarsened Exact Matching (CEM) on a highly unbalanced panel (26 periods: 13 pre-shock and 13 post-shock) for a diff-in-diff analysis with fixed effects.
1. Is it flawed to match treated and control groups in *both* the pre- and post-shock periods (rather than just the pre-shock periods) on a set of covariates?
2. In general, I force Stata to use the non-coarsened version of the period, *period(#0)*, in my CEM command, so the matching occurs exactly within each period of the panel. Obviously, individual A may be matched with B in t1 but with C in t2. Does that sound fine?
3. After dropping the unmatched observations, may I run my diff-in-diff analysis without using the weights? That is, I would use CEM only to make the control and treated groups comparable (reducing the noise).

Many thanks

Oaxaca command - Yun Decomposition

Hi there,

I am trying to do Yun's decomposition for a Probit regression using the Oaxaca command (by Ben Jann). I'm having some trouble with weighting though.

When I use the following command:

Code:
svyset cluster [pweight=pweight_cross], strata(dc2001)
oaxaca lfps (education:normalize(educcat11-educcat15)) (age: normalize(agecat21 b.agecat22 agecat23 agecat24 agecat25 agecat26 agecat27 agecat28)) (married: normalize(b.maried11 maried12)) otherhhincpc if working_age==1 & wave==2, by(male) probit pooled relax svy
I get the following comments:
"(model 2 has zero variance coefficients)
(pooled model has zero variance coefficients)"

And then I get no standard errors or t stats at all in my results

However, when I take away "strata(dc2001)" from my svyset command, my oaxaca command runs perfectly. I don't have problems with the above weight in any of the other commands I use, though. I am very new to using the oaxaca command, especially with a binary dependent variable. I have read the help file but am struggling to understand what I am doing wrong. Can anyone help me? TIA

Problem with egen rownonmiss, strok

Hello,

I am trying to create a variable that will count how many nonmissing responses across several string variables exist per observation. I'm using Stata 15.1 on a Mac.

This is what my code looks like:

Code:
egen activ_count = rownonmiss(activ_ac-activ_other), strok

This is what I am trying to get it to do:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str41 activ_ac str58 activ_soc str22 activ_other float activ_count
" "        " "      " "     0
" "        " "      " "     0
" "        " "      " "     0
" "        " "      " "     0
"Academic" " "      " "     1
"Academic" " "      " "     1
"Academic" "Social" "Other" 3
" "        " "      " "     0
"Academic" " "      " "     1
" "        " "      " "     0
end

But this is what the code is actually doing:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str41 activ_ac str58 activ_soc str22 activ_other float activ_count
" "        " "      " "     3
" "        " "      " "     3
" "        " "      " "     3
" "        " "      " "     3
"Academic" " "      " "     3
"Academic" " "      " "     3
"Academic" "Social" "Other" 3
" "        " "      " "     3
"Academic" " "      " "     3
" "        " "      " "     3
end

I saw a similar post that illustrates that this code should work. I can't figure out what I did differently to make it not work. What is the error that I am making?
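
One possibility, offered as an assumption based on the dataex listing rather than a diagnosis: with the strok option, rownonmiss() treats only empty strings ("") as missing, so cells holding a single space would count as nonmissing. Trimming the variables first would test that:
Code:
foreach v of varlist activ_ac-activ_other {
    replace `v' = strtrim(`v')                 // turn " " into ""
}
egen activ_count2 = rownonmiss(activ_ac-activ_other), strok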

Thank you!
Sheilagh

Panel data and count dependent variable

Hi all,
I have a panel data set of 18 years and 50 regions. My dependent variable is a count with over-dispersion. A Hausman test indicates fixed effects. Initially I thought I should use negative binomial fixed effects (NBFE), but after reading some posts on the Forum I decided to use Poisson, so I use the -xtpoisson [depvar] [indvars], fe- command. My dependent variable is the number of victims of car accidents. As I understand it, I cannot use rates (for example, victims per 1,000 inhabitants) because then I cannot use Poisson. However, the regions have different populations. Is it right to use just the number of car-accident victims? If not, how can I bring both victims and each region's population into the analysis, in order to obtain proper and robust results? (See the sketch below.)
Thank you in advance.
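
A minimal sketch of one standard way to bring population into the Poisson fixed-effects model, via an exposure (offset) term rather than a rate; the variable names region, year, victims, population, and the covariates x1 x2 are placeholders:
Code:
xtset region year
xtpoisson victims x1 x2, fe exposure(population)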

New version of xcontract on SSC

Thanks as always to Kit Baum, a new version of the xcontract package is now available for download from SSC. In Stata, use the ssc command to do this, or adoupdate if you already have an old version of xcontract.

The xcontract package is described as below on my website. The new version now allows aweights, pweights and iweights, as well as fweights. All kinds of weights are treated in the same way. This allows users to weight the frequencies and percentages by enormous weights which have to be of type double because they are too large to be of type long. I would like to thank Kit Baum for alerting me to this problem.

Best wishes

Roger

--------------------------------------------------------------------------------------
package xcontract from http://www.rogernewsonresources.org.uk/stata16
--------------------------------------------------------------------------------------

TITLE
xcontract: Create dataset of variable combinations with frequencies and percents

DESCRIPTION/AUTHOR(S)
xcontract is an extended version of contract. It creates an output
data set with 1 observation per combination of values of the
variables in varlist and data on the frequencies and percents of
those combinations of values in the existing data set, and,
optionally, the cumulative frequencies and percents of those
combinations. If the by() option is used, then the output data set
has one observation per combination of values of the varlist
variables per by-group, and percents are calculated within each
by-group. The output data set created by xcontract may be listed to
the Stata log, or saved to a data frame, or saved to a disk file, or
written to the memory (overwriting any pre-existing data set).

Author: Roger Newson
Distribution-Date: 28december2020
Stata-Version: 16

INSTALLATION FILES
xcontract.ado
xcontract.sthlp
--------------------------------------------------------------------------------------


Problems with merge

Hi there,

I was trying to merge two datasets (using -joinby-); however, it seems that Stata cannot identify my key variable.
The master dataset looks as follows:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str12 inventor_id int nodes float(_degree dg_central)
2 "4677061_3" 455  .05947137 .04970777
3 "4891185_1"  16         .6  .2857143
4 "5281603_1" 662 .031770047 .02461376
4 "3956484_2" 662 .031770047 .02461376
5 "5321008_2"  11         .4 .24444444
end
The using dataset as follows:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 inventor_id float(id kd_4)
"4013665-2" 1934 .2738613
"4381297-1" 1934 .3535534
"4464380-1" 1934 .3535534
"4571404-1" 1934 .4950738
"4985433-4" 1934        0
end
The merged results are:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 inventor_id float(id kd_4) byte _merge int nodes float(_degree dg_central)
"4677061_3" 2 . 2 455  .05947137 .04970777
"4891185_1" 3 . 2  16         .6  .2857143
"3956484_2" 4 . 2 662 .031770047 .02461376
"5281603_1" 4 . 2 662 .031770047 .02461376
"5124314_1" 5 . 2  11         .4 .24444444
end
label values _merge __MERGE
label def __MERGE 2 "only in using data", modify

The code is:
Code:
joinby id inventor_id using " `temp1' ", unm(m)
The matched results include only observations from the master data; none of the observations in the using data has been matched. I'm pretty sure that the id and inventor_id values in the master dataset correspond to observations in the using data (the inventor_ids in the master dataset are a small subset of those in the using data). It seems that inventor_id is not stored identically in the two datasets, so the observations cannot be matched. I further tried
Code:
replace inventor_id = trim(itrim(inventor_id))
for both datasets, but it does not help. Also, you may suggest
Code:
destring inventor_id, replace
tostring inventor_id
I cannot do that because the variable inventor_id includes both numeric and non-numeric characters.

Any ideas will be highly appreciated.
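
One observation, based purely on the examples shown and therefore an assumption: the master data appears to separate the two parts of inventor_id with an underscore ("4677061_3"), while the using data appears to use a hyphen ("4013665-2"). If that pattern holds throughout, harmonising the separator before the joinby may be enough:
Code:
replace inventor_id = subinstr(inventor_id, "-", "_", .)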




fuzzy fsQCA bestfit displaying few cases

Hello Statalist,

I am conducting a fuzzy-set QCA using the 'fuzzy' package written by Longest & Vaisey (2008). I am looking at the following outcome variable:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float A
0
0
1
1
1
0
0
0
1
1
1
1
0
0
0
1
1
0
0
0
0
0
1
1
0
1
1
1
0
0
0
1
0
0
1
0
0
0
1
1
1
1
0
0
1
0
0
1
0
1
1
1
1
0
1
1
1
1
0
1
1
1
1
1
1
1
1
0
1
1
1
0
0
1
1
0
1
1
1
1
1
1
1
1
0
1
1
1
1
1
1
1
1
0
0
1
1
1
1
0
end
label values A A
label def A 0 "Does not provided extended maternity leave", modify
label def A 1 "Provides extended maternity leave", modify
This outcome variable has 65 observations that take the value of 1:
Code:
. tab A, nolab

   Extended |
  maternity |
      leave |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         39       37.50       37.50
          1 |         65       62.50      100.00
------------+-----------------------------------
      Total |        104      100.00

In my fsQCA I am examining the following five conditions:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float S
0
1
1
0
0
0
0
1
1
0
0
0
1
0
1
0
0
0
1
0
0
0
0
0
1
0
0
0
1
0
0
0
0
0
0
1
1
1
0
1
0
1
1
1
1
1
0
0
1
0
0
0
0
1
0
0
0
0
0
1
0
0
0
0
0
0
0
1
0
0
0
1
1
0
0
1
0
1
0
0
0
0
1
0
0
0
1
0
0
1
0
0
1
1
1
0
0
0
0
1
end
label values S S
label def S 0 "California", modify
label def S 1 "New York", modify
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float O
.67
.33
.67
.33
 .5
.33
  1
 .5
.33
.67
  1
 .5
 .5
.33
 .5
.33
 .5
.33
.33
 .5
 .5
  1
 .5
 .5
 .5
.33
.67
 .5
.33
 .5
.33
 .5
 .5
 .5
.33
  1
.67
 .5
 .5
.67
 .5
.67
.67
.67
.33
.67
  1
 .5
.33
.67
 .5
 .5
.67
  1
.33
  1
  1
 .5
.67
  1
  1
 .5
 .5
  1
.33
  1
 .5
  1
  1
  1
.33
.67
.67
 .5
 .5
 .5
 .5
.67
.33
 .5
  1
  1
 .5
.33
.67
 .5
  1
.67
.67
.67
 .5
.33
 .5
  1
.67
.67
.33
.67
.67
  1
end
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float F
 1
.5
.5
 1
.5
 1
 1
.5
.5
 0
.5
.5
.5
 0
.5
.5
.5
.5
 0
 0
 0
.5
.5
.5
.5
.5
 1
 0
.5
.5
.5
 0
.5
 0
 0
.5
.5
.5
 0
.5
.5
 1
.5
.5
.5
 0
.5
.5
.5
 1
 1
 0
.5
.5
 0
.5
 0
 1
.5
.5
 0
.5
.5
.5
 1
.5
.5
.5
.5
 1
.5
.5
 1
 1
 1
.5
.5
.5
 0
.5
.5
.5
.5
.5
.5
.5
.5
.5
.5
.5
.5
.5
.5
.5
.5
 1
 0
.5
.5
 1
end
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float T
.
1
1
1
0
1
1
0
1
1
1
0
1
1
1
1
0
1
1
.
1
1
1
1
0
1
0
1
1
1
0
0
1
1
0
1
1
0
0
1
1
0
1
0
1
1
1
1
0
1
0
0
1
1
1
1
0
0
1
1
0
1
0
1
1
1
1
1
1
1
0
1
1
1
1
1
1
1
0
0
1
1
1
1
1
0
1
1
1
0
1
1
1
1
1
1
1
1
1
1
end
label values T tightness
label def tightness 0 "not tight (high skills easy to fill)", modify
label def tightness 1 "tight (high skills hard to fill)", modify
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float I
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
0
1
1
1
1
1
1
1
1
1
1
1
1
0
1
1
1
1
0
0
1
1
1
0
0
0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
0
0
1
1
1
1
1
end
label values I I
label def I 0 "Mixed Industry Skill-Profile", modify
label def I 1 "High-General Industry Skill-Profile", modify

In a first step I am trying to assess which configurations of conditions contain the greatest number of cases. To this end, I used the following code:
Code:
fuzzy A S O F T I
tabulate bestfit, sort
The result I get is
Code:
. fuzzy A S O F T I

. tabulate bestfit, sort 

    bestfit |      Freq.     Percent        Cum.
------------+-----------------------------------
      sOFTI |          4       16.67       16.67
      SOFTI |          2        8.33       25.00
      SoFTI |          2        8.33       33.33
      sOfti |          2        8.33       41.67
      soFTi |          2        8.33       50.00
      sofTI |          2        8.33       58.33
      sofTi |          2        8.33       66.67
      softI |          2        8.33       75.00
      SOFtI |          1        4.17       79.17
      SOfTI |          1        4.17       83.33
      SofTi |          1        4.17       87.50
      sOFti |          1        4.17       91.67
      sOfTi |          1        4.17       95.83
      soFTI |          1        4.17      100.00
------------+-----------------------------------
      Total |         24      100.00
If I understand the fuzzy bestfit output correctly, I should be getting a table with a count of 65 cases, correct? I do not understand why the bestfit table displays only 24 cases.

I am aware that Longest & Vaisey mention that cases scoring 0.5 on all individual predictor sets will not appear in the bestfit table because they belong equally to all configurations. But given that only two of my conditions are calibrated in a way that allows a score of 0.5, there cannot be a case in which all predictors score 0.5.

Furthermore, I am aware that a similar question was posted and left unanswered ten years ago. However, I was hoping for better luck at this point in time.


How to use tabstat's "format" option with estpost? Getting Error

Hi,

I have a time variable in Stata internal form. I am trying to calculate its summary statistics by group and then export them to LaTeX. However, when I export, I want the figures to be in human-readable format rather than Stata internal form. Usually, the "format" option of summarize/tabstat gives a human-readable format; however, when it is used with estpost, I get the following error: "option format not allowed".

This is my code:

estpost tabstat total_time, by(group) c(s) stat(n mean min max), format
esttab . using "TotalTime.tex", cells("n mean min max") label nodepvar replace


How can I solve this?
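
A hedged note, not a confirmed fix: -estpost tabstat- does not appear to accept tabstat's format option (and the stray second comma before format would be rejected in any case). One sketch is to drop it from the estpost call and handle the display format at the esttab stage; whether the fmt() sub-option of cells() accepts date-time display formats is worth checking in the estout help.

estpost tabstat total_time, by(group) c(s) stat(n mean min max)
* if fmt() accepts display formats such as %tcDDmonCCYY_HH:MM:SS, it could be
* supplied inside cells(); otherwise a pre-formatted string copy of total_time
* could be summarized instead
esttab . using "TotalTime.tex", cells("n mean min max") label nodepvar replace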

Panel 2SLS with multiple interactions of endogenous variables

I'm using Stata 14 on Windows 10.

I have 2 endogenous and 8 exogenous variables. I need to run regressions with double and triple interactions (the triple interactions involve both endogenous variables). I'm quite lost with the xtivreg syntax and I'd appreciate your help; a sketch is given after my model below. Below are a list of variables and a sample of my data.

Dependent variable: price
exogenous variables: NFVA, FVA1, FVA23, NFVL, FVL1, FVL23, NI, IMR2
Endogenous variables: ADR, GOV
Instruments: firm's beta, net total debt, lagged exogenous variables
codcvm is the id variable and ano is time variable as year

My model:
Code:
 Price = b0 + b1 GOV + b2 ADR + b3 NFVA + b4 FVA1 + b5 FVA1*GOV + b6 FVA1*ADR + b7 FVA1*GOV*ADR + b8 FVA23 + b9 FVA23*GOV + b10 FVA23*ADR + b11 FVA23*GOV*ADR + b12 NFVL + b13 FVL1 + b14 FVL1*GOV + b15 FVL1*ADR + b16 FVL1*GOV*ADR +b17 FVL23 + b18 FVL23*GOV + b19 FVL23*ADR + b20 FVL23*GOV*ADR + b21 NI + b22 IMR2 + error
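A hedged sketch of the xtivreg syntax pattern only, not a full specification: each interaction involving an endogenous variable is itself endogenous and needs excluded instruments of its own (commonly, interactions of the instruments with the exogenous part). The names beta and netdebt below are placeholders for the firm's beta and net total debt, and only the FVA1 interactions are spelled out.
Code:
xtset codcvm ano
gen FVA1_GOV     = FVA1*GOV
gen FVA1_ADR     = FVA1*ADR
gen FVA1_GOV_ADR = FVA1*GOV*ADR
* ... likewise for the FVA23, FVL1, and FVL23 interactions ...
gen beta_FVA1    = beta*FVA1
gen netdebt_FVA1 = netdebt*FVA1
gen L_NFVA       = L.NFVA            // lagged exogenous variables as extra instruments
gen L_FVA1       = L.FVA1
xtivreg price NFVA FVA1 FVA23 NFVL FVL1 FVL23 NI IMR2 ///
    (GOV ADR FVA1_GOV FVA1_ADR FVA1_GOV_ADR = beta netdebt beta_FVA1 netdebt_FVA1 L_NFVA L_FVA1), fe
With all interactions included, the endogenous list and the instrument list grow accordingly; the model is identified only if there are at least as many excluded instruments as endogenous terms.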
Data sample

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(codcvm ano) float(price GOV ADR FVA1 FVL1 NFVA NFVL NI) double(FVA23 FVL23) float IMR2
2095 2010       .01448 1 0           0           0        1    .862694   .020599604                    0                    0 1.8849287
2095 2011       .01231 1 0           0           0        1   .8573273   .022456584                    0                    0  1.895388
2095 2012       .01412 1 0           0           0        1   .8754389     .0168076                    0                    0 1.9299603
2095 2013       .01226 1 0           0           0        1   .8889021     .0155427                    0                    0 1.9609858
2095 2014       .01309 1 0   .09594385 .0003752093 .8345584   .8400778   .014977852   .06949780788272619    .0540262833237648   1.88543
2095 2015       .00776 1 0   .12627429    .0140314 .8090795   .7870296   .015319045   .06464618910104036   .09528721123933792  1.890587
2326 2010        .0434 0 1   .02898379 .0016462464 .9526519   .4137668    .17852733  .018364297226071358  .013761443085968494 1.7750353
2326 2011        .0546 0 1 .0025358796 .0032538774  .984683   .4261768     .2306608  .012781110592186451  .015483875758945942 1.7935365
2326 2012        .0837 0 1  .007191408 .0027928664 .9877713   .4083186     .1924044  .005037299823015928  .055951569229364395  1.822398
2326 2013       .01732 0 1  .007366527  .003353699  .986896   .3252531    .16599345  .005737526807934046   .04759433213621378 1.8768967
2326 2014       .01622 0 1  .012870845  .006171739 .9778092   .3422747     .1713539  .009320004843175411   .06630658358335495 1.8861154
2326 2015       .01785 0 1  .006843655  .003721524 .9802735   .3489571    .14282191  .012882862240076065   .11134995147585869 1.8859285
1931 2010       .04599 0 0           0           0        1   .8327665    .10584348                    0                    0 1.8634926
1931 2011        .0455 0 0           0           0        1   .8239585    .13186201                    0                    0 1.8739742
1931 2012       .01457 0 0   .01016879           0 .8213254   .8377029 -.0008097625   .16850577294826508                    0  1.713398
1931 2013       .00924 0 0  .005542452           0 .7911574   .8631936    .01322891    .2033001333475113                    0  1.707934
1931 2014       .00934 0 0  .009881306           0 .7626232   .8847253  -.014543248   .22749550640583038                    0 1.6866566
1931 2015       .00566 0 0   .01057423           0   .81615   .8947548   .003912983   .17327584326267242                    0  1.751985
2003 2010         .011 1 1           0           0 .8631935   .3701721   -.00645372   .13680653274059296                    0  1.630657
2003 2011       .00966 1 1           0           0 .8851542   .1969214    .02274529   .11484577506780624  .005441516172140837  1.641963
2003 2012          .01 1 1           0           0 .7690469  .19807696   .018890204   .23095311224460602   .01597452349960804  1.541664
2003 2013       .00945 1 1           0           0 .6390778   .0698412   -.00560241    .3609221875667572    .1941925436258316 1.4925612
2003 2014       .00909 1 1           0           0 .6859279  .02397196    .00341979    .3140721619129181   .23323412239551544  1.569267
2003 2015       .01103 1 1           0           0 .4966458 .032631684    .05016544    .5033541917800903    .1187484860420227  1.286348
1045 2010       .01079 1 0           0           0        1    .416686    .13635743                    0                    0 1.8154325
1045 2011        .0124 1 0           0           0        1   .3950234    .12586007                    0                    0  1.819194
1045 2012        .0151 1 0           0           0        1   .3959184     .1021564                    0                    0   1.84827
1045 2013        .0148 1 0           0           0        1   .4543546    .09221073                    0                    0   1.88541
1045 2014       .00725 1 0           0           0        1   .4521624    .07682598                    0                    0 1.8962877
1045 2015       .00712 1 0           0           0        1   .5112147    .07235716                    0                    0 1.9008073
2130 2010        .0137 1 0   .07954594           0 .7138248   .4621486    .02508075   .20662923529744148 .0019466136582195759  1.530377
2130 2011       .01425 1 0   .07763467           0  .728938  .46265805    .04234599   .19342736341059208 .0002643049228936434 1.5660287
2130 2012       .02446 1 0   .06391358           0 .8743839   .5524928   .032334138  .061702530831098557  .000745051889680326 1.7797382
2130 2013   .018299999 1 0  .013620352           0 .9630429   .5852539   .015255044  .023336702957749367 .0009420430869795382 1.8704578
2130 2014   .016379999 1 0  .023937495           0 .9206727   .5687432    .04230913   .05538978800177574 .0011597869452089071 1.8370527
2130 2015       .01088 1 0   .02403402           0 .9706892   .5563391   .032069094   .00527676846832037  .001452660420909524  1.893863
2149 2010            . 1 0           .           .        .          .            .                    .                    .         .
2149 2011            . 1 0           .           .        .          .            .                    .                    .         .
2149 2012            . 1 0           .           .        .          .            .                    .                    .         .
2149 2013       .01625 1 0   .10492772           0 .3787082   .6930389   .034670528    .5163640975952148                    0  1.296765
2149 2014       .01764 1 0   .07013564           0 .4265987    .725217    .04063374    .5032656788825989                    0  1.327494
2149 2015       .01315 1 0   .07251922           0 .4709961   .7492847    .02089253   .45648470520973206                    0 1.3801236
2205 2010       .02516 1 0           0           0        1   .5930232     .1012331                    0  .007694688625633717 1.8390068
2205 2011        .0171 1 0           0           0 .9984778   .6473874    .07296435 .0015222402289509773                    0 1.8556956
2205 2012       .03253 1 0           0           0 .9923194   .5725522    .09420037  .007680611684918404 .0017286088550463319 1.8588928
2205 2013        .0186 1 0           0           0  .995603   .5681178   .033186056  .004396964330226183 .0019135798793286085 1.8945718
2205 2014        .0145 1 0           0           0 .9925854   .6106304   .017207507  .007414636202156544                    0 1.9062322
2205 2015       .00485 1 0           0           0 .9924064   .6049464   -.01262793  .007593564689159393                    0 1.9035524
2324 2010            . 1 0           .           .        .          .            .                    .                    .         .
2324 2011            . 1 0           .           .        .          .            .                    .                    .         .
2324 2012            . 1 0           .           .        .          .            .                    .                    .         .
2324 2013        .0212 1 0           0           0        1   .2026117    .04424335                    0   .24000747501850128 1.9750292
2324 2014       .03535 1 0           0           0        1  .22638027     .1746635                    0   .14200007915496826  1.927466
2324 2015        .0138 1 0           0           0        1  .19161797    .05460474                    0   .26225975155830383 1.9936877
1977 2010       .05999 0 0           0           0        1   .7745959   .063102864                    0                    0 1.8395628
1977 2011         .061 0 0           0           0        1   .7480395    .07286048                    0                    0 1.8444486
1977 2012       .01895 0 0           0           0        1   .7342969    .06672255                    0                    0 1.9044997
1977 2013       .01905 0 0           0           0        1   .7449355   .063283935                    0                    0 1.9333266
1977 2014        .0124 0 0           0           0        1    .778156    .04761028                    0                    0 1.9501094
1977 2015       .00965 0 0           0           0        1   .7773314   .014813367                    0                    0 1.9433042
2234 2010            . 1 0           .           .        .          .            .                    .                    .         .
2234 2011       .02309 1 0           0           0        1   .2478928     .1794124                    0                    0 1.7891186
2234 2012       .03943 1 0           0           0        1   .2869849     .1521762                    0                    0  1.815005
2234 2013       .02975 1 0           0           0        1  .26897734    .15706825                    0                    0 1.8459507
2234 2014       .02696 1 0           0           0        1  .27647647    .14155772                    0                    0 1.8551208
2234 2015         .021 1 0           0           0        1   .2764138    .14012913                    0                    0 1.8532676
1542 2010       .00032 0 0           0           0        1   49.10069   -1.5720824                    0                    0  11.81471
1542 2011       .00018 0 0           0           0        1   45.21605    -.9650206                    0                    0 11.048302
1542 2012       .00008 0 0           0           0        1   46.46199   -3.5730994                    0                    0 11.115204
1542 2013       .00012 0 0           0           0        1  13.884973    -3.011132                    0                    0 4.2892194
1542 2014       .00005 0 0           0           0        1  14.830156   -1.7937608                    0                    0  4.600495
1542 2015 .00027000002 0 0           0           0        1   14.83882  -.008665511                    0                    0 4.7537165
  70 2010        .0123 0 0           0           0        1  .27834797     .5145227                    0                    0 1.8185687
  70 2011        .0081 0 0           0           0        1  .21150658   -.03864317                    0                    0 1.7763908
  70 2012       .00678 0 0           0           0        1  .05755222   -.03805843                    0                    0 1.7820703
  70 2013        .0052 0 0           0           0        1  .12239589   -.05820633                    0                    0 1.8201787
  70 2014       .00524 0 0           0           0        1   .2125779    .10753497                    0                    0  1.856399
  70 2015       .00304 0 0           0           0        1   .1617525  -.018043786                    0                    0 1.8329862
 601 2010       .00271 0 0           0           0        1   .9352943    -.1523916                    0                    0  1.893153
 601 2011       .00308 0 0           0           0        1   .8917627    .05648189                    0                    0 1.9110053
 601 2012       .00435 0 0           0           0        1   .7922776    .10369716                    0                    0  1.928566
 601 2013        .0102 0 0           0           0        1   .6084999    .22277114                    0                    0 1.9271795
 601 2014        .0107 0 0   .23880628   .01730539 .7611938   .4911647    .19340244                    0                    0 1.8600703
 601 2015        .0075 0 0   .10072467  .016168384 .8992754    .466061    .10052438                    0                    0 1.8665117
  92 2010 .00054000004 0 0           0           0        1   .7714935   .016800253                    0                    0 1.8779932
  92 2011        .0004 0 0           0           0        1   .8040627  .0079589905                    0                    0 1.8930835
  92 2012       .00037 0 0           0           0        1   .8069783   .015777923                    0                    0  1.927242
  92 2013       .00033 0 0           0           0        1   .8565773    .01610735                    0                    0 1.9637246
  92 2014       .00025 0 0           0           0        1   .7830876    .01500342                    0                    0 1.9572564
  92 2015       .00025 0 0           0           0        1   .8407562    .02060466                    0                    0  1.962252
 102 2010       .03142 1 0   .12019207 .0026019185 .8431765   .9276564   .014427473  .036631352035328746 .0075592788780340925  1.822465
 102 2011        .0237 1 0   .10823505  .002838721 .8440334   .9331109    .01235795    .0477315938915126  .004968998847743933 1.8252867
 102 2012        .0256 1 0    .0993982           0 .8509784   .9357227   .010608663   .04962341347709298  .007348516053788501 1.8596928
 102 2013        .0244 1 0   .08332316           0 .8564683   .9409929   .012085094   .06020851212088019  .005685916170477867 1.8772106
 102 2014       .02377 1 0   .03970604           0 .9261969   .9439889   .007823253   .03409701958298683   .00208375439979136  1.921996
 102 2015       .01474 1 0   .04479542           0 .9190507   .9414504   .010277114  .036153946071863174  .002588964067399502 1.9191633
  90 2010       .03265 1 1           0           0        1   .9246368   .015720649                    0                    0 1.8828037
  90 2011       .03075 1 1           0           0        1   .9270134   .014481674                    0                    0   1.89424
  90 2012   .035169996 1 1           0           0        1   .9203184   .012946588                    0                    0 1.9226907
  90 2013       .02909 1 1           0           0        1   .9218845   .013225975                    0                    0 1.9546788
end
format %ty ano