Tuesday, August 31, 2021

Counting the number of significant coefficients in a linear regression

Is there any simple way to count only the significant coefficients in a regression?
I have 1,400 village dummies in a regression; a lot of them are significant, and many are insignificant. I want to obtain a count of the significant ones. I know it can be done with matrix computations, but I was looking for a simpler way.

Thanks!
Shweta
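
A minimal sketch of one approach, with hypothetical variable names: after an estimation command, r(table) stores the p-values in row 4, so the significant coefficients can be counted in a short loop.

Code:
regress y x i.village                      // any estimation command
matrix T = r(table)
local nsig 0
forvalues j = 1/`=colsof(T)' {
    if T[4, `j'] < 0.05 local ++nsig       // row 4 of r(table) holds p-values
}
display "coefficients significant at the 5% level: `nsig'"

Base and omitted levels carry missing p-values, which the comparison skips automatically.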

Font for Tables and Figures

I am using kpfonts in LaTeX and would like to have the same font in the figures and tables that I produce in Stata. I tried to change the font in Stata, but kpfonts is not available there. Is there any way around this? Please don't tell me I should change the font in LaTeX.
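
One hedged possibility: Stata graphs draw on system fonts, so if the Kp family is installed as an OS-level font, its face name can be set for all graphs (the face name "KpRoman" below is hypothetical; substitute whatever the installed face is called).

Code:
graph set window fontface "KpRoman"    // assumes the Kp font is installed system-wide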

compare variables in two years

Dear Statalist,

I have a dataset covering 2010-2012, and I want to identify which companies are new to the dataset (i.e., not in the data in 2010 but in the data in 2011; not in the data in 2011 but in the data in 2012).

Here is an example:

Symbol Year
A 2010
B 2010
C 2010
D 2010
E 2010
A 2011
C 2011
D 2011
F 2011
G 2011
B 2012
C 2012
G 2012
H 2012
I 2012

Thanks a lot in advance!
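
A minimal sketch on the example data: a company counts as new in a year when its Symbol is absent from the previous year.

Code:
* new = 1 when the previous observation for this Symbol is not the
* immediately preceding year
bysort Symbol (Year): gen byte new = Year != Year[_n-1] + 1
replace new = . if Year == 2010    // first sample year: status unknown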

egen rowmax produces additional decimals

Hello. I'm working with US Census data in Stata 15.1. Each observation is a geographic area and the variables are the percentage of the population placed in several race/ethnicity categories. The percentages are in the format 12.34. I want a new variable with the percentage from the category with the highest percentage. I'm using the following command:
Code:
egen rowmax=rowmax(pctapi pctblack pctaian pctwhite pct2prace pcthispanic)
However, if "pctwhite," for instance, is 55.19, "rowmax" = 55.189999. This creates a problem when I want to create a variable that contains "pctwhite" as the value to indicate the race/ethnicity category that "rowmax" came from. Hope that makes sense. I don't know if I
Code:
egen rowmax
has a fix for this. I tried rounding manipulations with
Code:
int
but that seemed to produce a few records with "rowmax" off by 0.01 from the source variable. Thank you for any suggestions.
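
A sketch of the usual fix: egen creates a float by default, and 55.19 has no exact float representation, so equality tests against the source variables fail. Generating the result as a double and comparing at float precision keeps them consistent.

Code:
egen double rowmax = rowmax(pctapi pctblack pctaian pctwhite pct2prace pcthispanic)
gen topcat = ""
foreach v in pctapi pctblack pctaian pctwhite pct2prace pcthispanic {
    * float() rounds both sides to float precision before comparing
    replace topcat = "`v'" if float(rowmax) == float(`v')
}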

Bayesian seemingly unrelated regression

Hi everyone,

I intend to estimate a system of equations (input demand functions) using Bayesian seemingly unrelated regression (SUR) approach.
I can implement the traditional SUR model using the "sureg" command, but I don't know how to implement the Bayesian SUR approach.
Could you please help me?

Thank you!
Best regards,
Phuc
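
A sketch of one route: bayesmh fits multiple-equation models with a multivariate normal likelihood, which gives a Bayesian analogue of SUR. The two equations and the priors below are illustrative placeholders, not a recommendation.

Code:
bayesmh (y1 x1 x2) (y2 x1 x3), likelihood(mvnormal({Sigma,m})) ///
    prior({y1:} {y2:}, normal(0, 100))                         ///
    prior({Sigma,m}, iwishart(2, 3, I(2)))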

two way scatter plot, command is unrecognized

I tried to graph a two-way scatter plot; here is my code:
tw scatter or id, xlabel(1(1)`max', valuelabel angle(45) labsize(*0.5))
xtitle("") ytitle("Odds Ratio") yscale(log) || ///
rcap lb ub id, legend(off)

but I got these errors:
command xtitle is unrecognized
command rcap is unrecognized

May I know how to solve it? Thanks.
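
A sketch of the likely fix: in a do-file, each physical line is a separate command unless it is continued, so the first line also needs /// (and /// only works when the code is run from the Do-file Editor, not typed in the Command window).

Code:
tw scatter or id, xlabel(1(1)`max', valuelabel angle(45) labsize(*0.5)) ///
    xtitle("") ytitle("Odds Ratio") yscale(log) || ///
    rcap lb ub id, legend(off)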

Interaction multiple levels categoric variable

Hello everybody.

I'm trying to test whether a variable can be considered an effect modifier in a svy poisson model.
I planned to evaluate the p-values of the interaction terms in the model to decide whether age is an effect modifier.
However, I'm dealing with an interaction between a three-level categorical variable (Age) and a binary exposure (Binge).
Here are the results of the interaction terms:

----------------------------------------------------------------------------------------
                      |               Linearized
                  NUP |        IRR   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------------+-----------------------------------------------------------------
            age#binge |
                25 0  |   1.052783   .0754358     0.72   0.473     .9148233    1.211548
                40 0  |   1.095433    .074272     1.34   0.179     .9590996    1.251146
                60 0  |   1.158218   .0832244     2.04   0.041     1.006044    1.333411

Note that just one term is significant. Should I consider age an effect modifier?
Another option for testing the significance of the interaction would be an LR test or deviance, but as far as I know these are not available with svy commands. Am I correct?
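
For what it is worth, a sketch of a joint test: rather than reading each interaction term separately, the design-based (adjusted) Wald test works after svy estimation and tests all the interaction terms at once.

Code:
testparm i.age#i.binge    // joint test of all interaction terms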

uselabel: how to also export values without a label

Hi Statalist,

I was trying to output a file that shows all values (regardless of whether a value has a label) of a variable.

Here is the code I used:

use data, clear
uselabel, clear
save label, replace

However, this output only shows the values that have labels. There are some numeric variables in my data; for example, the variable urban could have the following values:

value   label
1       urban
2       rural
99      (NO LABEL)


I wonder how I could also output the value "99"? Any suggestions are much appreciated!

Thank you in advance!
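
A sketch of a workaround for a single variable such as urban: build the value/label list from the data itself, so that unlabeled values survive.

Code:
use data, clear
keep urban
duplicates drop                // one row per observed value
decode urban, gen(label)       // unlabeled values (e.g., 99) get ""
sort urban
save label, replace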


Hetprobit with factor variable

Heteroskedastic probit estimation with a factor variable gives (slightly) different results than with equivalent 0-1 indicator variables. Differences arise in coefficient estimates, log likelihood, and homoskedasticity test chi2. This is in Stata 16.1.

A factor variable named w equals 0, 1, 2, or 3, so it can be used in estimation with i.w, or with the equivalent indicator variables w1 = (w==1), w2 = (w==2), and w3 = (w==3). A binary outcome is y, and another continuous variable is x. I estimate:
Code:
hetprobit y x i.w, het(x i.w)
I then replace the first usage of i.w with the equivalent indicator variables:
Code:
hetprobit y x w1 w2 w3, het(x i.w)
The estimates converge slightly differently in the two cases. I am using Stata MP 16.1, revision 8 Jul 2021, on Mac.

The difference happens for i.w in the main equation, but seemingly not in the heteroskedasticity term. Putting "set seed 58233" just before both calls to hetprobit does not make the results the same (and indeed the results of hetprobit do not seem to depend on random numbers). Probit models without the heteroskedasticity term seem to converge to the same result regardless of whether "i.w" or "w1 w2 w3" is used; I'm only seeing a difference for hetprobit.

Does anyone see a reason why this would happen?

Below I show contiguous Stata output that (1) documents key aspects of the data, and (2) demonstrates the difference in estimates. I've suppressed the ML logs with nolog options, but the iterations show 70 full-model iterations using "i.w" versus 65 using "w1 w2 w3", with "(not concave)" displayed at different iteration numbers. As I said, the differences are small but noticeable.

Code:
. di _N
250

. su , sep(0)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           y |        250        .288    .4537395          0          1
           x |        250    4.441809    .7600443   .8457781   5.457211
           w |        250        .928    1.057967          0          3
          w1 |        250        .144     .351794          0          1
          w2 |        250         .26    .4395142          0          1
          w3 |        250        .088    .2838632          0          1

. tab w, mi

          w |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        127       50.80       50.80
          1 |         36       14.40       65.20
          2 |         65       26.00       91.20
          3 |         22        8.80      100.00
------------+-----------------------------------
      Total |        250      100.00

. assert w1==(w==1) & w2==(w==2) & w3==(w==3)  // The omitted category is w==0.

. hetprobit y x i.w, het(x i.w) nolog

Heteroskedastic probit model                    Number of obs     =        250
                                                Zero outcomes     =        178
                                                Nonzero outcomes  =         72

                                                Wald chi2(4)      =       1.97
Log likelihood = -106.9522                      Prob > chi2       =     0.7413

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
           x |   .4855019   .7097267     0.68   0.494    -.9055368    1.876541
             |
           w |
          1  |     3.1424   4.608419     0.68   0.495    -5.889936    12.17473
          2  |   3.590161   4.310631     0.83   0.405    -4.858521    12.03884
          3  |  -2234.108   24823.36    -0.09   0.928       -50887    46418.78
             |
       _cons |  -5.399992   3.985377    -1.35   0.175    -13.21119    2.411204
-------------+----------------------------------------------------------------
lnsigma      |
           x |   .2204246   .3146317     0.70   0.484    -.3962421    .8370914
             |
           w |
          1  |  -1.411461   2.456378    -0.57   0.566    -6.225874    3.402952
          2  |  -1.073397   2.581648    -0.42   0.678    -6.133333     3.98654
          3  |   6.172341   11.19192     0.55   0.581    -15.76341    28.10809
------------------------------------------------------------------------------
LR test of lnsigma=0: chi2(4) = 1.86                      Prob > chi2 = 0.7616

. di %16.15g e(ll)
-106.95224858907

. hetprobit y x w1 w2 w3, het(x i.w) nolog

Heteroskedastic probit model                    Number of obs     =        250
                                                Zero outcomes     =        178
                                                Nonzero outcomes  =         72

                                                Wald chi2(4)      =       1.97
Log likelihood = -106.9523                      Prob > chi2       =     0.7415

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
           x |   .4854962     .70974     0.68   0.494    -.9055687    1.876561
          w1 |   3.142395    4.60851     0.68   0.495    -5.890119    12.17491
          w2 |   3.590151   4.310695     0.83   0.405    -4.858656    12.03896
          w3 |  -2148.256   24828.55    -0.09   0.931    -50811.32    46514.81
       _cons |  -5.399961   3.985342    -1.35   0.175    -13.21109    2.411167
-------------+----------------------------------------------------------------
lnsigma      |
           x |   .2204242   .3146381     0.70   0.484    -.3962551    .8371035
             |
           w |
          1  |  -1.411471   2.456464    -0.57   0.566    -6.226051    3.403109
          2  |  -1.073407   2.581738    -0.42   0.678     -6.13352    3.986707
          3  |   6.133164   11.63331     0.53   0.598     -16.6677    28.93402
------------------------------------------------------------------------------
LR test of lnsigma=0: chi2(4) = 1.86                      Prob > chi2 = 0.7616

. di %16.15g e(ll)
-106.95225620844

qui being ignored in xtgcause

When I run
Code:
qui xtgcause y z, lags(2)
it does not run entirely quietly but displays yzyzid. I know that "as error" overrides quietly, but what else might cause that?
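
One other common cause, sketched below: a -noisily- inside the called program also overrides an outer -quietly-, just as "as error" output does.

Code:
capture program drop demo
program define demo
    noisily display "still shown"    // survives the caller's quietly
end
qui demo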

Combining Graphs using distplot

Dear Statalist,

I was hoping to get some help with graph output. I am reading a paper that shows what looks like semi-outdated code for drawing some graphs. I have done my best to replicate it, but I cannot get the plots to stack on top of one another as in the example.


Code:
distplot dist_ind [w=s_o_freq], recast(connected) ylabel(, angle(h)) midpoint by(stazi_ind) l2(logit scale) reverse trscale(logit(@)) legend(col(1) position(5) ring(0))
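
A hedged guess at the fix, assuming distplot passes by() suboptions through to the underlying graph command: forcing a single column of panels stacks them.

Code:
distplot dist_ind [w=s_o_freq], recast(connected) ylabel(, angle(h)) ///
    midpoint by(stazi_ind, cols(1)) l2(logit scale) reverse          ///
    trscale(logit(@)) legend(col(1) position(5) ring(0))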

Help with labels on bar graph

I am trying to create a horizontal bar graph grouped by categories. I wanted to add two labels to the bars: one with percentages and one with the count. I have been able to generate these bar graphs individually, but I am not sure how to go about putting it all together.

Code:
graph hbar, over(Condensed_OutcomeMID) over(MID_Dur_Cat) ///
asyvars blabel(bar, format(%9.1f)) ///
name("Outcome_by_duration_per", replace) percentages
graph export Outcome_by_duration_per.jpg, replace

graph hbar (count),over(Condensed_OutcomeMID) over(MID_Dur_Cat) ///
asyvars blabel(bar, format(%9.1f) position(inside) color(gs16)) ///
name("Outcome_by_duration1", replace) 
graph export Outcome_by_duration1.jpg, replace
In the end, I get:

[attached: the two bar graphs produced by the code above]

In the end, I'd like the x-axis to be the percentage out of the group, and to just add the count labels within the bars as in the second graph. Also, if anyone has recommendations on how to fix the text issues (with the group labels on the y-axis and the legend), I would really appreciate it! I'm still pretty new to Stata, so any help is appreciated.
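
A sketch of the usual workaround: blabel() can only display the statistic actually plotted, so both measures are precomputed first; the percentages are then plotted directly, and the counts remain available for labelling (for example, folded into the category labels).

Code:
preserve
contract Condensed_OutcomeMID MID_Dur_Cat, freq(n)
bysort MID_Dur_Cat: egen total = total(n)
gen pct = 100 * n / total                    // percentage within group
graph hbar (asis) pct, over(Condensed_OutcomeMID) over(MID_Dur_Cat) ///
    asyvars blabel(bar, format(%9.1f))
restore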

How to print the help file in Stata with all the words retained?

Hi all,

When I try to print a help file from Stata, the words at the right-hand side of the page get cut off because of the page width. Can I ask what you normally do to print a help file with all the words retained?
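
A sketch of one workaround, assuming the goal is simply a printable copy: help files are SMCL, so translate can render one to plain text (or PDF) with nothing cut off at the margin.

Code:
findfile regress.sthlp                      // locate a help file on the adopath
translate "`r(fn)'" regress_help.txt, translator(smcl2txt) replace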

restricted mean survival time with stpm2_standsurv

Dear Paul Lambert,

I have encountered the following error message after attempting to use the rmst option in stpm2_standsurv:

. stpm2_standsurv, at1(a_base 0) at2(a_base 1) timevar(t_rmst36) atvar(rmst_a0 rmst_a1) rmst

option meansurvwt() not allowed
stata(): 3598 Stata returned error
RMST_calcstand(): - function returned error
rmst_stand(): - function returned error
<istmt>: - function returned error
r(3598);

I have updated the versions of both stpm2 and stpm2_standsurv:

. which stpm2_standsurv
c:\ado\plus\s\stpm2_standsurv.ado
*! version 1.1.2 12Jun2018

. which stpm2
c:\ado\plus\s\stpm2.ado
*! version 1.7.5 May2021

Any advice, please?

thank you
Bianca

What does "command ms_get_version is unrecognized" mean and how I should deal with that?

Today I am working with did_imputation (Kirill Borusyak, 2021), and I got the error "command ms_get_version is unrecognized" after running the code

Code:
did_imputation Y i t ei, autosample fe(TYPE2 yr) controls($controllist) maxit(30000) tol(0.001)
Could you please point out what is the problem and what I should do to solve it?
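
A hedged sketch of the usual remedy: the ms_* helpers ship with ftools, on which did_imputation depends, so reinstalling or updating both packages often resolves this.

Code:
ssc install ftools, replace
ssc install did_imputation, replace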

Mathematical operations within a string variable

Dear all,

I have a dataset on pharmaceutical supplies with around 7 million observations. One of the variables describes the pharmaceutical presentation, and there are around 1,200 different presentations. I want to go from the presentation to the total amount of substance per box.
The variable is a string, as below, and I want to multiply the first number (amount of substance per pill) by the number at the end of the string (number of pills in a box):

40 MG COM REV CT BL AL PLAS TRANS X 10
40 MG COM REV CT BL AL PLAS TRANS X 20
40 MG COM REV CT BL AL PLAS TRANS X 30
5 MG COM CT 2 BL AL PLAS INC X 10
5 MG COM CT BL AL / AL X 20
5 MG COM CT BL AL / AL X 30
5 MG COM CT BL AL PLAS BCO LEIT X 20
5 MG COM CT BL AL PLAS INC X 100
5 MG COM CT BL AL PLAS INC X 20
5 MG COM CT BL AL PLAS INC X 20
5 MG COM CT BL AL PLAS INC X 30
5 MG COM CT BL AL PLAS OPC X 20
5 MG COM CT BL AL PLAS OPC X 30

Problem 2: some of these strings (less than 10%) have information in parentheses that does not interest me, e.g.:
20 MG COM DISP CT 2 BL AL PLAS INC X 14 (PORT 344/98 L - C1)
25 MG CAP GEL DURA CT BL AL PLAS INC X 50 (EMB HOSP)

Do you have any suggestions on how to solve it?

Thanks so much
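
A sketch using Stata's Unicode regular expressions, assuming the string variable is called presentation (a hypothetical name), the dose always leads the string, and the pack size follows the final "X"; the parenthesised notes are stripped first.

Code:
gen s = strtrim(ustrregexra(presentation, "\(.*\)", ""))    // drop "(...)" notes
gen dose   = real(ustrregexs(1)) if ustrregexm(s, "^([0-9.]+) MG")
gen npills = real(ustrregexs(1)) if ustrregexm(s, "X ([0-9]+)$")
gen total_mg = dose * npills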

Multiple response analysis

Dear Experts

I have been trying to analyse COVID-19-related data where each subject submits reports and measurements of vitals. I need to classify patients based on the parameters recorded. Here is an example with dummy data:
ID     reporting date   heart rate   saturation   respiratory rate
1233   15.Jul           98           96           16
1233   17.Jul           80           90           18
1233   20.Jul           75           95           19
1233   25.Jul           78           98           25
I understand the data is in long format. So if I try to define COVID of moderate severity based on a combination of heart rate (>90 on any given day), saturation (<95 on any given day), and respiratory rate (>20 on any given day), the end result is that patient 1233 is not considered moderately severe, because Stata applies the conditions row by row. I tried to reshape the data to wide format, but I ended up with thousands of variables, because reporting dates vary across subjects, so I could not use them along with the ID.

I would simply need Stata to look, within each ID, for any responses fulfilling the condition, regardless of the date, and without the need to reshape the data. Is this even possible?

I appreciate your help

Omar
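
A sketch without reshaping, using illustrative variable names: flag, within each ID, whether any report meets each criterion, then combine the flags.

Code:
bysort ID: egen any_hr  = max(heart_rate > 90 & !missing(heart_rate))
bysort ID: egen any_sat = max(saturation < 95)
bysort ID: egen any_rr  = max(resp_rate > 20 & !missing(resp_rate))
gen byte moderate = any_hr & any_sat & any_rr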

Counting those in the same category as you, by group

Dear all,

Please take a look at the attached screenshot of a dummy dataset. How do I generate a variable, Relatives, which tells me, for every individual in a household, how many others in that household have the same colour as that individual?

For example, in household A, there are 2 red people, 2 blue people, and 2 pink people. Therefore individual A1 has 1 other relative who has the same colour as him - red. Similarly, individual A3 has 1 other relative who has the same colour as him - blue. And similarly, in household B, individual B4 is the only pink person, therefore the number of relatives who have the same colour as him is 0.

The actual data set has a lot more colours and a lot more households.

Thanks in advance!
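
A minimal sketch, assuming variables named Household and Colour: within each household-colour cell, every individual has _N - 1 same-coloured relatives.

Code:
bysort Household Colour: gen relatives = _N - 1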

Identifying events within a future time window

Dear Statalist users -- Assume that we have the following toy example:

Code:
input firm_id year event event_5y
1    1979    0    1
1    1980    1    0
1    1981    0    0
1    1982    0    0
1    1983    0    0
1    1986    0    0
1    1987    0    0
1    1988    0    0
1    1989    0    0
1    1990    0    0
1    1991    0    0
1    1992    0    0
1    1993    0    0
1    1994    0    0
1    1995    0    0
1    1996    0    0
1    1997    0    0
1    1998    0    0
1    1999    0    0
1    2000    0    0
1    2001    0    0
1    2002    0    0
1    2003    0    1
1    2004    0    1
1    2005    0    1
1    2006    0    1
1    2007    0    1
1    2008    1    1
1    2009    0    1
1    2010    1    1
1    2011    0    0
1    2012    0    0
1    2013    0    0
1    2014    0    0
1    2015    0    0
1    2016    0    0
end
Assume that this dataset consists of three main variables: firm_id, year, and event. I am wondering how one can create an indicator variable that takes the value one under the following condition: for each year t, the indicator equals one if there is an event within the five-year window after t. The fourth variable in this example, event_5y, is what I expect to get based on what I described above.

Is there any way to do this succinctly in Stata?
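
A sketch treating the window as t+1 through t+5 (to include the current year as well, also check event itself); tsset makes the lead operators respect the gaps in the yearly panel.

Code:
tsset firm_id year
gen byte event_5y2 = 0
forvalues k = 1/5 {
    replace event_5y2 = 1 if F`k'.event == 1    // any event in the next 5 years
}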

putexcel problem with double replace and local macros

I think this is probably a bug report, but I'll just show what happens and let others decide. This was mostly tested with Stata/SE 15.1 under Windows, but I also tested a bit under Stata/MP 16.0 under Linux, which seemed to behave similarly.

Code:
local a "week"
local b "year"

putexcel set test, sheet(foo, replace) replace 

// works OK if I leave off the first replace above
*putexcel set test, sheet(foo) replace 

// this works too
*putexcel set test, sheet(foo, replace) modify

putexcel a1 = ("`a'")   // only b1 shows up in excel
putexcel b1 = ("`b'")

*putexcel a1 = "foo"    // works OK if I use strings directly
*putexcel b1 = "bar"
In a nutshell, with the above code, only b1 has a value. I won't show all the variations I tried, but basically what seems to happen here is that if I try to output multiple cells with local macros, only the last one shows up in the spreadsheet itself.

Problem seems to happen only if:
* I use the replace option twice, for both the workbook and worksheet
* try to output multiple cells using local macros

This is not much of a problem for me going forward because upon further reflection I think the double replace specification here is probably redundant and it makes more sense to either:
* replace the workbook (which seems to imply replacing the sheet also?)
* modify the workbook and replace the worksheet

And both those ways seem to work fine. But this really confused me for a couple of hours as I sorted out what was causing the problem so I figured it was worth a post.

Moderating impact

Dear Statalist,
Actually, I am a bit confused between the joint effect and the moderating impact. I mean, I saw some people doing the following, for example:

Cancer = Smoker + Sport + Smoker*Sport.

So the code in Stata, if I am using panel data, will be:

Code:
xtologit Cancer i.Smoker##i.Sport
So, I believe the above is the joint effect.

But what if I want to see the moderating impact of Sport on the relationship between Cancer and Smoker? How should the model look, and what is the code for this using the above variables?

Many thanks in advance.
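
For what it is worth, a sketch: the ## specification already is the moderation model; the Smoker#Sport terms measure how the effect of Smoker shifts with Sport, and their joint significance can be tested directly after fitting.

Code:
xtologit Cancer i.Smoker##i.Sport
testparm i.Smoker#i.Sport    // joint test of the moderation terms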

Model Specification with GSEM Stata including both multilevel and simple regression modelling?

Hello everyone! My research model includes both Level 1 (individual) and Level 2 (team) variables. Initially, the goal is to explore the determinants of innovative work behavior at the individual level, which is why I ran a multilevel analysis in Stata. The second part of the research has as dependent variables two types of innovation at the team level (exploratory and exploitative innovation).
Wanting to run an overall model, I employed gsem in Stata 14. I understand that the second part of the research does not require multilevel analysis, since my dependent variables are at the upper level (level 2). In order to connect the two strands of research, I aggregated innovative work behaviour to the team level (after checking rwg, ICC1, and ICC2). But is this correct? In other words, beyond aggregation, how can I regress exploratory and exploitative innovation on a level-1 variable (here, employees' innovative work behaviour)? Has anyone employed the latent variable approach proposed by Croon et al. (2007)* with GSEM? Below I attach the gsem model that I specify.
PS My sample consists of 299 employees nested within 103 teams (~3 people per team).

Your help is much appreciated.
*Croon, M. A., & van Veldhoven, M. J. P. M. (2007). Predicting grouplevel outcome variables from variables measured at the individual level: A latent variable multilevel model. Psychological Methods,Array 12, 45–57.

Create year variable from yearly-monthly data.

Hi all,

In the current dataset, I have yearly-monthly data, and I would like to transform it into yearly data for each individual. I believe this is a fairly simple task; however, it doesn't seem to work for me.

Thanks!

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long(lopnr Utbetmanad) double Belopp
276 201401 2823
277 201401 2021
325 201401 3476
334 201401 2186
351 201401 1517
357 201401  123
end
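
A minimal sketch on the example data: Utbetmanad is numeric YYYYMM, so the year is the integer part of division by 100, and yearly totals then follow from collapse.

Code:
gen year = floor(Utbetmanad/100)
collapse (sum) Belopp, by(lopnr year)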

Creating variables with 500+ conditions

I am trying to map out school districts based on precinct numbers (which are present in both data sets that I am working with). The precinct numbers are contained in a different Excel sheet that looks like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long precinct str36 name1
 50001 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50002 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50003 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50004 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50005 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50010 "LA CANADA UNIFIED SCHOOL"      
 50011 "LA CANADA UNIFIED SCHOOL"      
 50014 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50016 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50019 "LA CANADA UNIFIED SCHOOL"      
 50020 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50021 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50022 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50023 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50024 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50025 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50026 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50027 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50028 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50051 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50052 "LA CANADA UNIFIED SCHOOL"      
 50052 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50053 "LA CANADA UNIFIED SCHOOL"      
 50054 "LA CANADA UNIFIED SCHOOL"      
 50056 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50059 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50060 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50061 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50062 "ACTON-AGUA DULCE UNIF SCHOOL"  
 50063 "ACTON-AGUA DULCE UNIF SCHOOL"  
 70001 "LAS VIRGENES UNIFIED SCHOOL"    
 70002 "LAS VIRGENES UNIFIED SCHOOL"    
 70004 "LAS VIRGENES UNIFIED SCHOOL"    
 70006 "LAS VIRGENES UNIFIED SCHOOL"    
 70007 "LAS VIRGENES UNIFIED SCHOOL"    
 70008 "LAS VIRGENES UNIFIED SCHOOL"    
 70027 "LAS VIRGENES UNIFIED SCHOOL"    
 70040 "LAS VIRGENES UNIFIED SCHOOL"    
 70041 "LAS VIRGENES UNIFIED SCHOOL"    
 70041 "SANTA MONICA-MALIBU UNIF SCHOOL"
 70207 "LAS VIRGENES UNIFIED SCHOOL"    
 80001 "LAS VIRGENES UNIFIED SCHOOL"    
 80002 "LAS VIRGENES UNIFIED SCHOOL"    
 80003 "LAS VIRGENES UNIFIED SCHOOL"    
 80012 "LAS VIRGENES UNIFIED SCHOOL"    
 80021 "LAS VIRGENES UNIFIED SCHOOL"    
 80022 "LAS VIRGENES UNIFIED SCHOOL"    
 80025 "LAS VIRGENES UNIFIED SCHOOL"    
 80034 "LAS VIRGENES UNIFIED SCHOOL"    
 80038 "LAS VIRGENES UNIFIED SCHOOL"    
 80043 "LAS VIRGENES UNIFIED SCHOOL"    
 80047 "LAS VIRGENES UNIFIED SCHOOL"    
 80050 "LAS VIRGENES UNIFIED SCHOOL"    
 80051 "LAS VIRGENES UNIFIED SCHOOL"    
 80052 "LAS VIRGENES UNIFIED SCHOOL"    
 80053 "LAS VIRGENES UNIFIED SCHOOL"    
 80054 "LAS VIRGENES UNIFIED SCHOOL"    
 80070 "LAS VIRGENES UNIFIED SCHOOL"    
 80074 "LAS VIRGENES UNIFIED SCHOOL"    
 80075 "LAS VIRGENES UNIFIED SCHOOL"    
 90001 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90002 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90003 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90004 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90005 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90006 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90007 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90008 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90009 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90010 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90012 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90014 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90015 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90016 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90017 "ACTON-AGUA DULCE UNIF SCHOOL"  
 90018 "ACTON-AGUA DULCE UNIF SCHOOL"  
150001 "ALHAMBRA USD"                  
150001 "ALHAMBRA USD-1ST DISTRICT"      
150002 "ALHAMBRA USD"                  
150002 "ALHAMBRA USD-1ST DISTRICT"      
150003 "ALHAMBRA USD"                  
150003 "ALHAMBRA USD-1ST DISTRICT"      
150004 "ALHAMBRA USD-1ST DISTRICT"      
150004 "ALHAMBRA USD"                  
150005 "ALHAMBRA USD"                  
150005 "SAN GABRIEL UNIFIED SCHOOL"    
150005 "ALHAMBRA USD-2ND DISTRICT"      
150006 "ALHAMBRA USD-3RD DISTRICT"      
150006 "ALHAMBRA USD"                  
150007 "ALHAMBRA USD-1ST DISTRICT"      
150007 "ALHAMBRA USD"                  
150008 "ALHAMBRA USD-1ST DISTRICT"      
150008 "ALHAMBRA USD"                  
150009 "ALHAMBRA USD-1ST DISTRICT"      
150009 "ALHAMBRA USD"                  
150010 "ALHAMBRA USD-1ST DISTRICT"      
150010 "ALHAMBRA USD"                  
150010 "ALHAMBRA USD-2ND DISTRICT"      
150011 "ALHAMBRA USD"                  
150011 "ALHAMBRA USD-3RD DISTRICT"      
end
I have been trying to do this in a rather tedious way by running the following code for each district:

PHP Code:
inlist2 precinct, name(Acton_Laguna_Dulce) values(50028,50059,50001,50062,50024,50052,50062,50061,50001,50059,50004,50003,1770008,90004,1770008,50001,50005,90009,1770015,90003,50004,50004,6850010,50002,50028,50001,50005,50028,50014,50005,90004,50004,50022,90015,90003,50001,50001,50059,50016,90016,50063,50022,50016,50059,90003,50025,50062,50016,90003,90008,50004,50003,50004,6850010,90003,50025,50056,90018,50021,1770015,50022,50063,90003,90004,6850024,50001,1770008,50061,50004,50059,50001,90010,1770008,50059,50051,50060,50005,50001,90007,50003,90018,50027,50061,50059,90004,1770015,90003,5000014,50003,50003,50005,50016,50001,90005,90003,90004,90001,50024,90003,50016,90003,50001,5000014,50004,90003,90003,50026,50063,50025,90006,50059,50059,50063,50021,50002,50023,90016,90017,50002,50022,50021,5000014,90003,50003,50023,90002,50016,50001,50004,5000014,50059,50001,90003,50061,90010,5000014,50014,50005,90002,50059,6850031,50061,90014,6850032,90003,50063,50061,90002,90012,50014,50059,50016,50020,50002)
While this works for districts that contain fewer precincts, like Acton-Laguna Dulce, it does not work for bigger districts like Los Angeles, since inlist2 stops reading the precinct numbers after a certain point, citing a syntax error and leaving a lot of missing values. I have also tried merging the dataset above with the main dataset in which I want to create the districts, but this has not worked either, since precinct does not uniquely identify the observations of the two datasets, resulting in an error. I am also adding a data example of the main dataset for your reference.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(voter_id precinct)
  321 3450020
 1006 9000202
 1327 5150015
 1541 1940011
 1648 1000033
 1755 6220133
 1862 9000801
 2119 9000001
 2226  950003
 2547 5400047
 3339 7800006
 3981 9001312
 4024 5500109
 4131 1040020
 4559 3430010
 5565 1850040
 6143  750011
 6571 1770306
 6999  750043
 7149 6250029
 7898 1960005
 9161  950160
 9268 6250052
 9482 7750017
 9703 4800046
10112 6100015
10433 4050004
11118 2550073
11332 1500018
11439  750014
11974   50005
12231   50005
12445 5400048
12552 5400048
12980 9002416
13237 9005821
13344 9002160
13451 9000214
13879  950013
14457 9002248
14885   50016
14992 9000151
15891 1850033
15998 3330010
16255 2000012
16362 2040015
16897 6000009
17047  950161
17154 2700013
17261 4800023
17582 6000019
17689   50003
17796   50003
18267 7700154
18374 6700025
18702 2600108
18809 3550020
19922 9003988
20224 3650005
20759 9001563
21230   50016
21337   50016
21551 7700167
21765 3850376
22129 4800046
22343 3850246
22771 1070003
22878   50004
22985   50004
23242   70008
23456 3400014
23777 4800046
23991 1770001
24034 1040042
24462 9002107
24569 9000288
24676 1300021
24890 6220125
25254 9000549
25896 9001799
26153 3850366
26367 9000963
26795 3850212
26802  600007
26909   80052
27052 6600005
27915 6820009
28058 1030026
28379 3850104
28486   50062
28921 1040251
29492 5230041
29713 1450018
30015 1030049
30550 9002502
31128 9001257
32134 3300040
32669 9000357
32883 3450010
32990 5150004
end

Also, the ideal outcome would be for entries like Alhambra USD, ALHAMBRA USD-2ND DISTRICT, ALHAMBRA USD-3RD DISTRICT, and so on to be coded as one district (i.e., just Alhambra), but this is largely a secondary concern.

Does anyone have any thoughts on how to accomplish this?
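
A sketch of the merge-based route: because some precincts map to more than one district, joinby (which forms all pairings within matched groups) avoids both the 500+ inlist conditions and the uniqueness error from merge; the dataset names below are hypothetical.

Code:
use main_data, clear
joinby precinct using district_lookup, unmatched(master)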

How to use Outreg2 with heckman twostep model

My heckman twostep equation is as follows.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
heckman IV(independent variable) X1 X2 X1##X2 controls, select(X1 controls IV(instrumental variable)) twostep
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
After getting the result, I want to export it to Word. In this case, I don't know what to report using the outreg2 command.

thanks
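
A sketch with placeholder variable names: heckman, twostep leaves its results in e() like any estimation command, so outreg2 can export the active estimates directly.

Code:
heckman y x1 x2, select(z1 z2) twostep
outreg2 using results, word replace    // exports the coefficient table to Word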

Displaying Arabic text in Stata 15

Hello experts, here is a problem:
When I import data from Excel or SPSS into Stata 15, all Arabic text in the files (in the variable labels or value labels) gets distorted. It actually breaks apart into the language's individual letters instead of its complete, connected word forms. An example is attached.

Any lead or guidance in this regard will solve a big obstacle and will be highly helpful.
Nadeem

nested loop for cycling over observations by group

Hi everyone,

I have a dataset with 14 million observations and 22 variables. The variables of interest for this question are as follows:
personid: unique identifier for a person (not unique in the dataset, but uniquely identifies a person)
konisert (0/1): whether a treatment has been carried out, true or false
screeningprove (0/1): whether a sample is classified as a screening sample (otherwise it is a follow-up sample), true or false
proveDato: sample date.

Each observation is a sample, and samples can be grouped by the unique person id. Each person has one to many samples (= observations), so the number of observations per person varies. I would now like to set the variable "screeningprove" to either 0 or 1 based on whether the other variable is true ("konisert == 1") in a time window of 10 years before the sample in question. This is to be done by personid.

I have tried the following:

*generate a sampleid within personid
bysort personid (proveDato): egen provenr = seq()

*generate the maximum number of samples per person
egen maxprovenr = max(provenr), by(personid)

*create an inner loop; for testing purposes keep just one personid
keep if personid == 100000493
local maxprovenr = maxprovenr[_n]
forval f = 1/`maxprovenr' {
replace screeningprove = 0 if konisert[_n-`f'] == 1 & proveDato - proveDato[_n-`f'] < 3650
}
This seems to behave as expected.
I thought I could now nest this loop within another loop that would carry out this inner loop for each personid. But this is where I can't wrap my head around how to do it. Is this possibly not the right approach to this problem at all?

cheers, Linn
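
A sketch of a loop-free alternative: rangestat (from SSC) can compute, within personid, the maximum of konisert over the 10 years before each sample date, which replaces both loops.

Code:
* ssc install rangestat
rangestat (max) prior_kon = konisert, interval(proveDato -3650 -1) by(personid)
replace screeningprove = 0 if prior_kon == 1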

Counting words with dependence on 2 variables

Hello,

I am new here and trying to open the topic the right way. I hope I implemented the code correctly and that my lack of English is not a problem.

My Stata-Version: 15.1

My problem:

So I have groups (for example, 2), and within each group are 4 members. Those members chatted in 10 different scenarios (chat1, chat2, chat3, ..., chat10). What I want in the end is a word count like the list below for every group and every chat:


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float Gruppe byte Number_in_Group float totalwords
1 1  8
1 2 11
1 3  1
1 4 16
end
So as you can see, I did this for Group 1, Chat 1. But I need this for:
Group 1 Chat 1
Group 1 Chat 2
Group 2 Chat 1
Group 2 Chat 2....

So I tried this with some loops but didn't get it to work. What I did first:

foreach var of varlist chat1-chat10 {
gen words`var' = wordcount(`var')
}

So I get the word count of the chats. But now I need to separate this for every group and chat and drop the empty observations (see the sketch after the data example below).

Maybe you can help me with that?

Thank you very much

Maximilian


This is an example of my dataset:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte Number_in_Group str208 chat1 str113 chat2 float Gruppe
1 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
1 ""                                 ""                                                                          1
1 ""                                 ""                                                                          1
3 ""                                 ""                                                                          1
1 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
2 ""                                 ""                                                                          1
2 ""                                 ""                                                                          1
2 "1&9?"                             ""                                                                          1
1 "1 und 9?"                         ""                                                                          1
4 "hab ich auch"                     ""                                                                          1
2 "gut"                              ""                                                                          1
3 "jo"                               ""                                                                          1
4 "dann 6"                           ""                                                                          1
4 "und 5"                            ""                                                                          1
2 "ja"                               ""                                                                          1
1 "8 z�hlt nicht, weil Hund?"        ""                                                                          1
2 "denke ich auch"                   ""                                                                          1
4 "geht doch um humanoid doer nicht" ""                                                                          1
4 "hund != humanoid"                 ""                                                                          1
2 "n�chste Seiten 6 und 7?"          ""                                                                          1
2 ""                                 "5?"                                                                        1
3 ""                                 "5"                                                                         1
1 ""                                 "Ist der Panda (2) einer?"                                                  1
2 ""                                 "1 und 7? 7 sieht aber aus wie ein Spielzeug"                               1
4 ""                                 "Spielzeug hei�t ja nicht, dass es keinroboter sein kann"                   1
2 ""                                 "ja ist wahr"                                                               1
2 ""                                 "dann 5�"                                                                   1
4 ""                                 "viel der gezeigten Roboter k�nne auch nicht viel mehr als kommunizieren"   1
1 ""                                 "Ich glaube wir sollten mehr alleine tippen, weil die Zeit recht knapp ist" 1
3 ""                                 "denke ich auch"                                                            1
1 ""                                 ""                                                                          1
1 ""                                 ""                                                                          1
2 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
1 ""                                 ""                                                                          1
3 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
3 ""                                 ""                                                                          1
1 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
3 ""                                 ""                                                                          1
2 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
2 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
1 ""                                 ""                                                                          1
1 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
1 ""                                 ""                                                                          1
1 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
4 ""                                 ""                                                                          1
4 "Vorschl�ge?"                      ""                                                                          2
1 ":)"                               ""                                                                          2
1 "1 und 9, der?"                    ""                                                                          2
1 "Oder"                             ""                                                                          2
3 "1 und 9 w�rde ich auch sagen"     ""                                                                          2
2 "Ja"                               ""                                                                          2
4 "dito"                             ""                                                                          2
2 "8?"                               ""                                                                          2
1 "Oh, man muss ja weiter machen"    ""                                                                          2
4 "ist ja tierisch?!"                ""                                                                          2
2 "Stimmt"                           ""                                                                          2
1 "Ich w�rd sagen nein"              ""                                                                          2
2 "dann nur 6"                       ""                                                                          2
4 "1,,5,6 und 7?"                    ""                                                                          2
4 "vll 2?"                           ""                                                                          2
1 ""                                 ""                                                                          2
4 ""                                 ""                                                                          2
4 ""                                 ""                                                                          2
2 ""                                 ""                                                                          3
2 ""                                 ""                                                                          3
1 ""                                 ""                                                                          3
1 ""                                 ""                                                                          3
2 ""                                 ""                                                                          3
1 ""                                 ""                                                                          3
2 ""                                 ""                                                                          3
2 ""                                 ""                                                                          3
1 ""                                 ""                                                                          3
2 ""                                 ""                                                                          3
2 ""                                 ""                                                                          3
2 ""                                 ""                                                                          3
1 ""                                 ""                                                                          3
1 ""                                 ""                                                                          3
2 ""                                 ""                                                                          3
2 ""                                 ""                                                                          3
1 ""                                 ""                                                                          3
2 ""                                 ""                                                                          3
2 "geht los?"                        ""                                                                          3
1 "Yes"                              ""                                                                          3
1 "Hab nur Nr. 1 und 9"              ""                                                                          3
2 "genauso"                          ""                                                                          3
3 "1,9 4??"                          ""                                                                          3
2 "w�rde die Drohne nicht nehmen."   ""                                                                          3
end
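
A sketch of the separation step mentioned above: compute per-row word counts, then total them per group and member in one collapse (empty strings simply contribute zero words).

Code:
foreach var of varlist chat1-chat10 {
    gen words`var' = wordcount(`var')
}
collapse (sum) wordschat1-wordschat10, by(Gruppe Number_in_Group)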

Loop to summarize multiple variables in one

Hello all,

I am trying to create a variable that combines the school district identifiers listed below (I actually have to combine over 50 of these, but I only included a couple for the sake of clarity). If an individual lives in the ABC school district, for instance, ABC is equal to 1 and every other district variable is missing. The labor-intensive way I would envision doing this looks like this:


PHP Code:
gen school_district_str "ABC" if ABC== 1
replace school_district_str 
"Acton-Laguna Dulce" if Acton_Laguna_Dulce== 1
replace school_district_str 
"Antelope" if Antelope== 1
replace school_district_str 
"Alhambra" if Alhambra== 1
replace school_district_str 
"Arcadia" if Arcadia== 1
replace school_district_str 
"Baldwin" if Baldwin == 
However, I am sure there must be an easier way to complete this task without writing 50+ lines of code (a sketch follows the data example below). I would appreciate any insights on this.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(ABC Acton_Laguna_Dulce Antelope Alhambra Arcadia Baldwin)
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . 1 . .
. . 1 . . .
. . 1 . . .
. . . . . .
. . . . . .
. . . . . .
. . 1 . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . 1
1 . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
1 . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . 1 . .
. . 1 . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . 1 .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . 1 . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . 1 . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . 1 .
. . . . . .
. . . . . .
. . . . . .
end
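
A sketch of the shorter route mentioned above: loop over the indicator variables and reuse each variable's own name as the value.

Code:
gen school_district_str = ""
foreach v of varlist ABC Acton_Laguna_Dulce Antelope Alhambra Arcadia Baldwin {
    replace school_district_str = "`v'" if `v' == 1
}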

logistic regression: outcome does not vary

My research topic is about child mortality and risk factors, including education, age, whether the child is fully vaccinated, and so on.
So my outcome is child mortality, where 0 means alive and 1 means death.
All risk factors (variables) are dummy variables.
My problem is that I tried to run a logistic regression as below:
logit child_mortality ib1.gender ib1.wealth ib1.ch_fullvaccine, or
but the results showed "outcome does not vary".
To figure this out, I tried single regressions first:
logit child_mortality ib1.gender, or
All results are fine except for the variable ch_fullvaccine, where the result again indicated "outcome does not vary".
May I know how to deal with this?
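
A sketch of the first diagnostic: "outcome does not vary" means that within the estimation sample (observations with non-missing ch_fullvaccine), child_mortality is all 0 or all 1; a tabulation shows whether that is the case.

Code:
tab child_mortality ch_fullvaccine, missing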

"Starting values invalid" message when using nlsur commands

Dear Stata users,

I am trying the nlsur command with a function evaluator program to estimate the coefficients of a QUAIDS model in Stata 14.
The thing is that when I run the nlsur command, this error occurs:

could not evaluate equation 1
starting values invalid or some RHS variables have missing values

I am sure that there are no missing values.
I attach the code and the function evaluator program below.
I am afraid the code is too long to go over, but I could not figure out where the problem comes from.

Hope someone can tell me how to deal with the "starting values invalid" message when using the nlsur command in general.

Many thanks!


Code:
 

{/* function evaluator program */

    do "$dofiles\0_funcev_QUAIDS.do"
}



{/* estimation */

    use "$datain\data_korea_prices.dta", clear
    tab state, gen(r)
    
    timer on 1
    
    {    /* 1st step: Probit model */
 
    forv i=1/$neqnm2{          
                gen d`i'=0 if w`i'==0
                replace d`i'=1 if w`i'>0
                tab d`i'
 
                probit d`i' p1-p10 lnm z1 z2 i.z3 z4 z5, iter(10)
                
                mat PC`i' = (_b[p1], _b[p2], _b[p3], _b[p4], _b[p5], _b[p6], _b[p7], _b[p8])
                
                predict lipr`i', xb
                gen pdf`i'=normalden(lipr`i')
                gen cdf`i'=normal(lipr`i')
                label var lipr`i' "linear prediction for d`i'"
                label var pdf`i' "univariate standard normal probability density function for d`i'"
                label var cdf`i' "cumulative distribution function for d`i'"
                drop lipr`i' d`i'
                sum pdf`i' cdf`i'
    }

    mat PC = (PC1 \ PC2 \ PC3 \PC4\ PC5\ PC6\ PC7\ PC8)
    mat list PC
    
    * put matrix to excel
    putexcel set "$results\probit", sheet("pcoeff", replace) modify
    
    #delimit;
    putexcel A1 = ("Probit price coeffcieints  ")
    
             B2 = matrix(PC)
    ;
    #delimit cr    
    
    
    gen pdf$neqn = 0
    gen cdf$neqn = 1

    gen pdf$neqnm1 = 0
    gen cdf$neqnm1 = 1
    
    sum cdf* pdf*
    
    forv i=1/$neqn {
        drop if cdf`i' == .
        drop if pdf`i' == .
    }
    
    }
        
    {/* 2nd step: system estimation */
    
    nlsur quaidscensored @ $shares $prices $logexp $cdfs $pdfs r1-r16, fgnls nequations($neqn) ///
        param(a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 ///
        g11 g12 g13 g14 g15 g16 g17 g18 g19 g110 g22 g23 g24 g25 g26 g27 g28 g29 g210 ///
        g33 g34 g35 g36 g37 g38 g39 g310 g44 g45 g46 g47 g48 g49 g410 ///
        g55 g56 g57 g58 g59 g510 g66 g67 g68 g69 g610 g77 g78 g79 g710 ///
        g88 g89 g810 g99 g910 g1010  ///
        b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 l1 l2 l3 l4 l5 l6 l7 l8 l9 l10 ///
        d1a d2a d3a d4a d5a d6a d7a d8a d9a d10a ///
        k1d1 k2d1 k3d1 k4d1 k5d1 k6d1 k7d1 k8d1 k9d1 k10d1 ///
        k1d2 k2d2 k3d2 k4d2 k5d2 k6d2 k7d2 k8d2 k9d2 k10d2 ///
        k1d3 k2d3 k3d3 k4d3 k5d3 k6d3 k7d3 k8d3 k9d3 k10d3 ///
        k1d4 k2d4 k3d4 k4d4 k5d4 k6d4 k7d4 k8d4 k9d4 k10d4 ///
        k1d5 k2d5 k3d5 k4d5 k5d5 k6d5 k7d5 k8d5 k9d5 k10d5 ///
        k1d6 k2d6 k3d6 k4d6 k5d6 k6d6 k7d6 k8d6 k9d6 k10d6 ///
        k1d7 k2d7 k3d7 k4d7 k5d7 k6d7 k7d7 k8d7 k9d7 k10d7 ///
        k1d8 k2d8 k3d8 k4d8 k5d8 k6d8 k7d8 k8d8 k9d8 k10d8 ///
        k1d9 k2d9 k3d9 k4d9 k5d9 k6d9 k7d9 k8d9 k9d9 k10d9 ///
        k1d10 k2d10 k3d10 k4d10 k5d10 k6d10 k7d10 k8d10 k9d10 k10d10 ///
        k1d11 k2d11 k3d11 k4d11 k5d11 k6d11 k7d11 k8d11 k9d11 k10d11 ///
        k1d12 k2d12 k3d12 k4d12 k5d12 k6d12 k7d12 k8d12 k9d12 k10d12 ///
        k1d13 k2d13 k3d13 k4d13 k5d13 k6d13 k7d13 k8d13 k9d13 k10d13 ///
        k1d14 k2d14 k3d14 k4d14 k5d14 k6d14 k7d14 k8d14 k9d14 k10d14 ///
        k1d15 k2d15 k3d15 k4d15 k5d15 k6d15 k7d15 k8d15 k9d15 k10d15 ///
        k1d16 k2d16 k3d16 k4d16 k5d16 k6d16 k7d16 k8d16 k9d16 k10d16)
        
    }

    timer off 1
    timer list 1
    estimates save "$datain\est_regfe", replace
}

    forv i = 1/$neqn {
        predict res`i', equation(#`i') residuals
        predict w`i'_0, equation(#`i') yhat
    }

And the function evaluator program

Code:
program nlsurquaidscensored
    
    version 14

    syntax varlist (min=57  max=57) if, at(name)
    tokenize `varlist'
    args w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 ///
         lnp1 lnp2 lnp3 lnp4 lnp5 lnp6 lnp7 lnp8 lnp9 lnp10 ///
         lnm cdf1 cdf2 cdf3 cdf4 cdf5 cdf6 cdf7 cdf8 cdf9 cdf10 ///
         pdf1 pdf2 pdf3 pdf4 pdf5 pdf6 pdf7 pdf8 pdf9 pdf10 ///
         r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16
    
    /* We have 10 goods, imposing symmetry and homogeneity constraints there are
                
    parameters that can be estimated in the censored QUAIDS model. Below, we extract those parameters from the `at' vector and
    impose constraints as we go along */
    
    {/* a, constants */
    tempname a1 a2 a3 a4 a5 a6 a7 a8 a9 a10
    scalar `a1' = `at'[1,1]
    scalar `a2' = `at'[1,2]
    scalar `a3' = `at'[1,3]
    scalar `a4' = `at'[1,4]    
    scalar `a5' = `at'[1,5]
    scalar `a6' = `at'[1,6]
    scalar `a7' = `at'[1,7]
    scalar `a8' = `at'[1,8]
    scalar `a9' = `at'[1,9]
    scalar `a10' = `at'[1,10]
    
    }
    {/* g, price coefficients */
    tempname g11 g12 g13 g14 g15 g16 g17 g18 g19 g110
    tempname g21 g22 g23 g24 g25 g26 g27 g28 g29 g210
    tempname g31 g32 g33 g34 g35 g36 g37 g38 g39 g310
    tempname g41 g42 g43 g44 g45 g46 g47 g48 g49 g410
    tempname g51 g52 g53 g54 g55 g56 g57 g58 g59 g510
    tempname g61 g62 g63 g64 g65 g66 g67 g68 g69 g610
    tempname g71 g72 g73 g74 g75 g76 g77 g78 g79 g710
    tempname g81 g82 g83 g84 g85 g86 g87 g88 g89 g810
    tempname g91 g92 g93 g94 g95 g96 g97 g98 g99 g910
    tempname g101 g102 g103 g104 g105 g106 g107 g108 g109 g1010
    
    scalar `g11' = `at'[1,11]
    scalar `g12' = `at'[1,12]
    scalar `g13' = `at'[1,13]
    scalar `g14' = `at'[1,14]
    scalar `g15' = `at'[1,15]
    scalar `g16' = `at'[1,16]
    scalar `g17' = `at'[1,17]
    scalar `g18' = `at'[1,18]
    scalar `g19' = `at'[1,19]
    scalar `g110' = `at'[1,20]
    
    scalar `g21' = `g12'
    scalar `g22' = `at'[1,21]
    scalar `g23' = `at'[1,22]
    scalar `g24' = `at'[1,23]
    scalar `g25' = `at'[1,24]
    scalar `g26' = `at'[1,25]
    scalar `g27' = `at'[1,26]
    scalar `g28' = `at'[1,27]
    scalar `g29' = `at'[1,28]
    scalar `g210' = `at'[1,29]

    scalar `g31' = `g13'
    scalar `g32' = `g23'
    scalar `g33' = `at'[1,30]
    scalar `g34' = `at'[1,31]
    scalar `g35' = `at'[1,32]
    scalar `g36' = `at'[1,33]
    scalar `g37' = `at'[1,34]
    scalar `g38' = `at'[1,35]
    scalar `g39' = `at'[1,36]
    scalar `g310' = `at'[1,37]
    
    scalar `g41' = `g14'
    scalar `g42' = `g24'
    scalar `g43' = `g34'
    scalar `g44' = `at'[1,38]
    scalar `g45' = `at'[1,39]
    scalar `g46' = `at'[1,40]
    scalar `g47' = `at'[1,41]
    scalar `g48' = `at'[1,42]
    scalar `g49' = `at'[1,43]
    scalar `g410' = `at'[1,44]
    
    scalar `g51' = `g15'
    scalar `g52' = `g25'
    scalar `g53' = `g35'
    scalar `g54' = `g45'
    scalar `g55' = `at'[1,45]
    scalar `g56' = `at'[1,46]
    scalar `g57' = `at'[1,47]
    scalar `g58' = `at'[1,48]
    scalar `g59' = `at'[1,49]
    scalar `g510' = `at'[1,50]
    
    scalar `g61' = `g16'
    scalar `g62' = `g26'
    scalar `g63' = `g36'
    scalar `g64' = `g46'
    scalar `g65' = `g56'
    scalar `g66' = `at'[1,51]
    scalar `g67' = `at'[1,52]
    scalar `g68' = `at'[1,53]
    scalar `g69' = `at'[1,54]
    scalar `g610' = `at'[1,55]
    
    scalar `g71' = `g17'
    scalar `g72' = `g27'
    scalar `g73' = `g37'
    scalar `g74' = `g47'
    scalar `g75' = `g57'
    scalar `g76' = `g67'
    scalar `g77' = `at'[1,56]
    scalar `g78' = `at'[1,57]
    scalar `g79' = `at'[1,58]
    scalar `g710' = `at'[1,59]
    
    scalar `g81' = `g18'
    scalar `g82' = `g28'
    scalar `g83' = `g38'
    scalar `g84' = `g48'
    scalar `g85' = `g58'
    scalar `g86' = `g68'
    scalar `g87' = `g78'
    scalar `g88' = `at'[1,60]
    scalar `g89' = `at'[1,61]
    scalar `g810' = `at'[1,62]
    
    scalar `g91' = `g19'
    scalar `g92' = `g29'
    scalar `g93' = `g39'
    scalar `g94' = `g49'
    scalar `g95' = `g59'
    scalar `g96' = `g69'
    scalar `g97' = `g79'
    scalar `g98' = `g89'
    scalar `g99' = `at'[1,63]
    scalar `g910' = `at'[1,64]
    
    scalar `g101' = `g110'
    scalar `g102' = `g210'
    scalar `g103' = `g310'
    scalar `g104' = `g410'
    scalar `g105' = `g510'
    scalar `g106' = `g610'
    scalar `g107' = `g710'
    scalar `g108' = `g810'
    scalar `g109' = `g910'
    scalar `g1010' = `at'[1,65]
    
    }
    
    {/* b, income coefficients */
    tempname b1 b2 b3 b4 b5 b6 b7 b8 b9 b10
    scalar `b1' = `at'[1,66]
    scalar `b2' = `at'[1,67]
    scalar `b3' = `at'[1,68]
    scalar `b4' = `at'[1,69]
    scalar `b5' = `at'[1,70]
    scalar `b6' = `at'[1,71]
    scalar `b7' = `at'[1,72]
    scalar `b8' = `at'[1,73]
    scalar `b9' = `at'[1,74]
    scalar `b10' = `at'[1,75]
    
    }
    
    {/* l, squared income coefficients */
    tempname l1 l2 l3 l4 l5 l6 l7 l8 l9 l10
    scalar `l1' = `at'[1,76]
    scalar `l2' = `at'[1,77]
    scalar `l3' = `at'[1,78]                                            
    scalar `l4' = `at'[1,79]
    scalar `l5' = `at'[1,80]
    scalar `l6' = `at'[1,81]
    scalar `l7' = `at'[1,82]                                            
    scalar `l8' = `at'[1,83]
    scalar `l9' = `at'[1,84]
    scalar `l10' = `at'[1,85]
    }

    {/* d`i'a, pdf coefficients */
    tempname d1a d2a d3a d4a d5a d6a d7a d8a d9a d10a
    scalar `d1a' = `at'[1,86]
    scalar `d2a' = `at'[1,87]
    scalar `d3a' = `at'[1,88]
    scalar `d4a' = `at'[1,89]
    scalar `d5a' = `at'[1,90]
    scalar `d6a' = `at'[1,91]
    scalar `d7a' = `at'[1,92]
    scalar `d8a' = `at'[1,93]
    scalar `d9a' = `at'[1,94]
    scalar `d10a' = `at'[1,95]
    }
    
    {/* k, region dummies */
    * region 1
    tempname k1d1 k2d1 k3d1 k4d1 k5d1 k6d1 k7d1 k8d1 k9d1 k10d1  
    scalar `k1d1' = `at'[1,96]
    scalar `k2d1' = `at'[1,97]
    scalar `k3d1' = `at'[1,98]
    scalar `k4d1' = `at'[1,99]
    scalar `k5d1' = `at'[1,100]
    scalar `k6d1' = `at'[1,101]
    scalar `k7d1' = `at'[1,102]
    scalar `k8d1' = `at'[1,103]
    scalar `k9d1' = `at'[1,104]
    scalar `k10d1' = `at'[1,105]
    
    * region 2
    tempname k1d2 k2d2 k3d2 k4d2 k5d2 k6d2 k7d2 k8d2 k9d2 k10d2  
    scalar `k1d2' = `at'[1,106]
    scalar `k2d2' = `at'[1,107]
    scalar `k3d2' = `at'[1,108]
    scalar `k4d2' = `at'[1,109]
    scalar `k5d2' = `at'[1,110]
    scalar `k6d2' = `at'[1,111]
    scalar `k7d2' = `at'[1,112]
    scalar `k8d2' = `at'[1,113]
    scalar `k9d2' = `at'[1,114]
    scalar `k10d2' = `at'[1,115]
    
    * region 3
    tempname k1d3 k2d3 k3d3 k4d3 k5d3 k6d3 k7d3 k8d3 k9d3 k10d3  
    scalar `k1d3' = `at'[1,116]
    scalar `k2d3' = `at'[1,117]
    scalar `k3d3' = `at'[1,118]
    scalar `k4d3' = `at'[1,119]
    scalar `k5d3' = `at'[1,120]
    scalar `k6d3' = `at'[1,121]
    scalar `k7d3' = `at'[1,122]
    scalar `k8d3' = `at'[1,123]
    scalar `k9d3' = `at'[1,124]
    scalar `k10d3' = `at'[1,125]
    
    * region 4
    tempname k1d4 k2d4 k3d4 k4d4 k5d4 k6d4 k7d4 k8d4 k9d4 k10d4  
    scalar `k1d4' = `at'[1,126]
    scalar `k2d4' = `at'[1,127]
    scalar `k3d4' = `at'[1,128]
    scalar `k4d4' = `at'[1,129]
    scalar `k5d4' = `at'[1,130]
    scalar `k6d4' = `at'[1,131]
    scalar `k7d4' = `at'[1,132]
    scalar `k8d4' = `at'[1,133]
    scalar `k9d4' = `at'[1,134]
    scalar `k10d4' = `at'[1,135]
    
    * region 5
    tempname k1d5 k2d5 k3d5 k4d5 k5d5 k6d5 k7d5 k8d5 k9d5 k10d5  
    scalar `k1d5' = `at'[1,136]
    scalar `k2d5' = `at'[1,137]
    scalar `k3d5' = `at'[1,138]
    scalar `k4d5' = `at'[1,139]
    scalar `k5d5' = `at'[1,140]
    scalar `k6d5' = `at'[1,141]
    scalar `k7d5' = `at'[1,142]
    scalar `k8d5' = `at'[1,143]
    scalar `k9d5' = `at'[1,144]
    scalar `k10d5' = `at'[1,145]
    
    * region 6
    tempname k1d6 k2d6 k3d6 k4d6 k5d6 k6d6 k7d6 k8d6 k9d6 k10d6  
    scalar `k1d6' = `at'[1,146]
    scalar `k2d6' = `at'[1,147]
    scalar `k3d6' = `at'[1,148]
    scalar `k4d6' = `at'[1,149]
    scalar `k5d6' = `at'[1,150]
    scalar `k6d6' = `at'[1,151]
    scalar `k7d6' = `at'[1,152]
    scalar `k8d6' = `at'[1,153]
    scalar `k9d6' = `at'[1,154]
    scalar `k10d6' = `at'[1,155]
    
    * region 7
    tempname k1d7 k2d7 k3d7 k4d7 k5d7 k6d7 k7d7 k8d7 k9d7 k10d7  
    scalar `k1d7' = `at'[1,156]
    scalar `k2d7' = `at'[1,157]
    scalar `k3d7' = `at'[1,158]
    scalar `k4d7' = `at'[1,159]
    scalar `k5d7' = `at'[1,160]
    scalar `k6d7' = `at'[1,161]
    scalar `k7d7' = `at'[1,162]
    scalar `k8d7' = `at'[1,163]
    scalar `k9d7' = `at'[1,164]
    scalar `k10d7' = `at'[1,165]
    
    * region 8
    tempname k1d8 k2d8 k3d8 k4d8 k5d8 k6d8 k7d8 k8d8 k9d8 k10d8  
    scalar `k1d8' = `at'[1,166]
    scalar `k2d8' = `at'[1,167]
    scalar `k3d8' = `at'[1,168]
    scalar `k4d8' = `at'[1,169]
    scalar `k5d8' = `at'[1,170]
    scalar `k6d8' = `at'[1,171]
    scalar `k7d8' = `at'[1,172]
    scalar `k8d8' = `at'[1,173]
    scalar `k9d8' = `at'[1,174]
    scalar `k10d8' = `at'[1,175]
    
    * region 9
    tempname k1d9 k2d9 k3d9 k4d9 k5d9 k6d9 k7d9 k8d9 k9d9 k10d9  
    scalar `k1d9' = `at'[1,176]
    scalar `k2d9' = `at'[1,177]
    scalar `k3d9' = `at'[1,178]
    scalar `k4d9' = `at'[1,179]
    scalar `k5d9' = `at'[1,180]
    scalar `k6d9' = `at'[1,181]
    scalar `k7d9' = `at'[1,182]
    scalar `k8d9' = `at'[1,183]
    scalar `k9d9' = `at'[1,184]
    scalar `k10d9' = `at'[1,185]
    
    * region 10
    tempname k1d10 k2d10 k3d10 k4d10 k5d10 k6d10 k7d10 k8d10 k9d10 k10d10  
    scalar `k1d10' = `at'[1,186]
    scalar `k2d10' = `at'[1,187]
    scalar `k3d10' = `at'[1,188]
    scalar `k4d10' = `at'[1,189]
    scalar `k5d10' = `at'[1,190]
    scalar `k6d10' = `at'[1,191]
    scalar `k7d10' = `at'[1,192]
    scalar `k8d10' = `at'[1,193]
    scalar `k9d10' = `at'[1,194]
    scalar `k10d10' = `at'[1,195]
    
    * region 11
    tempname k1d11 k2d11 k3d11 k4d11 k5d11 k6d11 k7d11 k8d11 k9d11 k10d11  
    scalar `k1d11' = `at'[1,196]
    scalar `k2d11' = `at'[1,197]
    scalar `k3d11' = `at'[1,198]
    scalar `k4d11' = `at'[1,199]
    scalar `k5d11' = `at'[1,200]
    scalar `k6d11' = `at'[1,201]
    scalar `k7d11' = `at'[1,202]
    scalar `k8d11' = `at'[1,203]
    scalar `k9d11' = `at'[1,204]
    scalar `k10d11' = `at'[1,205]
    
    * region 12
    tempname k1d12 k2d12 k3d12 k4d12 k5d12 k6d12 k7d12 k8d12 k9d12 k10d12  
    scalar `k1d12' = `at'[1,206]
    scalar `k2d12' = `at'[1,207]
    scalar `k3d12' = `at'[1,208]
    scalar `k4d12' = `at'[1,209]
    scalar `k5d12' = `at'[1,210]
    scalar `k6d12' = `at'[1,211]
    scalar `k7d12' = `at'[1,212]
    scalar `k8d12' = `at'[1,213]
    scalar `k9d12' = `at'[1,214]
    scalar `k10d12' = `at'[1,215]
    
    * region 13
    tempname k1d13 k2d13 k3d13 k4d13 k5d13 k6d13 k7d13 k8d13 k9d13 k10d13  
    scalar `k1d13' = `at'[1,216]
    scalar `k2d13' = `at'[1,217]
    scalar `k3d13' = `at'[1,218]
    scalar `k4d13' = `at'[1,219]
    scalar `k5d13' = `at'[1,220]
    scalar `k6d13' = `at'[1,221]
    scalar `k7d13' = `at'[1,222]
    scalar `k8d13' = `at'[1,223]
    scalar `k9d13' = `at'[1,224]
    scalar `k10d13' = `at'[1,225]
    
    * region 14
    tempname k1d14 k2d14 k3d14 k4d14 k5d14 k6d14 k7d14 k8d14 k9d14 k10d14  
    scalar `k1d14' = `at'[1,226]
    scalar `k2d14' = `at'[1,227]
    scalar `k3d14' = `at'[1,228]
    scalar `k4d14' = `at'[1,229]
    scalar `k5d14' = `at'[1,230]
    scalar `k6d14' = `at'[1,231]
    scalar `k7d14' = `at'[1,232]
    scalar `k8d14' = `at'[1,233]
    scalar `k9d14' = `at'[1,234]
    scalar `k10d14' = `at'[1,235]
    
    * region 15
    tempname k1d15 k2d15 k3d15 k4d15 k5d15 k6d15 k7d15 k8d15 k9d15 k10d15  
    scalar `k1d15' = `at'[1,236]
    scalar `k2d15' = `at'[1,237]
    scalar `k3d15' = `at'[1,238]
    scalar `k4d15' = `at'[1,239]
    scalar `k5d15' = `at'[1,240]
    scalar `k6d15' = `at'[1,241]
    scalar `k7d15' = `at'[1,242]
    scalar `k8d15' = `at'[1,243]
    scalar `k9d15' = `at'[1,244]
    scalar `k10d15' = `at'[1,245]
    
    * region 16
    tempname k1d16 k2d16 k3d16 k4d16 k5d16 k6d16 k7d16 k8d16 k9d16 k10d16  
    scalar `k1d16' = `at'[1,246]
    scalar `k2d16' = `at'[1,247]
    scalar `k3d16' = `at'[1,248]
    scalar `k4d16' = `at'[1,249]
    scalar `k5d16' = `at'[1,250]
    scalar `k6d16' = `at'[1,251]
    scalar `k7d16' = `at'[1,252]
    scalar `k8d16' = `at'[1,253]
    scalar `k9d16' = `at'[1,254]
    scalar `k10d16' = `at'[1,255]    
        
    }
    
 
    // price indices and expenditure shares    
    
    quietly {
        // First get the price index
        // I set a_0 = 5
        
        // demand shifters (the region dummies) enter through the constants below
        tempvar lnpindex
        gen double `lnpindex' = 5
        forvalues i = 1/$neqn {
            replace `lnpindex' = `lnpindex' + ((`a`i'' + ///
            `k`i'd1'*`r1' + `k`i'd2'*`r2' + `k`i'd3'*`r3' + `k`i'd4'*`r4' + ///
            `k`i'd5'*`r5' + `k`i'd6'*`r6' + `k`i'd7'*`r7' + `k`i'd8'*`r8' + ///
            `k`i'd9'*`r9' + `k`i'd10'*`r10' + `k`i'd11'*`r11' + `k`i'd12'*`r12' + ///
            `k`i'd13'*`r13' + `k`i'd14'*`r14' + `k`i'd15'*`r15' + `k`i'd16'*`r16')*`lnp`i'')
        }
        forvalues i = 1/$neqn {
            forvalues j = 1/$neqn {
                replace `lnpindex' = `lnpindex' + 0.5*`g`i'`j''*`lnp`i''*`lnp`j''
            }
        }
    
        
        // The b(p) term in the QUAIDS model
        tempvar bofp
        gen double `bofp' = 0
        forvalues i = 1/$neqn {
            replace `bofp' = `bofp' + `lnp`i''*`b`i''
        }
        replace `bofp' = exp(`bofp')
    
        // Finally, the expenditure shares for the 10-good system
        // do demographics and region fixed effects become part of the price index? How should that be handled?
        
        forvalues i = 1/$neqn {
            replace `w`i'' = `cdf`i''*(`a`i'' + ///
            `k`i'd1'*`r1' + `k`i'd2'*`r2' + `k`i'd3'*`r3' + `k`i'd4'*`r4' + `k`i'd5'*`r5' ///
            + `k`i'd6'*`r6' + `k`i'd7'*`r7' + `k`i'd8'*`r8' + `k`i'd9'*`r9' + `k`i'd10'*`r10' ///
            + `k`i'd11'*`r11' + `k`i'd12'*`r12' + `k`i'd13'*`r13' + `k`i'd14'*`r14' + `k`i'd15'*`r15' ///
            + `k`i'd16'*`r16' + `g`i'1'*`lnp1' + `g`i'2'*`lnp2' + `g`i'3'*`lnp3' ///
            + `g`i'4'*`lnp4' + `g`i'5'*`lnp5' + `g`i'6'*`lnp6' + `g`i'7'*`lnp7' ///
            + `g`i'8'*`lnp8' + `g`i'9'*`lnp9' + `g`i'10'*`lnp10' ///
            + `b`i''*(`lnm' - `lnpindex') + `l`i''/`bofp'*(`lnm' - `lnpindex')^2) + `d`i'a'*`pdf`i''
        }
        
    }

end
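
A function evaluator of this form is invoked through nlsur, passing the 57 variables in the same order as in the args statement. Below is a minimal sketch of the call, assuming the share, price, cdf/pdf, and region-dummy variables exist under the names used in the evaluator; the option values are illustrative rather than definitive:

Code:
* a minimal sketch of the nlsur call (variable names assumed to match
* the -args- order in the evaluator)
global neqn 10
nlsur quaidscensored @ w1-w10 lnp1-lnp10 lnm cdf1-cdf10 pdf1-pdf10 r1-r16, ///
    nequations(10) nparameters(255) ifgnls

Note that nlsur strips the nlsur prefix from the program name, so program nlsurquaidscensored is called as quaidscensored, and the global neqn must be set beforehand because the evaluator loops over $neqn equations. The predict loop shown above then recovers residuals and fitted shares equation by equation.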

Monday, August 30, 2021

Test constraint dropped

Dear Statalist:

*First, I run the following modified Poisson regression with interactions:

glm k6cat Cancer2##caregiving2 i.sex_cat ib0.age_cat ib0.edu i.job i.setaiexpbyninsu_cat ib1.smoking Diabetes Eye_disease Hypertension Stroke Myocardial_infarction Asthma Gastroduodenal_dis Disease_of_liver_bowel Rheumatoid_arthritis Arthropathy Lower_back_pain Osteoporosis Fracture, fam(Poisson) link(log) nolog eform vce(robust) allbaselevels

The main variables are defined as:
- k6cat: binary outcome
- Cancer2: binary (0 = no cancer, 1 = cancer)
- caregiving2: binary (0 = not caregiving, 1 = caregiving)


Generalized linear models                         Number of obs     =    672,849
Optimization     : ML                             Residual df       =    672,823
                                                  Scale parameter   =          1
Deviance         =  462649.7608                   (1/df) Deviance   =   .6876248
Pearson          =  493030.1896                   (1/df) Pearson    =   .7327784

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC               =   1.218347
Log pseudolikelihood = -409855.8804               BIC               =   -8566148

---------------------------------------------------------------------------------------------
                            |               Robust
                      k6cat |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------------+----------------------------------------------------------------
                    Cancer2 |
                          0 |          1  (base)
                          1 |   1.467965   .0191241    29.47   0.000     1.430957     1.50593
                            |
                caregiving2 |
                          0 |          1  (base)
                          1 |   1.297198   .0212495    15.88   0.000     1.256211    1.339522
                            |
        Cancer2#caregiving2 |
                        0 0 |          1  (base)
                        0 1 |          1  (base)
                        1 0 |          1  (base)
                        1 1 |   .9320186   .0878053    -0.75   0.455     .7748775    1.121027
                            |


* Second, I run: test 1.Cancer2#0.caregiving2 = 0.Cancer2#1.caregiving2
It showed the following:

test 1.Cancer2#0.caregiving2=0.Cancer2#1.caregiving2

( 1) - [k6cat]0b.Cancer2#1o.caregiving2 + [k6cat]1o.Cancer2#0b.caregiving2 = 0
Constraint 1 dropped

chi2( 0) = .
Prob > chi2 = .


* Third, the dropped constraint gave me empty values. I then tried to test the constraints (only wanting to know where the error is), and I found:

test 1.Cancer2 0.caregiving2 0.Cancer2 1.caregiving2

( 1) [k6cat]1.Cancer2 = 0
( 2) [k6cat]0b.caregiving2 = 0
( 3) [k6cat]0b.Cancer2 = 0
( 4) [k6cat]1.caregiving2 = 0
Constraint 2 dropped
Constraint 3 dropped

chi2( 2) = 1099.42
Prob > chi2 = 0.0000

It seems that 0.Cancer2 and 0.caregiving2 carry "an error" that I can't identify.

Please, can you help me with this? Thank you in advance.
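
A note on why the constraint is dropped: with Cancer2##caregiving2, both 1.Cancer2#0.caregiving2 and 0.Cancer2#1.caregiving2 are omitted base cells (the "o" and "b" markers in the constraint echo), each already constrained to zero, so their difference is identically zero and not a testable restriction. The only estimated interaction cell is 1.Cancer2#1.caregiving2, which can be tested directly. A minimal sketch, assuming the glm fit above is still in memory:

Code:
* test the one estimated interaction coefficient
test 1.Cancer2#1.caregiving2
* or report the interaction as an IRR with its confidence interval
lincom 1.Cancer2#1.caregiving2, eform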

Why does the number of observations decrease when I increase the sample size?

Hi all,

Today I faced a strange situation: the number of observations shrinks when I expand the sample size.

In particular, I counted the nonmissing observations of x1 and x2 within UNITEDS:

count if x1 != . & inlist(GEOGN, "UNITEDS")

count if x2 != . & inlist(GEOGN, "UNITEDS")

The counts for these two variables are the same (the output was attached as an image in the original post).


Then I tried to run the regression of x1 on x2 for this country (UNITEDS):

Code:
. reghdfe x1 x2 if  inlist(GEOGN, "UNITEDS"), a(TYPE2 INDC32#yr)
(dropped 1013 singleton observations)
note: x2 is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
(MWFE estimator converged in 14 iterations)
note: x2 omitted because of collinearity

HDFE Linear regression                            Number of obs   =     54,409
Absorbing 2 HDFE groups                           F(   0,  47843) =          .
                                                  Prob > F        =          .
                                                  R-squared       =     0.8063
                                                  Adj R-squared   =     0.7797
                                                  Within R-sq.    =     0.0000
                                                  Root MSE        =     0.3916

------------------------------------------------------------------------------
          x1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x2 |          0  (omitted)
       _cons |   1.307023   .0016788   778.54   0.000     1.303733    1.310314
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       TYPE2 |      6131           0        6131     |
   INDC32#yr |       450          15         435     |
-----------------------------------------------------+
However, when I run the regression on the bigger sample (more countries), the number of observations decreases drastically. Can I ask what the reason behind this shrinkage is, and what we should do in this case?

Code:
. reghdfe x1 x2 if inlist(GEOGN, "CHINA" "UNITEDS" "INDONESIA" "RUSSIAN" "MEXICO" "JAPAN" "PHILIPPINES" "VIETNAM" "SOUTHKOREA") | inlist(GEOGN,"COLOMBIA" "CANADA" "PERU" "MALAYSIA" "AUSTRALIA" "CHILE" "ECUADOR" "SINGAPORE" "NEWZEALAND"), a(TYPE2 INDC32#yr)
(dropped 194 singleton observations)
(MWFE estimator converged in 14 iterations)

HDFE Linear regression                            Number of obs   =     22,689
Absorbing 2 HDFE groups                           F(   1,  18715) =       0.07
                                                  Prob > F        =     0.7857
                                                  R-squared       =     0.7423
                                                  Adj R-squared   =     0.6876
                                                  Within R-sq.    =     0.0000
                                                  Root MSE        =     0.2734

------------------------------------------------------------------------------
          x1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x2 |   .0160276    .058948     0.27   0.786    -.0995158    .1315709
       _cons |   .7591069   .0458817    16.54   0.000     .6691746    .8490393
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       TYPE2 |      3614           0        3614     |
   INDC32#yr |       374          15         359     |
-----------------------------------------------------+
Update:
As suggested by Ken Chui, I applied another way to select the subsample of countries (https://www.statalist.org/forums/for...st2-in-my-code).

It turns out that the number of observations for the expanded sample is much bigger.

Code:
gen include = 0
foreach ctry in CHINA UNITEDS INDONESIA RUSSIAN MEXICO JAPAN PHILIPPINES ///
                VIETNAM SOUTHKOREA COLOMBIA CANADA PERU MALAYSIA AUSTRALIA ///
                CHILE ECUADOR SINGAPORE NEWZEALAND{
    replace include = 1 if GEOGN == "`ctry'"   
}
reghdfe x1 x2 if include == 1, a(TYPE2 INDC32#yr)
Code:
. reghdfe x1 x2 if include == 1, a(TYPE2 INDC32#yr)
(dropped 2165 singleton observations)
(MWFE estimator converged in 13 iterations)

HDFE Linear regression                            Number of obs   =    232,994
Absorbing 2 HDFE groups                           F(   1, 209389) =      88.97
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.8183
                                                  Adj R-squared   =     0.7978
                                                  Within R-sq.    =     0.0004
                                                  Root MSE        =     0.3176

------------------------------------------------------------------------------
          x1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x2 |   .0282004   .0029897     9.43   0.000     .0223407    .0340601
       _cons |   1.079796   .0023016   469.15   0.000     1.075285    1.084307
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       TYPE2 |     23169           0       23169     |
   INDC32#yr |       450          15         435     |
-----------------------------------------------------+
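
A likely culprit in the first command is the missing commas inside inlist(): its arguments must be comma-separated, and for strings it accepts at most 10 arguments in total, which is why 18 countries need two calls joined by |. A minimal sketch of the comma-separated form, assuming the same country list:

Code:
* each inlist() call holds GEOGN plus at most 9 string values
reghdfe x1 x2 if inlist(GEOGN, "CHINA", "UNITEDS", "INDONESIA", "RUSSIAN", ///
        "MEXICO", "JAPAN", "PHILIPPINES", "VIETNAM", "SOUTHKOREA") ///
    | inlist(GEOGN, "COLOMBIA", "CANADA", "PERU", "MALAYSIA", "AUSTRALIA", ///
        "CHILE", "ECUADOR", "SINGAPORE", "NEWZEALAND"), a(TYPE2 INDC32#yr)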

How to see the frequency of a variable in each country in a dataset?

With help from others, I see that I can list the unique countries in a sample by using

tab GEOGN

where GEOGN is a string variable standing for countries.

Now I want to know how many observations of a variable called "firm_size" there are in each country. How should I set up the code to achieve this?
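
A minimal sketch of two ways to get such counts, assuming firm_size is numeric:

Code:
* tabulate countries over the observations where firm_size is nonmissing
tab GEOGN if !missing(firm_size)
* or produce a table of nonmissing counts per country
tabstat firm_size, by(GEOGN) statistics(count)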

What is wrong with inlist2 in my code?

Hi all,

Yesterday, as documented in this post (https://www.statalist.org/forums/for...p-of-countries), I tried to select firms in more than 10 countries by using inlist, as below, and it does not seem correct:

reghdfe y x if inlist(GEOGN, "CHINA" "UNITEDS" "INDONESIA" "RUSSIAN" "MEXICO" "JAPAN" "PHILIPPINES" "VIETNAM" "SOUTHKOREA") | inlist(GEOGN,"COLOMBIA" "CANADA" "PERU" "MALAYSIA" "AUSTRALIA" "CHILE" "ECUADOR" "SINGAPORE" "NEWZEALAND"), a(TYPE2 INDC32#yr)

I searched and found another user-written package named inlist2, with no restriction on the number of countries included. I therefore installed inlist2 and ran the code, but Stata returns an error:

reghdfe y x if inlist2(GEOGN, CHINA UNITEDS INDONESIA RUSSIAN MEXICO JAPAN PHILIPPINES VIETNAM SOUTHKOREA COLOMBIA CANADA PERU MALAYSIA AUSTRALIA CHILE ECUADOR SINGAPORE NEWZEALAND), a(TYPE2 INDC32#yr)

unknown function inlist2()

r(133);


(I also tested with quotation marks around each country.)

Could you please help me identify the problem?

Thanks
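
The error message is expected behavior: Stata's expression parser only recognizes built-in functions, so a user-written command such as inlist2 cannot be called inside an -if- expression, whatever quotation marks are used. One built-in alternative that avoids the 10-string-argument limit of inlist() is regexm() with an anchored alternation. A minimal sketch, assuming GEOGN holds exactly the names shown:

Code:
* match GEOGN against an anchored list of country names
reghdfe y x if regexm(GEOGN, "^(CHINA|UNITEDS|INDONESIA|RUSSIAN|MEXICO|JAPAN|" + ///
    "PHILIPPINES|VIETNAM|SOUTHKOREA|COLOMBIA|CANADA|PERU|MALAYSIA|AUSTRALIA|" + ///
    "CHILE|ECUADOR|SINGAPORE|NEWZEALAND)$"), a(TYPE2 INDC32#yr)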

Dates from Excel/CSV not correctly displaying in Stata

I imported several csv files, with "birthdate" as one of the variables, into Stata and then appended the files to produce one dataset. However, dates from some of the constituent datasets did not show up at all.

To work around this, I used the command "tostring birthdate" before appending. The hitherto missing dates now show up, but as numbers, some positive and some negative.

I have tried various manipulations, but these numbers do not convert to the actual birthdates from the csv files.

After formatting them as dates, they mostly default to 1960 as the birth year. But that is not the birthdate in the original file.

Please kindly help me solve this problem.
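
A common cause of this pattern is that birthdate arrived as a string in some files and as a number (an Excel serial day count) in others, so appending after tostring leaves a mix of text dates and serial numbers, and the serials get read as Stata dates near the 01jan1960 origin. A minimal sketch of a consistent conversion, assuming the text dates look like "12/31/1985" (adjust the "MDY" mask to your files) and the remaining numbers are Excel serials:

Code:
* text dates convert with date() and the matching mask
gen bdate = date(birthdate, "MDY")
* Excel serials count days from 30dec1899, so shift them onto
* Stata's 01jan1960 origin wherever date() came up empty
replace bdate = real(birthdate) + td(30dec1899) if missing(bdate)
format bdate %td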

division of two columns using an if condition

Hello,
I have the dataset below, and I want to divide CLMS18 by CLMS17 whenever NPI17 is the same as NPI18 and gnrc17 is the same as gnrc18. Is there a command that can do that? (A sketch follows the data example below.)
Thank you,

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long NPI17 str30 gnrc17 int CLMS17 long NPI18 str30 gnrc18 long CLMS18
1457350522 "Acetylcysteine"                  16 1356409437 "ACETYLCYSTEINE"                 20
1891993317 "Albuterol Sulfate"               23 1174904106 "ALBUTEROL SULFATE"              32
1851371405 "Amoxapine"                      104 1184003089 "ALBUTEROL SULFATE"              12
1417051921 "Amoxicillin/Potassium Clav"      44 1063674364 "ALBUTEROL SULFATE"              12
1225023872 "Atenolol"                        55 1427293901 "AMOXICILLIN/POTASSIUM CLAV"     27
1700958659 "Atenolol/Chlorthalidone"         12 1609801380 "ATENOLOL/CHLORTHALIDONE"        16
1194753186 "Atropine Sulfate"                26 1588603781 "BUSPIRONE HCL"                  94
1952310666 "Buspirone Hcl"                   15 1043237613 "BUSPIRONE HCL"                 106
1225023872 "Carbidopa/Levodopa"              26 1447540349 "BUSPIRONE HCL"                  11
1104267053 "Cefpodoxime Proxetil"            11 1972849834 "CIPROFLOXACIN HCL"              74
1417967118 "Ceftazidime In Dextrose5%water"  14 1528118510 "CIPROFLOXACIN HCL"              19
1154588531 "Chorionic Gonadotropin, Human"   40 1669727822 "CIPROFLOXACIN HCL"              22
1538187000 "Ciprofloxacin"                   11 1851485049 "CIPROFLOXACIN HCL"              14
1225023872 "Clonidine Hcl"                   19 1326391707 "CIPROFLOXACIN HCL"              16
1225023872 "Cyclosporine"                    15 1215042866 "CYCLOSPORINE"                   14
1558542209 "Cytarabine/Pf"                   29 1861462772 "DORZOLAMIDE HCL"                37
1265497366 "Disopyramide Phosphate"          11 1245348911 "DORZOLAMIDE HCL"                22
1194753186 "Dorzolamide Hcl"                 16 1326070020 "DORZOLAMIDE HCL"                11
1023006913 "Fentanyl Citrate/Pf"             41 1417937715 "FLUDROCORTISONE ACETATE"        13
1700958659 "Fludrocortisone Acetate"         12 1730184540 "FLUDROCORTISONE ACETATE"        20
1952310666 "Fluphenazine Hcl"                33 1326159385 "FLUPHENAZINE HCL"               20
1316155864 "Flurbiprofen Sodium"            115 1376734806 "FUROSEMIDE"                     62
1801093968 "Furosemide"                      11 1134668890 "FUROSEMIDE"                     11
1194753186 "Gentamicin Sulfate"              22 1922090570 "FUROSEMIDE"                     51
1356326136 "Gentamicin Sulfate/Pf"           12 1952322711 "HALOPERIDOL"                    11
1952310666 "Haloperidol"                     42 1942322110 "HALOPERIDOL"                    53
1740284074 "Haloperidol Lactate"             54 1316013006 "IPRATROPIUM BROMIDE"            15
1326224932 "Hydroxyprogesterone Caproat/Pf"  11 1386731396 "IPRATROPIUM BROMIDE"            20
1053419028 "Indomethacin"                    11 1831190982 "IPRATROPIUM BROMIDE"            32
1306948609 "Ipratropium Bromide"             19 1760486443 "LEUPROLIDE ACETATE"             12
1477538734 "Isosorbide Dinitrate"            11 1467453613 "LORAZEPAM"                     106
1518941293 "Leuprolide Acetate"              26 1467443838 "LORAZEPAM"                      88
1700958659 "Lidocaine Hcl"                   18 1922186428 "LORAZEPAM"                      13
1952310666 "Lorazepam"                      265 1255363719 "LORAZEPAM"                      11
1316029341 "Mafenide Acetate"                11 1629037775 "MORPHINE SULFATE"               12
1265587869 "Meperidine Hcl"                  21 1770514242 "MORPHINE SULFATE"               19
1275737389 "Meperidine Hcl/Pf"               76 1508839093 "MORPHINE SULFATE"               24
1003889502 "Methocarbamol"                   14 1346252525 "MORPHINE SULFATE"               18
1952310666 "Methylphenidate Hcl"             30 1710234315 "MUPIROCIN"                      57
1437266202 "Midodrine Hcl"                   16 1972603793 "MUPIROCIN"                      17
1003889502 "Morphine Sulfate"                28 1447339452 "MYCOPHENOLATE MOFETIL"          24
1295757904 "Morphine Sulfate/Pf"             22 1770677726 "NYSTATIN"                       12
1326330127 "Mupirocin"                      114 1811907108 "NYSTATIN"                       13
1093808966 "Mupirocin Calcium"               13 1174785950 "OSELTAMIVIR PHOSPHATE"          44
1144305301 "Mycophenolate Mofetil"           50 1457353500 "OSELTAMIVIR PHOSPHATE"          13
1700958659 "Nitrofurantoin"                  19 1124219092 "ROSUVASTATIN CALCIUM"           50
1477538734 "Nitrofurantoin Macrocrystal"     16 1043220957 "ROSUVASTATIN CALCIUM"           40
1225023872 "Nystatin"                        18 1720011380 "ROSUVASTATIN CALCIUM"           48
1164404232 "Ofloxacin"                       11 1467426122 "ROSUVASTATIN CALCIUM"           23
1700958659 "Oseltamivir Phosphate"           17 1659324820 "ROSUVASTATIN CALCIUM"           88
1770555922 "Penicillamine"                   12 1447474945 "ROSUVASTATIN CALCIUM"           27
1417924309 "Proparacaine Hcl"                24 1225030547 "ROSUVASTATIN CALCIUM"           36
1891993317 "Rosuvastatin Calcium"            12 1134231822 "ROSUVASTATIN CALCIUM"          125
1235102187 "Scopolamine"                     19 1205863701 "SPIRONOLACTONE"                 57
1477581320 "Sodium Chloride"                 50 1659322899 "SPIRONOLACTONE"                 82
1235102187 "Sodium Chloride 0.45 %"          12 1255443891 "SPIRONOLACTONE"                 27
1881791820 "Sodium Chloride 3 %"             45 1962404780 "SPIRONOLACTONE"                 29
1396812202 "Sodium Chloride For Inhalation"  18 1699042457 "SPIRONOLACTONE"                 36
1619134244 "Sodium Chloride Irrig Solution"  12 1467550095 "SPIRONOLACTONE"                 51
1467420893 "Sodium Polystyrene Sulfonate"    12 1619921194 "SPIRONOLACTONE"                 14
1225023872 "Spironolactone"                  57 1366684714 "SPIRONOLACTONE"                 11
1013964634 "Temazepam"                       14 1659469856 "VALSARTAN"                      11
1952310666 "Thiothixene"                     20 1780797837 "VALSARTAN/HYDROCHLOROTHIAZIDE"  33
1952310666 "Trifluoperazine Hcl"             31 1992771000 "VALSARTAN/HYDROCHLOROTHIAZIDE"  39
1437266202 "Valsartan"                       35 1780810572 "VALSARTAN/HYDROCHLOROTHIAZIDE"  15
1952395717 "Valsartan/Hydrochlorothiazide"   34          . ""                                .
end
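
Since the 2017 and 2018 columns sit side by side, the rows themselves do not line up, so a row-wise division would only match by accident. A minimal sketch of one route, splitting the two years and merging on NPI and the (upper-cased) generic name before dividing; the tempfile and variable names are placeholders, and this assumes NPI-generic pairs are unique within each year:

Code:
* split off the 2018 columns and save them under common key names
preserve
keep NPI18 gnrc18 CLMS18
drop if missing(NPI18)
rename (NPI18 gnrc18) (NPI gnrc)
tempfile y18
save `y18'
restore
* keep the 2017 columns, align the key names and case, then merge
keep NPI17 gnrc17 CLMS17
rename (NPI17 gnrc17) (NPI gnrc)
replace gnrc = upper(gnrc)      // 2017 names are mixed case
merge 1:1 NPI gnrc using `y18', keep(match) nogenerate
gen double ratio = CLMS18/CLMS17

If an NPI-generic pair repeats within a year, decide how to aggregate first (for example, collapse (sum) CLMS) before the merge.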