Tuesday, November 30, 2021

Assign value to a categorical variable between two limits

Dear users, hope everyone is well.

I want to assign a value of 3 to the variable "dum" when the current year's "survival" is not equal to zero and it becomes zero in the next period.

I tried the following command, but it assigns the value only in the next period (regardless of whether "survival" equals zero in the next period or not).



Code:
bys stkcd: replace dum = 3 if survival != 0 & F.survival == 0
Is there a command that solves this?

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double stkcd float(survival year dum)
1 0 1990 2
1 0 1991 2
1 0 1992 2
1 0 1993 2
1 0 1994 2
1 0 1995 2
1 0 1996 2
1 0 1997 2
1 0 1998 2
1 0 1999 2
1 0 2000 2
1 0 2001 2
1 0 2002 2
1 0 2003 2
1 0 2004 2
1 0 2005 2
1 0 2006 2
1 0 2007 2
1 0 2008 2
1 0 2009 2
1 0 2010 2
1 0 2011 2
1 0 2012 2
1 0 2013 2
1 0 2014 2
1 0 2015 2
1 0 2016 2
1 0 2017 2
1 0 2018 2
1 0 2019 2
1 0 2020 2
2 0 1991 2
2 0 1992 2
2 0 1993 2
2 0 1994 2
2 0 1995 2
2 0 1996 2
2 0 1997 2
2 0 1998 2
2 0 1999 2
2 0 2000 2
2 0 2001 2
2 0 2002 2
2 0 2003 2
2 0 2004 2
2 0 2005 2
2 0 2006 2
2 0 2007 2
2 0 2008 2
2 0 2009 2
2 0 2010 2
2 0 2011 2
2 0 2012 2
2 0 2013 2
2 0 2014 2
2 0 2015 2
2 0 2016 2
2 0 2017 2
2 0 2018 2
2 0 2019 2
2 0 2020 2
3 0 1991 2
3 0 1992 2
3 0 1993 2
3 0 1994 2
3 0 1995 2
3 0 1996 2
3 0 1997 2
3 0 1998 2
3 0 1999 2
3 0 2000 2
3 0 2001 2
4 0 1991 1
4 0 1992 1
4 0 1993 1
4 0 1994 1
4 0 1995 1
4 0 1996 1
4 0 1997 1
4 0 1998 1
4 0 1999 1
4 0 2000 1
4 0 2001 1
4 0 2002 1
4 0 2003 1
4 0 2004 1
4 0 2005 1
4 0 2006 1
4 598120 2007 .
4 64400 2008 3
4 0 2009 1
4 0 2010 1
4 0 2011 1
4 0 2012 1
4 0 2013 1
4 0 2014 1
4 53908.8 2015 .
4 65308.84 2016 .
4 4000 2017 .
4 7908000 2018 3
end
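
A minimal sketch of one way to get this with explicit subscripts instead of the F. operator (assuming one observation per stkcd-year, as in the example):

Code:
* flag the last non-zero survival year before survival turns zero in the following year
bysort stkcd (year): replace dum = 3 if survival != 0 & ///
    survival[_n+1] == 0 & year[_n+1] == year + 1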

Counting Observations in Panel Data and filling in missing values

Hello everyone,
I am new to Stata and have a question regarding the preparation of my data sample. "PERMCO" is the firm id, and "year" is the year in which an M&A took place. There are two things I want to do with the data:

1. I would like to count the number of observations for each PERMCO per year, so that, for example, PERMCO "10078" has the value "2002" listed only once instead of four times, with a new variable next to it giving the frequency of the year 2002.

2. I would like to add the years missing from the interval 2000-2010 for each PERMCO group. The missing years vary across the dataset and are not the same for every PERMCO. Since no observation took place in these years, they should get a count of "0" following the logic explained in point 1.

I would kindly ask for your help in writing code for these two tasks.

Original File:
[screenshot of the original data not shown]


What I would like it to look like (example for PERMCO 10078 representative for the entire data set):

[screenshot of the desired layout not shown]
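
A minimal sketch of one approach, assuming the raw data hold one row per M&A event with the variables PERMCO and year (note that fillin only creates combinations of PERMCO and year values that occur somewhere in the data; a guaranteed 2000-2010 grid would have to be merged in from a constructed PERMCO-by-year file):

Code:
contract PERMCO year, freq(n_ma)      // point 1: one row per PERMCO-year with the count
fillin PERMCO year                    // point 2: add the missing PERMCO-year combinations
replace n_ma = 0 if _fillin           // years with no M&A get a count of 0
drop _fillin
keep if inrange(year, 2000, 2010)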


I hope someone can help me. With best regards

Christopher

post hoc for Fisher's exact test

Hello. Hope everyone is doing well.

I would like to do a post hoc test after Fisher's exact test.
I think a Bonferroni correction can be applied, but I have no idea how to do it.

My current command for Fisher's exact test is below.

tab A B, exact

Does anyone know what I can do for the post hoc test?
Thank you in advance.
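
A minimal sketch of one common workaround, assuming A is numeric: run Fisher's exact test for every pair of levels of A and compare each p-value against a Bonferroni-adjusted threshold (0.05 divided by the number of pairwise comparisons):

Code:
levelsof A, local(alevels)
local k : word count `alevels'
local npairs = `k' * (`k' - 1) / 2
display "Bonferroni-adjusted threshold: " 0.05 / `npairs'
foreach i of local alevels {
    foreach j of local alevels {
        if `i' < `j' {
            display as text _n "A == `i' vs A == `j'"
            tab A B if inlist(A, `i', `j'), exact
        }
    }
}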

Sincerely,
seonyeong

Loop to merge multiple dta files

Hi all,

I am trying to run a loop to merge 10 files named year_1, year_2, ..., year_10 all at once.
Each dataset has 4 variables: ID, grades, total_score, and time (denoting year number 1, 2, ..., 10). The variable ID is the same across datasets.

I am using

Code:
local myfilelist : dir . files "*.dta"
use year_1
foreach f of local myfilelist {
    merge 1:1 ID using `f'
    drop _merge
    save merged, replace
}
However, it is not working. There are no errors, but the output dataset "merged" is the same as year_1. What could be going wrong? Any help is much appreciated. Thank you.
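
One possible explanation, sketched below: every file carries the same variable names (grades, total_score, time), and merge keeps the master's values when names clash, so the result ends up looking exactly like year_1. Renaming by year before merging (or appending and then reshaping) avoids the clash. A minimal sketch under that assumption:

Code:
use year_1, clear
rename (grades total_score) (grades1 total_score1)
drop time
forvalues t = 2/10 {
    preserve
    use year_`t', clear
    rename (grades total_score) (grades`t' total_score`t')
    drop time
    tempfile y
    save `y'
    restore
    merge 1:1 ID using `y', nogenerate
}
save merged, replace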

Creating subset in STATA

Dear Madam/Sir,

I run the following regression and want to generate descriptive statistics using only the 77,626 observations on which the regression is estimated. The original dataset has over 100,000 observations.

regress ln_change_sga ln_change_sale ln_bw lnbw_change_sale uncertain gdprate sec_dec asset_int EMP_int uncertian_change_sale gdpg_change_sale sec_dec_sale asset_int_sale emp_int_sale i.sic2 i.fyear, robust cluster(gvkey)

Number of obs = 77,626

Is the following command the only way to use the 77,626 observations? It also generates an invalid syntax error.
drop if ln_change_sga==. |ln_change_sale ==.| ln_bw==.| lnbw_change_sale==.| uncertain==.| gdprate==.| sec_dec==.| asset_int==.| EMP_int==.| uncertian_change_sale==.| gdpg_change_sale==.| sec_dec_sale==.| asset_int_sale==.| emp_int_sale==.|

invalid syntax
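
A minimal sketch of a simpler route: immediately after running the regression above, e(sample) marks exactly the 77,626 observations that were used, so the missing-value conditions never need to be spelled out (incidentally, the trailing | at the end of the drop command is what triggers the invalid syntax error):

Code:
gen byte insample = e(sample)
summarize ln_change_sga ln_change_sale ln_bw if insample    // descriptive statistics on the estimation sample
* or, to keep only the estimation sample from here on:
keep if insample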

Any help will be highly appreciated.

Joon


Reshaping long to wide problems

Hello everyone,

I am dealing with a large national database (TQIP) which has a series of variables (ICD and AIS codes) stored in long datasets such as:

inc_key icdprocedurecode icdprocedurecode_biu icdprocedureversion proceduremins proceduredays
1.800e+11 BW28ZZZ ICD10 145 1
1.800e+11 B030ZZZ ICD10 3149 3
1.800e+11 BR29ZZZ ICD10 165 1
1.800e+11 BW251ZZ ICD10 151 1
1.800e+11 BR20ZZZ ICD10 161 1
1.800e+11 BR27ZZZ ICD10 164 1
1.800e+11 BR29ZZZ ICD10 10 1

1.900e+11 BW28ZZZ ICD10 9 1
1.900e+11 BR20ZZZ ICD10 9 1
1.900e+11 BW25YZZ ICD10 10 1
1.900e+11 BW40ZZZ ICD10 5 1
1.900e+11 BR27ZZZ ICD10 10 1
1.900e+11 0HQ0XZZ ICD10 40 1
1.900e+11 BW28ZZZ ICD10 30 1
1.900e+11 B24CZZZ ICD10 1

Essentially, I have one common variable to use as i in reshape (inc_key), but there is no unique j variable such as a year or any kind of numbering, since the order of the ICD-10 codes does not matter for this analysis. I just need the codes separated and spread across one row per unique inc_key (so icdcode1, icdcode2, etc.), but I'm not sure how to do this.
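
A minimal sketch, assuming each row is one procedure for a given inc_key as shown: generate an arbitrary within-key counter to serve as j, then reshape wide (add any other procedure-level variables, e.g. icdprocedurecode_biu, to the reshape list):

Code:
bysort inc_key: gen seq = _n        // arbitrary order, which is fine since order does not matter
reshape wide icdprocedurecode icdprocedureversion proceduremins proceduredays, ///
    i(inc_key) j(seq)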

I hope this wasn't confusing.

Collapse/aggregate on hourly level

Hello everyone,

I have a dataset reporting diesel price changes for different petrol stations (station_uuid) during one day with the following structure:

Code:
clear
input str36 station_uuid str22 raw_date double formatted_date float diesel
"2471ee14-8beb-455f-942d-73733d462c01" "2021-01-01 07:49:08+01" 1925106548000 1.329
"21a92daf-dec2-4448-b128-7f764b234dbc" "2021-01-01 10:27:17+01" 1925116037000 1.189
"b21f117f-305e-44ee-87cc-7f21fe4f3f58" "2021-01-01 10:42:17+01" 1925116937000 1.319
"096eb876-1888-4ec0-b64d-8d2369a319eb" "2021-01-01 11:04:14+01" 1925118254000 1.199
"25d5e86b-fc8b-479a-434e-3e852f2aefe2" "2021-01-01 11:22:16+01" 1925119336000 1.189
"5e40a39a-f679-480a-aaac-81754a28e003" "2021-01-01 13:17:14+01" 1925126234000 1.169
"65d95b09-e1cb-454f-b3da-49876ad84561" "2021-01-01 14:05:06+01" 1925129106000 1.249
"51d4b70e-a095-1aa0-e100-80009459e03a" "2021-01-01 15:53:07+01" 1925135587000 1.179
"c08111e0-37bb-4c2c-94bb-1e6e1c4bb1f5" "2021-01-01 16:23:07+01" 1925137387000 1.219
"0c469754-2608-4e58-8ab7-6cd924edd5a5" "2021-01-01 16:27:13+01" 1925137633000 1.199
"c1b456c8-b782-41d8-a960-466c4088a463" "2021-01-01 16:38:15+01" 1925138295000 1.199
"ade023de-dad0-40e7-bcc3-0a25e9a85c77" "2021-01-01 16:55:06+01" 1925139306000 1.219
"b457a782-e3d4-4513-a258-c4d3e199d79a" "2021-01-01 17:24:16+01" 1925141056000 1.239
"826b3acc-d800-41ba-9ead-76bc0d1dba20" "2021-01-01 18:22:16+01" 1925144536000 1.259
"812ecef1-650c-4930-a88b-71b9793607e6" "2021-01-01 18:31:07+01" 1925145067000 1.219
"16941d49-7ec6-45fc-aea9-93901f8f2dff" "2021-01-01 20:17:13+01" 1925151433000 1.179
"e1a15081-254f-9107-e040-0b0a3dfe563c" "2021-01-01 20:18:14+01" 1925151494000 1.179
"0cc777e4-13bc-48d4-b161-0f23d56afea6" "2021-01-01 21:12:20+01" 1925154740000 1.109
"e95bfbba-f829-45f7-ac4b-9bcab2fb48ee" "2021-01-01 21:17:13+01" 1925155033000 1.209
"51d4b5ee-a095-1aa0-e100-80009459e03a" "2021-01-01 21:42:15+01" 1925156535000 1.169
end
format %tcDD_Mon_CCYY_HH:MM:SS formatted_date
The variable formatted_date has been generated after studying chapter 25 of the user's guide and using the following command:

Code:
gen double formatted_date = clock(raw_date, "YMDhms#")
Since I want an hourly panel, I thought of collapsing the data to the hourly level, taking means where there is more than one price change per hour per station. In a second step, I would then fill down/expand the observations with the price of the previous hour for stations that did not report a price change in a given hour. I imagine the second step looking something like this:

Code:
tsset station_uuid final_date
tsfill
bysort station_uuid: carryforward diesel, gen(diesel_complete)
My questions are:
1) How do I collapse/aggregate on an hourly level?
2) Is the procedure I have in mind correct or is there a better way to do it?
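
On question 1, a minimal sketch (assuming formatted_date is the %tc clock variable shown): round the timestamp down to the hour and collapse to station-hour means; an integer hour counter then makes tsset and tsfill straightforward (station_uuid is a string, so it needs to be encoded first):

Code:
gen double hour = floor(formatted_date / msofhours(1)) * msofhours(1)
format %tcDD_Mon_CCYY_HH hour
collapse (mean) diesel, by(station_uuid hour)

encode station_uuid, gen(station_id)
gen long hour_id = hour / msofhours(1)      // consecutive integer hours for tsset/tsfill
tsset station_id hour_id
tsfill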

Thank you so much in advance.

Best regards,
Benedikt Franz

Stata technique for CPPML

Hello,

Does anyone have practical experience in Stata with Constrained Poisson Pseudo Maximum Likelihood Estimation (CPPML) by Pfaffermayr (2020), especially for gravity models based on trade?

Thank you in advance!

Best,
Daniel

How to sum variable by year for specific group id

The original dataset has the following structure:

group_id | year | varx
1 1998 5
1 1998 5
1 1999 2
2 1998 1
2 1998 1
2 1998 1
2 1999 3


The outcome should look like:

group_id | year | varx
1 1998 10
1 1999 2
2 1998 3
2 1999 3
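
A minimal sketch of two routes to that outcome:

Code:
* replace the data with one row per group_id-year holding the sum
collapse (sum) varx, by(group_id year)

* or, keeping every original row, add the group-year total as a new variable
bysort group_id year: egen varx_total = total(varx)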


Thank you for your suggestions!

Question about stsplit

Hi all,

I would like to write a command to stsplit a dataset thus:

Code:
stsplit newvar, at(datelist)
instead of

Code:
stsplit newvar, at(numlist)
The reason is that I have a large number of dates to stsplit on, and it is an extra step to convert the dates into numbers before typing them into code. The following does not work, unfortunately:

Code:
stsplit newvar, at(d(31Jan2017))
Is there a workaround that I can use? Alternatively, is there a quick way I can calculate the number represented by a date to help my efficiency?
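
A minimal sketch: at() takes a numlist, but the numbers can be built from date literals with the td() pseudofunction, so the conversion never has to be done by hand (the second cutoff below is a made-up date, purely for illustration):

Code:
local cuts `=td(31jan2017)' `=td(30jun2018)'
stsplit newvar, at(`cuts')

* to see the integer a given date corresponds to, and back again:
display td(31jan2017)        // 20850
display %td 20850            // 31jan2017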

Thanks

MM

xtabond2

Hello everyone,

I am currently working with dynamic panel data. The model I want to estimate using the Arellano-Bond estimator is the following:
log Y_it = α_i + β0*S_it + γ*log Y_i,t-1 + β1*S_i,t-1 + ε_it

where Y_it is the logarithm of GDP per worker in country i (lrgdpw) in year t and S_it is educational attainment in years (edu25). We treat education as strictly exogenous. The variable t in the dataset identifies the year.


How exactly do I use the xtabond2 command to estimate this model? Since in this case the lags of the variables already appear in the model, I am unsure how to specify the command.
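
A minimal sketch of one possible call, assuming the panel is xtset on a country identifier (here called country, a made-up name) and t, with the lags supplied through the L. operator rather than pre-created variables; the exact instrument set is a modelling choice, so treat this only as a starting point:

Code:
xtset country t
xtabond2 lrgdpw L.lrgdpw edu25 L.edu25, ///
    gmm(L.lrgdpw, lag(1 .)) iv(edu25 L.edu25) ///
    twostep robust small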

Thank you in advance!

BR,

Bruno

Multilevel model (binary outcome) with spatial weight matrix (not panel data)

Hi Stata forum users,

Does Stata have any option to incorporate a spatial weight matrix with melogit or meqrlogit? I can create the spatial weight matrix, but I can't figure out whether it is usable in multilevel models in Stata when my data are not panel data. I believe the HLM software allows for the incorporation of a spatial dependence matrix, and I'm trying to determine whether an equivalent exists in Stata, specifically for the melogit or meqrlogit commands.

Any insight is much appreciated!

How to balance an unbalanced panel on the year variable?

Hello everyone!

I am fairly new to Stata and am unable to solve (perhaps) very basic problems. I am working with a panel data for the first time, and the dataset has observations about schools, educational attainment of children with their gender, total enrolment, school localities, school types, etc.

In one of the exercises, I am required to balance my dataset on the year variable such that it has the same set of schools in every year across all the available years. Honestly speaking, I do not know what this means. After browsing through the internet for quite some time, I was only able to get this far:

xtset year

I then tried to generate a new variable from the school_name so as to balance the year variable such that it has the same set of schools every year. I did this: egen s_id = group(school_id)

But I do not know how to go on further from here. I also don't know if whatever I am doing is correct.

Could someone please help me? A very small portion of my dataset (with year and school name) looks like this:

emiscode old_emis year school_name

32120046 32120046 2004 GES BASTI AZEEM
32120046 32120046 2005 GES BASTI AZEEM
32120046 32120046 2006 GES BASTI AZEEM
32120046 . 2007 GES BASTI AZEEM
31220059 31220059 2004 GES BASTI DOCTOR MUNEER
31220059 31220059 2005 GES BASTI DOCTOR MUNEER
31220059 31220059 2006 GES BASTI DOCTOR MUNEER
31220059 . 2007 GES BASTI DOCTOR MUNEER
32110081 32110081 2004 GES BASTI FAUJA
32110081 32110081 2005 GES BASTI FAUJA
32110081 32110081 2006 GES BASTI FAUJA
32110081 . 2007 GES BASTI FAUJA
32110078 32110078 2004 GES BASTI JAM
32110078 32110078 2005 GES BASTI JAM
32110078 32110078 2006 GES BASTI JAM
32110078 . 2007 GES BASTI JAM
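
A minimal sketch of one reading of "balancing on year": keep only the schools that are observed in every year of the data (assuming one row per school-year and that emiscode identifies schools):

Code:
egen s_id = group(emiscode)
bysort s_id year: gen byte tag = _n == 1     // guards against duplicate school-year rows
bysort s_id: egen nyears = total(tag)
summarize year
keep if nyears == r(max) - r(min) + 1        // schools present in every available year
xtset s_id year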

Fractional logit model for proportions over time

Dear all, I have calculated a “Diversity Index” for a given population. Per the census website, the DI: “the DI tells us the chance that two people chosen at random will be from different racial and ethnic groups….The DI is bounded between 0 and 1, with a zero-value indicating that everyone in the population has the same racial and ethnic characteristics, while a value close to 1 indicates that everyone in the population has different characteristics.” (I put the full equation below)

I’m running a fractional logit model with the DI as the dependent variable and year as the independent variable. I’d like to plot the trend line and 95% CIs around the trend. Here is my code. First, I use dataex to show the data; then I show the model with continuous year; then the model with categorical year.

Questions:

1) Are there any obvious problems with this approach? In particular, I wasn't sure whether I need to adjust the fractional logit code for the fact that this is the same group of individuals over time, or perhaps use a different approach altogether.

2) Is it better to include year as c.year or i.year? The plots look quite different.

I am using Stata 14.

Thank you!!!

Code:
******************************DATA
 dataex di_rev year_r

----------------------- copy starting from the next line ---------------------
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(di_rev year_r)
.34123 1
.35147 2
.36345 3
.37255 4
.39094 5
.39714 6
.39895 7
end

------------------ copy up to and including the previous line ----------------

Listed 7 out of 7 observations
Code:
************************************OPTION 1: WITH CONTINUOUS YEAR

. fracreg logit di_rev c.year_r

Iteration 0:   log pseudolikelihood = -5.3012582  
Iteration 1:   log pseudolikelihood = -4.6198733  
Iteration 2:   log pseudolikelihood = -4.6196722  
Iteration 3:   log pseudolikelihood = -4.6196722  

Fractional logistic regression                  Number of obs     =          7
                                                Wald chi2(1)      =     163.74
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -4.6196722               Pseudo R2         =     0.0014

------------------------------------------------------------------------------
             |               Robust
      di_rev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      year_r |   .0446101   .0034862    12.80   0.000     .0377772     .051443
       _cons |  -.6959255   .0097576   -71.32   0.000    -.7150501   -.6768008
------------------------------------------------------------------------------

. quietly margins, at(year_r=(1(1)7))

. marginsplot

  Variables that uniquely identify margins: year_r
[marginsplot graph not shown]


***********************************OPTION 2: WITH CATEGORICAL YEAR:

Code:
.
. fracreg logit di_rev i.year_r
note: 7.year_r omitted because of collinearity

Iteration 0:   log pseudolikelihood = -5.3011755  
Iteration 1:   log pseudolikelihood = -4.6196655  
Iteration 2:   log pseudolikelihood = -4.6194615  
Iteration 3:   log pseudolikelihood = -4.6194615  

Fractional logistic regression                  Number of obs     =          7
                                                Wald chi2(0)      =          .
                                                Prob > chi2       =          .
Log pseudolikelihood = -4.6194615               Pseudo R2         =     0.0015

------------------------------------------------------------------------------
             |               Robust
      di_rev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      year_r |
          2  |   .0452339   1.05e-11  4.3e+09   0.000     .0452339    .0452339
          3  |   .0973966   6.01e-11  1.6e+09   0.000     .0973966    .0973966
          4  |    .136525   1.20e-10  1.1e+09   0.000      .136525     .136525
          5  |   .2144551   2.28e-10  9.4e+08   0.000     .2144551    .2144551
          6  |   .2404216   2.45e-10  9.8e+08   0.000     .2404216    .2404216
          7  |   .2479757   2.47e-10  1.0e+09   0.000     .2479757    .2479757
             |
       _cons |  -.6578177   1.96e-13 -3.4e+12   0.000    -.6578177   -.6578177
------------------------------------------------------------------------------

. quietly margins i.year_r

. marginsplot

  Variables that uniquely identify margins: year_r
[marginsplot graph not shown]


----------------------------------------------------------------------------------------------------------

FYI, DIVERSITY INDEX EQUATION BELOW:





Diversity Index Equation



DI = 1 – (H² + W² + B² + AIAN² + Asian² + NHPI² + SOR² + Multi²)



H is the proportion of the population who are Hispanic or Latino.

W is the proportion of the population who are White alone, not Hispanic or Latino.

B is the proportion of the population who are Black or African American alone, not Hispanic or Latino.

AIAN is the proportion of the population who are American Indian and Alaska Native alone, not Hispanic or Latino.

Asian is the proportion of the population who are Asian alone, not Hispanic or Latino.

NHPI is the proportion of the population who are Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino.

SOR is the proportion of the population who are Some Other Race alone, not Hispanic or Latino.

MULTI is the proportion of the population who are Two or More Races, not Hispanic or Latino.



Source: https://www.census.gov/library/visua...20-census.html




Comparing age adjusted mortality rates

Hello everyone.

I am trying to compare the age-adjusted mortality rates for two populations. The rural age-adjusted mortality rate is 50 per 1,000,000 (95% CI 49-51) and the urban age-adjusted mortality rate is 40 per 1,000,000 (95% CI 39-41). As their CIs do not overlap, the difference should be statistically significant. But is there any way I can obtain a p-value in Stata? Can I obtain a rate ratio or risk ratio, or any other test of the difference, such as a test of proportions? I have the absolute counts of deaths and the populations. I have tried -csi-, but that only compares the crude rates using the absolute numbers.
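
A minimal sketch of one quick option: back out the standard errors from the reported 95% confidence intervals and form a z-test for the difference between the two age-adjusted rates (all figures per 1,000,000); with the raw counts and populations, -iri- would give a rate ratio for the crude rates, but the calculation below uses only the adjusted rates and their CIs:

Code:
scalar r_rural  = 50
scalar se_rural = (51 - 49) / (2 * invnormal(.975))
scalar r_urban  = 40
scalar se_urban = (41 - 39) / (2 * invnormal(.975))
scalar z = (r_rural - r_urban) / sqrt(se_rural^2 + se_urban^2)
display "z = " z "    two-sided p = " 2 * normal(-abs(z))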

Thank you in advance.

Drop variables based on name

Hi, I have a dataset with 100 variables, 50 of which are called Xi_ante and 50 of which are called Xi_post.

I want to delete those with "_post" from my dataset.

I tried using the following command:
Code:
global variables [all variables Xi without _ante or _post]

foreach `var' of global variables{
drop `var'_post
}
But I get a syntax error. Is there any way to tell Stata to drop the variables that contain "_post" in their name?
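
A minimal sketch: variable-name wildcards handle this directly, with no loop or global needed:

Code:
drop *_post          // every variable whose name ends in _post
* if the pattern could appear anywhere in the name rather than only at the end:
* drop *_post*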

Thank you!


Logit & conformability error in inteff


I am using survey data to run a logit regression, and I am currently testing the interaction of two continuous variables. I read Norton and Ai's paper saying that the inteff command helps correct the coefficients and standard errors of interaction terms in nonlinear models. My logit model controls for 133 industry dummies. While inteff runs when I include only 82 of the 133 industry dummies, it fails to run when I include all 133. The error I get when I run it on the logit model with 133 industry dummies is an r(503) conformability error. According to a post by Maarten Buis on a probit conformability error in inteff, this can occur when -logit- drops a variable due to multicollinearity and -inteff- does not pick that up. This is not the case for my model, because my logit model does not drop any variable. My question is how I can make inteff run with the 133 industry dummies and not just the 82. More specifically, what can be done to solve the conformability error when using 133 industry dummies if the problem is not related to variables being dropped by logit and not picked up by inteff?

Instrumental variable analysis with multiple imputation

Hello all,

I would like to find out whether IV analysis works with imputed data (i.e. using multiple imputation).
If it does, how do we test for endogeneity and use other postestimation commands (e.g. estat overid, estat firststage)? I am finding it difficult to see how to do this.
If there is no way to do this, could I please have a reference stating that IV does not work under multiple imputation?
I would really appreciate a quick response.
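
A minimal sketch, assuming the data are already mi set and imputed, with outcome y, endogenous regressor x, instruments z1 z2, and exogenous covariate w (all made-up names). ivregress is among the estimation commands supported by mi estimate; the usual tests (estat endogenous, estat firststage, estat overid) are not pooled by mi estimate, so one workaround is to inspect them imputation by imputation:

Code:
mi estimate: ivregress 2sls y w (x = z1 z2)

* per-imputation diagnostics, here for imputations 1 to 5:
mi xeq 1/5: ivregress 2sls y w (x = z1 z2); estat overid; estat firststage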

Many thanks.

Stata dropping interacting Time Period Covariates. Can't figure out why.

Good Morning.

I am trying to run a simple descriptive regression on a year-district (1964-2002, ~400 districts) panel:

reg Y Democracy##i.President i.Year i.District

Democracy = 0 if Year is in range 1970-1992,
Democracy = 1 if Year is in range 1964-1969 or 1993-2002

President = President A if Year in range 1964 - 1978
President = President B if Year in range 1979 - 2002

Both regimes go through autocratic and democratic periods, hence the interaction above. Democracy and President vars do not vary across districts (i.e. sample is one country).

Once I add Year FEs the interaction term (Democracy == 1 )*(President== President B) drops out, i.e. Democracy#i.President

It's not obvious to me where the collinearity occurs. Have also tried using areg and xtreg with the same result. Any diagnosis/solutions would be much appreciated.







Get the value of the variable whose name equals another variable's value

I have a study where I am to look 28 days back in time before an event to see if the patients were working or not working.
Each patient has a different event date (i.e. var date_ascd).

Relevant variables:
y_yyww (year and week, e.g. y_1410 means year 2014 and week 10. Format %10.0g and type int)
date_ascd (date format %tdD_m_Y, type long)
status_onemonthprior (date format %td, type float)
bbcombination (str6, %9s)

The variable y_yyww tells me if the patient was working or not prior to the event using different values (e.g. value 299 means the patient received sick leave pay). If the patient has a variable with the name y_1410 with the value of 299 it means the patient received sick leave pay in 2014, week 10. This variable is reported weekly. The variable y_yyww reaches a maximum of 52 weeks.

The var status_onemonthprior tells me the date 4 weeks prior to the event date.

The variable bbcombination is calculated from the date in status_onemonthprior and holds string values such as "y_1410", i.e. the name of the weekly variable we want to look up.


My issue
I would like to go through all of my y_1401-y_1501 variables and see whether any of them has the same name as bbcombination's value. If so, I would like to generate a new variable called y_onemonthprior holding the value of that matching variable.

Meaning, I would end up with a new variable called y_onemonthprior with the value of e.g. 299. This value can differ from patient to patient according to work status.


Is this in any way possible?
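
A minimal sketch, assuming the weekly variables run from y_1401 to y_1501 and that bbcombination holds, for each patient, the name of the relevant one:

Code:
gen y_onemonthprior = .
foreach v of varlist y_1401-y_1501 {
    replace y_onemonthprior = `v' if bbcombination == "`v'"
}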

IV estimation (2SLS) with slope dummy variable interacting with endogenous variable.

Hi, I want to estimate a regression with a dummy variable, say D. My dependent variable is Y. My exogenous independent variables are X, Z, and the dummy D. I have an endogenous variable W. The dummy is both an intercept and a slope dummy, which means I have to multiply D with W (my endogenous variable). My data are time series. Without the endogeneity problem the code would be:

Code:
reg Y D X Z  i.D#c.W
I have two instruments for W: T and H. Now, my question is whether I wrote the code properly for 2SLS. My code is:
Code:
 ivreg2 Y  X Z D  ( i.D#c.W =  i.D#c.T  i.D#c.H )
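
A minimal sketch that builds the interactions by hand, mirroring the specification in the post (only the D*W slope term is treated as endogenous, instrumented by the D-interactions of T and H); whether W's main effect and the uninteracted instruments should also appear is a modelling choice:

Code:
gen DW = D * W
gen DT = D * T
gen DH = D * H
ivregress 2sls Y D X Z (DW = DT DH), robust
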
Thank you.

Demet

Cross sectional dependency in panel data

Hello,

I am working on a panel of six emerging economies with quarterly data from 2000-2020. Some of the variables are cross-sectionally invariant; for example, the US interest rate takes the same value for every country in the panel at each point in time. I would like to control for cross-sectional dependence in the data (whereby a variable tends to be driven by a common external disturbance across economies), for which demeaning the data (subtracting time means) is the most commonly suggested solution. I am doing FMOLS estimation in the paper.

However, because of the cross-sectionally invariant nature of a few variables, the data cannot be demeaned. I would really appreciate it if anyone could suggest a possible solution. Is it not possible to include cross-sectionally invariant variables in a panel setting? If it is possible, how can I control for cross-sectional dependence?

I hope my query is clear.

Thank you!

keep the last observation(s) in each year

Dear All, Suppose that the data set is
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 Stkcd str10 riqi byte(Position Changtyp Edca Dimreas Dimage) double Years byte Entele
"000001" "2002-04-30" 1 1 4  . 56  3.17 .
"000001" "2003-09-04" 1 2 4  .  .     . 1
"000001" "2003-10-16" 2 1 4  3 52     6 .
"000001" "2003-10-16" 2 2 4  .  .     . 1
"000001" "2004-12-14" 1 1 4  3 53     1 .
"000001" "2004-12-14" 1 2 5  .  .     . 2
"000001" "2004-12-14" 2 1 4  3 42     1 .
"000001" "2004-12-14" 2 2 4  .  .     . 2
"000001" "2005-05-16" 1 1 5  5 62    .5 .
"000001" "2005-05-16" 1 2 3  .  .     . 1
"000001" "2005-06-17" 1 2 3  .  .     . 1
"000001" "2005-06-17" 1 1 3 12 62    .1 .
"000002" "2006-02-11" 2 1 4  5 53   1.1 .
"000002" "2007-02-07" 2 2 2  .  .     . 1
"000002" "2010-06-29" 1 1 3  5 68     5 .
"000002" "2010-06-29" 1 2 2  .  .     . 1
"000002" "2010-06-29" 2 1 2  1 62  3.33 .
"000002" "2010-06-29" 2 2 .  .  .     . 1
"000002" "2010-10-13" 2 1 . 12 54   .25 .
"000002" "2010-10-13" 2 2 .  .  .     . 1
"000002" "2012-11-21" 2 1 .  8 56  2.08 .
"000002" "2012-11-21" 2 2 5  .  .     . 2
"000002" "2012-11-21" 1 1 .  8 64  2.42 .
"000002" "2012-11-21" 1 2 2  .  .     . 2
"000003" "2016-10-20" 2 1 5  5 59  3.92 .
"000003" "2016-10-20" 2 2 4  .  .     . 1
"000003" "2016-11-06" 1 1 2  3 63  3.96 .
"000003" "2016-11-07" 1 2 4  .  .     . 1
"000003" "2016-12-10" 1 1 4 12 45   .09 .
"000003" "2016-12-10" 1 2 5  .  .     . 2
"000003" "2016-12-10" 2 1 4 12 54   .14 .
"000003" "2016-12-10" 2 2 4  .  .     . 1
"000003" "1999-02-08" 2 1 .  9 48     8 .
"000003" "1999-02-08" 2 2 3  .  .     . 1
"000004" "2001-02-15" 2 1 3  7 49     2 .
"000004" "2001-02-15" 2 2 4  .  .     . 1
"000004" "2002-06-12" 1 2 3  .  .     . 1
"000004" "2017-06-29" 1 1 3  3 66 15.06 .
"000004" "2017-06-30" 1 2 4  .  .     . 1
end
For each Stkcd and each year, I'd like to keep the last observation(s) that occurred in that year.

Note that if there is more than one observation with the same last date, all of them should be kept. Thanks.
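
A minimal sketch: convert the string date, extract the year, and keep every row that shares the latest date within its Stkcd-year cell (ties are kept automatically):

Code:
gen ddate = daily(riqi, "YMD")
format %td ddate
gen year = year(ddate)
bysort Stkcd year (ddate): keep if ddate == ddate[_N]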

loop for missing values

Hi everyone! I am trying to complete a database shaped like a "tree", filling in the last available observation whenever a new branch exists. That is, for level 1 I need all the observations (I already have this), but for the subsequent levels I just need to complete those observations for which the next level is not missing. E.g., I should have all observations marked with X:
id1 name1 cod1 id2 name2 cod2 id3 name3 cod3 id4 name4 cod4
X1 Hotel1 879
X X X X2 Hotel2 456
X X X X X X X31 Hotel31 447
X X X X X X x32 Hotel32 775
X X X X X X X33 Hotel33 656
X X X X X X X X X X4 Hotel4 894
I encoded all variables. Since I have 8 levels, I got the correct result by using six loops like this (one per level, run from the 7th level downwards):

foreach var of varlist *7{
replace `var'=`var'[_n-1] if id8!=.
}

and another for the first level:
foreach var of varlist *1{
replace `var'=`var'[_n-1] if `var'==.
}


What I am trying now is to simplify the code by unifying the six loops into one, but I can't figure out how. I have tried something like this, but the result is not correct.

foreach var of varlist id2-cod7{
local numvar = substr("`var'", -1,.)
rename `var' `var'_`numvar'
replace `var'_`numvar' = `var'_`numvar'[_n-1] if `var'_`numvar'+1!=.
}

I would really appreciate it if anyone could give a hint on how to proceed!
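
A minimal sketch of one way to fold the six level loops into a single outer loop (keeping the separate pass for level 1), assuming the variables are named with the level number as their final character, as in the post, and that all of them are numeric after encoding:

Code:
forvalues lev = 7(-1)2 {
    local next = `lev' + 1
    foreach var of varlist *`lev' {
        replace `var' = `var'[_n-1] if id`next' != .
    }
}
foreach var of varlist *1 {
    replace `var' = `var'[_n-1] if `var' == .
}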

Esttab: Store value of matrix and display in tex-table

Dear All,

I am encountering the following issue. I want to compute a mean after a regression with the mean function ("sum" does not work with svy) and then retrieve and store the mean value to export it in a tex table.
In fact, I would like to add the values to the "stats" in the lower part of the table. My first tries look like this:

Code:
svy: reg y1 x1 x2 x3
estimates store t1
svy: mean y1 if e(sample)==1
estadd matrix m = e(b)
estadd scalar m1 = e(b)[1,1]
estadd local cFE \ding{51}
estadd local sFE \ding{51}
estadd local csFE \ding{51}

svy: reg y1 x1 x2 x4
estimates store t2
svy: mean y1 if e(sample)==1
estadd matrix m = e(b)
estadd scalar m1 = e(b)[1,1]
estadd local cFE \ding{51}
estadd local sFE \ding{51}
estadd local csFE \ding{51}

svy: reg y1 x1 x2 x5
estimates store t3
svy: mean y1 if e(sample)==1
estadd matrix m = e(b)
estadd scalar m1 = e(b)[1,1]
estadd local cFE \ding{51}
estadd local sFE \ding{51}
estadd local csFE \ding{51}

esttab t1 t2 t3 using "test.tex", booktabs fragment replace ///
    se(%3.2f) b(2) label alignment(S S S S) ///
    star(* 0.10 ** 0.05 *** 0.01) nonotes nomtitles ///
    stats(m1 cFE sFE csFE N, fmt(%3.2f 0 0 0 0)

But this code does not work: I do not manage to display the stored numerical value of the mean for each regression in the stats of the table. Does anyone have an idea what is causing this?
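
One possible cause, sketched below: estadd attaches results to the estimation set that is currently active, which after -svy: mean- is the mean, not the stored regression t1 (estimates store makes a copy, so later estadd calls never reach it). One fix is to grab the mean into a scalar while the mean results are active and then add it to the stored model with estadd's ": name" syntax; the locals can be attached the same way. (Note also that the stats() option in the esttab call above is missing its closing parenthesis.)

Code:
svy: reg y1 x1 x2 x3
estimates store t1
svy: mean y1 if e(sample)
scalar mean_y1 = _b[y1]                  // the mean, taken while its results are active
estadd scalar m1 = mean_y1 : t1          // attach it to the stored regression t1
estadd local cFE \ding{51} : t1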

Many thanks in advance!

R2 for xtnbreg model

Hello everyone,

I am currently analyzing a count variable with the -xtnbreg- command.
This command does not report an R2 for my model, but one would be quite helpful for the interpretation of my analysis.
Is there a way to get an R2 for my count model using a specific command?


Best regards
Jana

Missing values

Hello, fellow stata lovers!

I am working on my thesis and have a dataset from the Enterprise Survey with 241 observations. However, the data contain 50 missing values on my dependent variable and two missing values on one of my control variables. I have assumed that they are missing completely at random, since the observations with missing values do not systematically differ from the other observations, and I have therefore chosen to ignore them.
Is this justified? Or should I handle them in a different manner?

Best,
Klaudia

Displaying Significance Stars in combined summary and correlations table

Dear Forum,

I merged my correlations and summary statistics into one table via the code below (thanks again to the forum) and would now like to display significance stars (ideally for p-values of 0.1, 0.05 and 0.01). I think I should add ", star(.1 .05 .01)" somewhere, but when I added it right after the pwcorr command, for example, it threw an error. Where would I need to change my code?

Code:
** Set variables used in Summary and Correlation
local variables dropsize femaleCEO CEOtenure CEOduality CEOage FirmSize PotentialSlackResources PercentageFemaleDirectors ROA_2007Q3 if min_quarter<LeftasCEO
local labels `" "Dropsize (in %-points)" "Female CEO" "CEO Tenure (log)" "CEO Duality" "CEO Age" "Firm Size (log)" "Potential Slack Resources(log)" "Percentage of Female Directors (in %)" "ROA 2007Q3 (in %)" "'

** Descriptive statistics
estpost summarize `variables'
matrix table = ( e(min) \ e(max) \ e(mean) \ e(sd) )
matrix rownames table = min max mean sd
matrix list table

** Correlation matrix
pwcorr `variables'
matrix C = r(C)
local corr : rownames C
matrix table = ( table \ C )
matrix list table

estadd matrix table = table

local cells table[min](fmt(2) label(Min)) table[max](fmt(2) label(Max))  table[mean](fmt(2) label(Mean)) table[sd](fmt(2) label(Standard Deviation))
local collab
local drop
local i 0
foreach row of local corr {
    local drop `drop' `row'
    local cells `cells' table[`row']( fmt(4) drop(`drop') label((`++i')) )
    local lbl : word `i' of `labels'
    local collab `" `collab' `row' "(`i') `lbl'" "'
}
display "`cells'"
display `"`collab'"'

esttab using SumStatCorTabH1.rtf, ///
        replace ///
        compress ///
        star(* 0.1 ** 0.05 *** 0.01) ///
        cells("`cells'") ///
        coeflabels(`collab')

Dumitrescu & Hurlin (2012) Granger non-causality test

Hello Dears,
I am trying to see the granger causality between government revenue and government spending. I am using panel data of 40 countries and 20 years dataset. Since the variables have to be stationary, I use the first difference of revenue and level value of government spending. I also balanced the panel. However, the p-value of Z-bar and the p-value of Z-bar tilde have different levels of significance. Which one should I have to choose [Z-bar / Z-bar tilde]? Does the result indicate granger causality? Thanks

xtbalance, range(2000 2019) miss( dlrev lspending )
xtgcause lspending dlrev , lags(1)
Dumitrescu & Hurlin (2012) Granger non-causality test results:
Lag order: 1
W-bar = 1.4082
Z-bar = 1.8257 (p-value = 0.0679)
Z-bar tilde = 0.8740 (p-value = 0.3821)
H0: dlrev does not Granger-cause lspending.
H1: dlrev does Granger-cause lspending for at least one panelvar (id).
xtgcause lspending dlrev , lags(2)
Dumitrescu & Hurlin (2012) Granger non-causality test results:
Lag order: 2
W-bar = 2.8715
Z-bar = 2.7560 (p-value = 0.0059)
Z-bar tilde = 1.0146 (p-value = 0.3103)
H0: dlrev does not Granger-cause lspending.
H1: dlrev does Granger-cause lspending for at least one panelvar (id).

interpretation of log-linear model with interaction term

Hello!

I'm having a hard time interpreting my results from my regression which is:

ln(sales) = beta0 + beta1*G + beta2*Finance + beta3*(G*Finance)

where:
G is 1 if it is a woman and 0 if it is a man
Finance is a categorical variable going from 0-4

What change in sales does a one-unit change in Finance correspond to?
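
A hedged note on the algebra, assuming Finance enters linearly as written: a one-unit increase in Finance changes expected ln(sales) by beta2 + beta3*G, so sales are multiplied by roughly exp(beta2 + beta3*G). For men (G = 0) that multiplier is exp(beta2), about a 100*beta2 percent change when beta2 is small; for women (G = 1) it is exp(beta2 + beta3).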

best,
Klaudia

Standardizing categorical variables

Dear Statalist,

I am running a linear probability model with categorical variables. Some of the independent variables have two categories, some have three. I want to compare the relative strength of the effects of these variables on the dependent variable. Is adding the beta option to the regress command, and thus standardizing the coefficients of the categorical independent variables, plausible in this context? I am unsure whether it makes sense to standardize categorical variables. Would you have any resource recommendations on this topic?

Kind regards,
Elif.

Each table on a new page using asdoc

Hello everyone,

I am running DEA models and I was wondering whether it is possible to start the output of each model on a new page in Word using the asdoc command combined with the append option.

Thank you!
Marie-Lien

Monday, November 29, 2021

Negative error variance

Dear Statalisters, what should I do if I get a negative error variance in the estimation of a structural equation model or in a confirmatory factor analysis? What are the causes of this? Can you recommend a paper that discusses this? Thank you

Count distinct values by groups

Hello everyone,

I have one question related to counting distinct values by groups. Here is an example of the data with ID, year, and the job code (i.e., job_code). What I am looking for is to create a variable (or collapse the data) that shows how many jobs each person had over the years 1994 to 1996. For example, ID 1 had 3 jobs, ID 2 had 1, and ID 3 had 3.

Code:
input ID year job_code
1 1994 50
1 1995 53
1 1996 60
2 1994 35
3 1994 68
3 1995 60
3 1996 53
end
I am aware of the command -distinct- :
Code:
bysort ID: distinct job_code
but of course it only lists the number of distinct values for each ID. I do not know how to combine -distinct- with other commands to create a new variable showing how many jobs each ID has.
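
A minimal sketch that stores the count as a variable rather than only displaying it (egen's nvals() function from egenmore on SSC does the same in one line):

Code:
bysort ID job_code: gen byte first = _n == 1
bysort ID: egen njobs = total(first)
drop first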

Could anyone help me with this question, please? Thank you in advance, and stay safe.

Does lowess take a long time?

I am running Stata SE/17.0 on a Windows machine with Intel i7 1.8GHz, and 32GB RAM. So not exactly top of the line, but not a "weak" computer either.

My dataset has a smidge over 200,000 observations. Through an if option in my lowess call, I am limiting the sample to a smidge over 160,000 observations.

I hit Ctrl-D to run the lowess about half an hour ago, but there is no indication whether Stata is still running or has hung. If it is still running, I wonder how long it might end up taking...

Does lowess take a long time to run?

Panel Data

Hello
I am building a panel data model with the following regression:
xtreg Domestic_Health rDomestic_Health GPE_subindex i.Country#c.Year, fe vce(robust)

My question is: when I use the interaction i.Country#c.Year, do I still need to specify fe at the end of the regression, or should I omit it, since with i.Country#c.Year I would already be using fixed effects?

Creation of Compound Interest variable

Good evening everyone,

I have the closing NAV prices (net asset value) and I calculate the return as nav_ret = ln(nav/nav[_n-1]). I am trying to find a way to create a new variable that compounds the return on a daily basis starting from a base value of 100. In the data example, the variable base has to start at 100 and continue to the end. When I loop like this, all values of base are reported as missing.

Code:
levelsof n, local(ns)
foreach x in `ns' {
replace base=base[_n-1]*nav_ret+base[_n-1]
}

Code:
input str7 ticker double nav float(datenum nav_ret n base)
""             .     .            .  1       100
"CBDJIUS" 106.12 18415    .02239362  2 102.23936
"CBDJIUS" 101.64 18420   -.01163998  3  101.0493
"CBDJIUS" 102.95 18421   .012806275  4 102.34337
"CBDJIUS" 102.52 18422 -.0041855318  5         0
"CBDJIUS" 105.34 18423    .02713531  6         0
"CBDJIUS" 105.77 18424   .004073711  7         0
"CBDJIUS" 105.55 18427 -.0020821511  8         0
"CBDJIUS" 107.76 18428    .02072176  9         0
"CBDJIUS" 107.81 18429 .00046388645 10         0
"CBDJIUS" 108.07 18430   .002408747 11         0
end
format %td datenum
In the code above, the base values for observations 2/5 were calculated manually with the commands:

Code:
replace base=base[_n-1]*nav_ret+base[_n-1] in 2
replace base=base[_n-1]*nav_ret+base[_n-1] in 3
replace base=base[_n-1]*nav_ret+base[_n-1] in 4
Is there any way to fix the loop?
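
A minimal sketch of a fix: no loop is needed, because replace works through the observations in order, so base[_n-1] already refers to the just-updated previous value; the problem is that, without a restriction, the first observation (whose base[_n-1] is missing) gets overwritten and the missing value then cascades down. Assuming a single ticker sorted by date:

Code:
replace base = 100 in 1
replace base = base[_n-1] * (1 + nav_ret) if _n > 1
* with several tickers, do the same within ticker, resetting base to 100 on each first row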

Comparing cox proportional hazard linear and non-linear (restricted cubic spline) models using likelihood ratio test

Hi folks - I am trying to understand and figure out how to actually code/test non-linearity by comparing spline and linear Cox proportional hazards models using LR tests. I have seen this described in papers, but the actual mechanics of it in Stata are still a little unclear to me.

For example, in this paper here (https://www.ahajournals.org/doi/full...AHA.120.020718) they look at this for a diet variable and CVD and state:

"We computed restricted cubic splines with 4 knots to visually assess the shape of association between ADPQS as a continuous variable (both time- varying average and 13- year change) and risk of CVD. Statistical significance of nonlinearity (ie, curvature) was tested by comparing the spline model with the linear model, and P values of <0.05 were regarded as statistically significant nonlinear relationship between the exposure and the outcome. Statistical significance of linearity was tested by comparing the linear model to the model including only the covariates, both using likelihood ratio tests."

They then report p-values for these results in text and with a figure (below).

"A monotonic decrease in CVD risk with time-varying average APDQS (P-nonlinearity=0.12 and P-linearity<0.001; Figure A) and the 13-year change in APDQS (P-nonlinearity=0.54 and P-linearity=0.04; Figure B) was observed in restricted cubic splines." [figure not shown]




I know one can compare nested models to look at prediction level with LR tests (e.g. below link)...

https://stats.idre.ucla.edu/stata/fa...test-in-stata/
Code:
logit hiwrite female read
estimates store m1
logit hiwrite female read math science
estimates store m2
lrtest m1 m2
... but how would one, as in the example paper above, compare a linear Cox regression model to a non-linear (restricted cubic spline) model using an LR test, and what is the relevance of the sentence "linearity was tested by comparing the linear model to the model including only the covariates"?

What would this Stata code look like in practice as a (quick/simple) example to compute these different p-values to compare the non-linear and linear models in this instance?
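
A minimal sketch of how those comparisons might be coded, assuming the data are already stset, with a continuous exposure x and covariates cov1 cov2 (made-up names). The key point is that mkspline's first cubic term equals x itself, so the linear model is nested in the spline model and the covariates-only model is nested in the linear one (all three must be fit on the same sample for lrtest):

Code:
mkspline xs = x, cubic nknots(4)      // creates xs1 xs2 xs3

stcox cov1 cov2
estimates store m_cov                 // covariates only

stcox x cov1 cov2
estimates store m_lin                 // linear in x

stcox xs1 xs2 xs3 cov1 cov2
estimates store m_spl                 // restricted cubic spline in x

lrtest m_cov m_lin                    // "P-linearity"
lrtest m_lin m_spl                    // "P-nonlinearity"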

Thanks so much! I really appreciate your thoughts
Patrick




How can I create this variable?

Hi,
I have data for the total number of Corona cases and the total population. What should I do to create this ”Corona cases per 1.000.000 individuals” variable?
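
A minimal sketch, assuming the two variables are called total_cases and population (made-up names):

Code:
gen cases_per_million = total_cases / population * 1000000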

Plotting the slope

Good day Statalisters!

Is there a command in Stata with which I can plot a curve using the lower-bound and upper-bound slopes I have obtained from my results? The slopes are:

(1)
LB: -1.255755
UB: .486735
(2)
LB: -.616485
UB: .3723716
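
If the goal is simply to draw straight lines with those slopes, a minimal sketch with twoway function (the x-range of 0 to 10 and the zero intercepts are made-up assumptions; the second pair of slopes would be added the same way):

Code:
twoway (function y = -1.255755 * x, range(0 10)) ///
       (function y =   .486735 * x, range(0 10)), ///
       legend(order(1 "lower bound" 2 "upper bound"))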

Hoping for your generous help.

Thank you,
Justine

Regression with propensity score

Dear all
I am trying to estimate dynamic treatment effects while using propensity scores. I would normally use the 'teffects psmatch' command; however, I'm not sure whether it matches on my dynamic treatment effects or how I can include them in my model.

fx.

teffects psmatch (bweight) (mbsmoke mmarried mage prenatal1 fbaby mbsmoke*year, probit)

Can anyone help?

Thanks!

reshaping long-long (?) data (all variables and observations in the single column)

Dear all, I have data from the IEA that looks like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 country str7 product int year str8 flow str5 gas str13 value
"USA"    "COAL" 2010 "ELECHEAT" "CO2"   "1857819.3000"
"USA"    "COAL" 2010 "ELECHEAT" "CO2eq" "1866946.1000"
"USA"    "COAL" 2011 "ELECHEAT" "CO2"   "1751542.3000"
"USA"    "COAL" 2011 "ELECHEAT" "CO2eq" "1760135.5000"
"USA"    "OIL"  2010 "RESIDENT" "CO2"   "63180.8000"  
"USA"    "OIL"  2010 "RESIDENT" "CO2eq" "63485.4000"  
"USA"    "OIL"  2010 "ELECHEAT" "CO2"   "36543.3000"  
"USA"    "OIL"  2010 "ELECHEAT" "CO2eq" "36646.1000"  
"USA"    "OIL"  2011 "RESIDENT" "CO2"   "50663.4000"  
"USA"    "OIL"  2011 "RESIDENT" "CO2eq" "50908.5000"  
"USA"    "OIL"  2011 "ELECHEAT" "CO2"   "29702.7000"  
"USA"    "OIL"  2011 "ELECHEAT" "CO2eq" "29783.6000"  
"RUSSIA" "COAL" 2010 "RESIDENT" "CO2"   "6170.2000"   
"RUSSIA" "COAL" 2010 "RESIDENT" "CO2eq" "6685.4000"   
"RUSSIA" "COAL" 2010 "ELECHEAT" "CO2"   "312644.4000"
"RUSSIA" "COAL" 2010 "ELECHEAT" "CO2eq" "313959.6000"
"RUSSIA" "COAL" 2011 "RESIDENT" "CO2"   "6189.5000"   
"RUSSIA" "COAL" 2011 "RESIDENT" "CO2eq" "6706.2000"   
"RUSSIA" "COAL" 2011 "ELECHEAT" "CO2"   "322049.8000"
"RUSSIA" "COAL" 2011 "ELECHEAT" "CO2eq" "323410.5000"
"RUSSIA" "OIL"  2010 "RESIDENT" "CO2"   "13141.3000"  
"RUSSIA" "OIL"  2010 "RESIDENT" "CO2eq" "13188.5000"  
"RUSSIA" "OIL"  2010 "ELECHEAT" "CO2"   "35918.7000"  
"RUSSIA" "OIL"  2010 "ELECHEAT" "CO2eq" "36032.8000"  
"RUSSIA" "OIL"  2011 "RESIDENT" "CO2"   "15983.3000"  
"RUSSIA" "OIL"  2011 "RESIDENT" "CO2eq" "16038.9000"  
"RUSSIA" "OIL"  2011 "ELECHEAT" "CO2"   "53819.4000"  
"RUSSIA" "OIL"  2011 "ELECHEAT" "CO2eq" "53992.1000"  
end


These are emissions for two countries (USA, RUSSIA) in two years (2010, 2011), for fuels (product: COAL, OIL), two sectors of the economy (flow: ELECHEAT, RESIDENT), two pollutants (gas: CO2, CO2eq).

I want to reorganize data like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 country int year double(ELECHEAT_COAL_CO2 ELECHEAT_COAL_CO2eq ELECHEAT_OIL_CO2 ELECHEAT_OIL_CO2eq RESIDENT_COAL_CO2 RESIDENT_COAL_CO2eq RESIDENT_OIL_CO2 RESIDENT_OIL_CO2eq)
"USA"    2010 1857819.3 1866946.1 36543.3 36646.1 63180.8 63485.4 260628.3 261347.5
"USA"    2011 1751542.3 1760135.5 29702.7 29783.6 50663.4 50908.5 256636.8 257344.9
"RUSSIA" 2010  312644.4  313959.6 35918.7 36032.8  6170.2  6685.4  13141.3  13188.5
"RUSSIA" 2011  322049.8  323410.5 53819.4 53992.1  6189.5  6706.2  15983.3  16038.9
end
and this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 A str19 B double(C D)
"country" "flow"                     2010      2011
"USA"     "ELECHEAT_COAL_CO2"   1857819.3 1751542.3
"USA"     "ELECHEAT_COAL_CO2eq" 1866946.1 1760135.5
"USA"     "ELECHEAT_OIL_CO2"      36543.3   29702.7
"USA"     "ELECHEAT_OIL_CO2eq"    36646.1   29783.6
"USA"     "RESIDENT_COAL_CO2"     63180.8   50663.4
"USA"     "RESIDENT_COAL_CO2eq"   63485.4   50908.5
"USA"     "RESIDENT_OIL_CO2"     260628.3  256636.8
"USA"     "RESIDENT_OIL_CO2eq"   261347.5  257344.9
"RUSSIA"  "ELECHEAT_COAL_CO2"    312644.4  322049.8
"RUSSIA"  "ELECHEAT_COAL_CO2eq"  313959.6  323410.5
"RUSSIA"  "ELECHEAT_OIL_CO2"      35918.7   53819.4
"RUSSIA"  "ELECHEAT_OIL_CO2eq"    36032.8   53992.1
"RUSSIA"  "RESIDENT_COAL_CO2"      6170.2    6189.5
"RUSSIA"  "RESIDENT_COAL_CO2eq"    6685.4    6706.2
"RUSSIA"  "RESIDENT_OIL_CO2"      13141.3   15983.3
"RUSSIA"  "RESIDENT_OIL_CO2eq"    13188.5   16038.9
end
I made these two reshaped datasets by hand in Excel for illustration.
But I have a lot of products, flows, countries, etc., so it is impossible to do this by hand for all the data I need.
I can't work out how to do it with the reshape command.

Could you help me, please?
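
A minimal sketch, assuming the long data shown at the top are in memory: build a single "cell" identifier from flow, product and gas, then one reshape wide per target layout:

Code:
destring value, replace                      // value arrives as a string
gen cell = flow + "_" + product + "_" + gas
keep country year cell value

* layout 1: one column per flow_product_gas combination, rows are country-year
preserve
reshape wide value, i(country year) j(cell) string
rename value* *                              // strip the value prefix from the new columns
restore

* layout 2: one column per year, rows are country by flow_product_gas
reshape wide value, i(country cell) j(year)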

Calculate shares by variable and a condition

Hi Stata Users,

I have household data and would like to calculate age-specific enrollment rates, i.e., the share of children of a specific age in a household who are attending school. Below is an example of the data with the original (hhid-attend) and desired (age10-age15) variables.

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(hhid pid age attend) float age10 byte age11 float(age12 age13) byte age14 float age15
1 1 10 1    1 0  1  0 0  0
1 2 11 0    1 0  1  0 0  0
1 3 12 1    1 0  1  0 0  0
2 1 10 1   .5 0 .5  0 0  0
2 2 10 0   .5 0 .5  0 0  0
2 3 12 0   .5 0 .5  0 0  0
2 4 12 1   .5 0 .5  0 0  0
3 1 10 0    0 1  0  0 0  0
3 2 11 1    0 1  0  0 0  0
3 3 12 0    0 1  0  0 0  0
3 4 14 0    0 1  0  0 0  0
4 1 15 1    0 0  0  0 0 .5
4 2 15 0    0 0  0  0 0 .5
5 1 10 1 .333 0  0 .5 0  0
5 2 10 1 .333 0  0 .5 0  0
5 3 10 0 .333 0  0 .5 0  0
5 4 13 1 .333 0  0 .5 0  0
5 5 13 0 .333 0  0 .5 0  0
5 6 15 0 .333 0  0 .5 0  0
end
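
A minimal sketch of one way to build those columns, looping over ages 10 to 15 as in the example (the new variables are called share10-share15 here so they do not clash with the age10-age15 columns already in the example; households with no child of a given age get 0, matching the desired output):

Code:
forvalues a = 10/15 {
    egen share`a' = mean(cond(age == `a', attend, .)), by(hhid)
    replace share`a' = 0 if missing(share`a')
}
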
Thanks in advance!

Stata 17 - how to get forest plots with risk ratio and not log risk ratio

Hi
I am new to Stata 17 meta analysis forest plots.
I am analysing binary data for treatment (Yes/No) and control (Yes/No).
I use the dropdown menus in Stata --> Statistics --> Meta-analysis --> Setup --> binary data --> entered the variables with and without the outcome for treatment and control --> log risk-ratio --> random effects --> summary --> forest plot.
The forest plot shows the log risk ratio and not the risk ratio.
Please advise how I can get the output as risk ratios in the forest plot (not log risk ratios).
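
A minimal sketch, assuming the data have already been declared with -meta esize- using log risk-ratios (which is what the menu steps above set up): the eform-style reporting options exponentiate the display, so the forest plot and summary show risk ratios rather than log risk ratios.

Code:
meta forestplot, rr      // rr labels the exponentiated column "Risk ratio"; eform is the generic form
meta summarize, rr
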
Thanks in advance.

areg in cross-sectional data and multicollinearity


Hello, I have cross-sectional data consisting of 632 banks in 67 countries. In my dataset I have many variables with bank ratios, such as Tier 1 capital, deposits, and loan ratios (at one point in time). Following the Beltratti and Stulz (2012) paper, I want to include country fixed effects and to cluster at the country level, so I use the following code in Stata: areg Y All_Xs, absorb(CountryID) vce(cluster CountryID). Is this the correct code to use with my data? I am a beginner in Stata and I have read that fixed effects are normally applied to panel data, so I am a bit confused about whether what I am doing makes sense.

I also want to test for multicollinearity. I use a simple corr All_Xs command in Stata and get the correlation matrix. However, I would also like to compute variance inflation factors (VIF) to see whether any of my variables are above the threshold of 10, but I can't use vif after an areg regression. I know I could use estat vce, corr, but that just gives me another correlation matrix, and I struggle to understand whether I should drop some variables or not. Is it possible to compute VIFs with an areg regression? Thanks

Regress for each company by using "foreach" or "forvalues" command

Hi there,

My panel data set includes 389 companies and 51 quarters. I am trying to run my regression for each company and save the coefficients. If I run the regression manually for one company at a time, Stata produces the coefficients. For example;

Code:
// for company 1 :

nlsur (l_q=ln({A})+({s}/(1-{s}))*(ln(({a1}*K^(({s}-1)/{s}))+((1-{a1})*((exp({d}))*L))^(({s}-1)/{s})))) (l_KY=ln({a1}/(1+{mu}))+(({s}-1)/{s})*(ln(Y/K)-ln({A}))) (l_LY=ln((1-{a1})/(1+{mu}))+(({s}-1)/{s})*(ln(Y/L)-ln({A})-{d})), initial(i), if id==1

// for company 2 :

nlsur (l_q=ln({A})+({s}/(1-{s}))*(ln(({a1}*K^(({s}-1)/{s}))+((1-{a1})*((exp({d}))*L))^(({s}-1)/{s})))) (l_KY=ln({a1}/(1+{mu}))+(({s}-1)/{s})*(ln(Y/K)-ln({A}))) (l_LY=ln((1-{a1})/(1+{mu}))+(({s}-1)/{s})*(ln(Y/L)-ln({A})-{d})), initial(i), if id==2
where id is the company id and id = (1, 2, ..., 389).

However, I do not want to do that by hand for 389 companies; I want to use a "foreach" or "forvalues" loop. So I am trying to run this code:

Code:
gen coeff= . // empty variable for coefficient

levelsof id, local(levels) // id is company id
foreach i of local levels{
    
    nlsur (l_q=ln({A})+({s}/(1-{s}))*(ln(({a1}*K^(({s}-1)/{s}))+((1-{a1})*((exp({d}))*L))^(({s}-1)/{s})))) (l_KY=ln({a1}/(1+{mu}))+(({s}-1)/{s})*(ln(Y/K)-ln({A}))) (l_LY=ln((1-{a1})/(1+{mu}))+(({s}-1)/{s})*(ln(Y/L)-ln({A})-{d})), initial(i), if id=='i'
    
    replace coeff= _b[A] if id=='i'
}
But I received "i invalid name" error.

What is my mistake? How can I solve this problem? Can I generate a result variable like that:
Firm ID A a1 s mu
1 .. .. .. ..
2 .. .. .. ..
Thank you for your interest.


Error:


Array
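
A sketch of the loop with the likely fixes, keeping the model and variable names from the question: a local macro is referenced as `i' (left quote, then right quote), not 'i', and the if qualifier belongs before the comma that opens the options:

Code:
gen coeff = .                                            // will hold the estimate of A for each firm
levelsof id, local(levels)
foreach i of local levels {
    capture noisily nlsur ///
        (l_q  = ln({A}) + ({s}/(1-{s}))*(ln(({a1}*K^(({s}-1)/{s})) + ((1-{a1})*((exp({d}))*L))^(({s}-1)/{s})))) ///
        (l_KY = ln({a1}/(1+{mu})) + (({s}-1)/{s})*(ln(Y/K) - ln({A}))) ///
        (l_LY = ln((1-{a1})/(1+{mu})) + (({s}-1)/{s})*(ln(Y/L) - ln({A}) - {d})) ///
        if id == `i', initial(i)
    if _rc continue                                      // skip firms where estimation fails
    replace coeff = _b[/A] if id == `i'                  // check parameter names with: nlsur, coeflegend
}

To collect all four parameters (A, a1, s, mu) per firm in a table like the one sketched above, the same estimation call could instead be wrapped in statsby, by(id), or the estimates written out with a postfile loop.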

How to change scientific notation into standard format?

Hello all, can anyone please guide me on how to get rid of the scientific notation in the summary below?
I am interested in seeing the fully formatted numbers in the summary.



. sum fungicidesandbactericides

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
fungicides~s |         56     5757572    1.03e+07      66163   3.89e+07

Generate grouping variable based on various nominal variables

Dear community,

I'm working with household data containing various nominal or ordinal variables such as household type, income group, and location (see the data example below).
I now want to group these households so that there is one group per possible combination of variable levels.
Is there a way to do this automatically, without having to write various tedious loops? (A sketch follows the data example below.)

Thanks a lot,

Mattis

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(hhtyp2 hheink_gr1 RegioStaR7 bus28)
 3 11 77  3
 2 14 73  3
 2 10 77  3
 1  5 77 95
 3  4 73  3
 3  5 71  2
 3  9 72 95
 4  7 71  3
 4  5 77  3
 4  7 74  1
 4  6 72  1
 3  9 72  3
 2  4 72  1
 3  5 76  3
 2  7 72  1
 2  9 73  3
 3  7 75  1
 4  5 74  3
 3  5 76  1
95  5 76 95
 3  3 73  1
 2  7 76 95
 3  3 76 95
 2  9 74  5
 2  5 71 95
 3  5 77  2
 3  9 73  1
 2  7 77 95
 3  6 74  3
 3  7 72  2
 2  7 73 95
 3  6 72  1
 3  6 77  2
 4  4 71  1
 3  8 73  1
 2  9 76 95
 3  9 73  1
 3  7 71 95
 3 11 73  1
 4  5 73  2
 3  6 73  4
 3  7 73  1
 4  9 76  3
 3 13 71 95
 3  5 77  1
 3  3 76 95
 3  4 77  6
 3  7 76  1
 3 11 77 95
 4  4 72  2
 4  7 72 95
 4  4 72  1
 3 15 76  3
 2 11 74 95
 4  5 73  2
 3  5 77  4
 4  4 73  2
 4  5 77 95
 4  4 77  4
 2  5 76  1
 2  9 73  1
 4  5 73  1
 2  8 73 95
 4  8 73  1
 4  3 71  2
 4  4 77  2
 3 15 72  2
 3  3 77 95
 4  5 73 95
 2 11 73  1
 3 11 73 95
 3  9 77  5
 3  6 73 95
end
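
A one-line sketch, assuming each distinct combination of the four variables shown should define one group: egen's group() function does exactly this, and the label option keeps the combinations readable:

Code:
egen cell = group(hhtyp2 hheink_gr1 RegioStaR7 bus28), label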



Unequal number of observations in percentile groups

Hi Stata users,

I am trying to come up with percentile groups using the code below

Code:
_pctile asset_index, nquantiles(100)
return list
forval i = 1/99 {
    local p`i' = r(r`i')
}

gen percent = .
replace percent = 1 if asset_index < `p1'
replace percent = 100 if asset_index > `p99' & asset_index < .
forval i = 1/98 {
    di "``i''"
    local j = `i' + 1
    di "lower `p`i''"
    di "upper `p`j''"
    replace percent = `i' + 1 if inrange(asset_index, `p`i'', `p`j'')
}
ta percent
I am noticing that some groups have extremely few observations, such as 3 or 5. I am not sure whether my syntax is wrong or my understanding of how percentiles are calculated is mistaken.

I am attaching the dataset, since
Code:
dataex
may not be the most effective way of sharing 9,097 observations.

Any advice is welcome.

Thanks in advance!
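
As a sketch, xtile builds the 100 groups in one step and handles the boundaries itself. Note also that when asset_index has many tied values (common for asset indices), equal-sized percentile groups are impossible, which is one frequent reason for very small or empty bins:

Code:
xtile percent = asset_index, nquantiles(100)
tab percent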

Sunday, November 28, 2021

Constructing Gini Index for Household data

Dear All
I'm computing the Gini coefficient in Stata 16 with the "ineqdeco" command. I'm using panel data from the Household Surveys of 2009, 2012, and 2016, with about 59,000 observations. I'd like to estimate the Gini coefficient for each household in each year, but the Gini coefficient is currently generated only by Household_Code (the grouping variable). As a result, each household has the same Gini coefficient across all three years. What I want is the Gini index for each Household_Code in each year.

For example, the output appears as:

Year Household_Code Gini
2009 1 0.6456
2012 1 0.6456
2016 1 0.6456
2009 2 0.3423
2012 2 0.3423
2016 2 0.3423

Code:
egen Household_Code = group(District Psu Snumber)
su Household_Code, meanonly
gen gini = .
program do_it
    qui ineqdeco Household_Income, by(Household_Code)
    replace gini = r(gini)
end
runby do_it, by(Household_Code) verbose
I also tried the following code, but it returns the error "too many variables".

Code:
egen Household_Code = group(District Psu Snumber)
su Household_Code, meanonly
gen gini = .
program do_it
    qui ineqdeco Household_Income, by(Year Household_Code)
    replace gini = r(gini)
end
runby do_it, by(Year Household_Code) verbose
Could someone help me figure out the Stata command to calculate the Gini index for each Household_Code in each year?
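
A sketch, assuming the goal really is one Gini per Household_Code per Year (this requires several income observations within each household-year cell; with only one observation the index is undefined). With runby the program already runs on one by-group at a time, so ineqdeco needs no by() option, and both identifiers can go into runby's by():

Code:
egen Household_Code = group(District Psu Snumber)
gen gini = .

capture program drop do_it
program define do_it
    quietly ineqdeco Household_Income
    replace gini = r(gini)
end

runby do_it, by(Year Household_Code) verbose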

estimating 5 year survival

Hi,
I have a dataset with survival time in months and an event indicator (dead/alive). I have been able to stset it and calculate various survival statistics from the Stata menus. However, I cannot find out how to estimate the 5-year survival rate in my dataset. It seems simple on the face of it, but I just cannot figure it out. Many thanks, Arnaud
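
A sketch, using hypothetical names survmonths for the time variable (in months) and died for the event indicator: after stset, sts list with the at() option reports the Kaplan-Meier survivor estimate at 60 months, i.e. the 5-year survival rate:

Code:
stset survmonths, failure(died)
sts list, at(60)        // survivor function and CI at 60 months = 5 years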

Creating a combined graph

Hi,

I'm struggling to get this right. I have two variables that I want to show on a single graph. The first is the % of patients who survive (currently the variable ynAlive, a binary outcome) and the second is how fast they were defibrillated (DCCS, measured in whole minutes as an integer).

Ideally, the graph would show % on the y axis and minutes on the x axis. There would be two lines: line 1 showing survival % decreasing with each minute's increase in DCCS, and line 2 showing what % of patients are currently defibrillated at each minute.

graph bar (percent), over(DCCS) gives me the correct graph for the second part, although I want a line rather than bars, and when I specify line I get "linegraph_g.new (percent), over(DCCS): class member function not found".

I feel like this should be relatively simple but I just can't wrap my head around it.

Thanks all for your assistance.
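
A sketch of one way to build such a graph, assuming ynAlive is coded 0/1: collapse to one row per DCCS value, compute the survival percentage and the percentage of all patients defibrillated at each minute, and overlay two line plots:

Code:
preserve
gen byte one = 1
collapse (mean) surv = ynAlive (sum) n = one, by(DCCS)
replace surv = 100*surv                  // % surviving at each defibrillation time
egen ntotal = total(n)
gen pct_defib = 100*n/ntotal             // % of all patients defibrillated at that minute
twoway (line surv DCCS) (line pct_defib DCCS), ///
    ytitle("Percent") xtitle("Minutes to defibrillation") ///
    legend(order(1 "Survival %" 2 "% defibrillated at this minute"))
restore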

poisson regression (ppmlhdfe) with multiplicative error

Is there a way to force ppmlhdfe to use a multiplicative error term instead of an additive one?

I want to run a Poisson regression with instruments, and as this thread suggests, the only way it makes sense is if the error term is multiplicative:
https://www.statalist.org/forums/for...-fixed-effects

I know that the ivpoisson command can do that, but I need a command that can also deal with high-dimensional fixed effects.

I appreciate any help.

OLS vs FE vs RE? Test results conflict.

Dear Stata specialists,

Hope you can help me solve my problem.

I recently ran OLS, FE, and RE for my panel data model and performed some tests to choose between the methods.

The F-test suggests that OLS is preferred to FE (F = 0.873, df1 = 62, df2 = 431, p-value = 0.7409).
The LM test suggests that RE is preferred to OLS (chisq = 15.505, df = 1, p-value = 8.227e-05).
The Hausman test suggests that FE is preferred to RE (chisq = 38.851, df = 10, p-value = 2.696e-05).
The Hausman test suggests that FE is preferred to OLS (chisq = 40.176, df = 10, p-value = 1.578e-05).

The test results puzzle me, and I cannot decide which method to choose.

Thank you very much!
Andy

Detrending in Dynamic Panel Data regression

Hello,

I am running a dynamic panel data regression. However, 3 out of my 7 variables are only trend-stationary.

I would like to know whether I should detrend these variables, or whether it is okay to run the model as is, since I am adding the lagged dependent variable as a regressor. Does my lagged dependent variable act as a time dummy?

Thanks


Help with two way fixed effect and event study?

Hello,
I'm currently working on a project where I assess whether or not a program has had an impact on contraceptive need. The program was introduced in many countries in varying years, so I was told that I should use two-way fixed effects and that an event-study regression was the way to go. However, I'm not quite sure how to go about (1) setting up the two-way FE and (2) coding the event-study regression. If it helps, my equation is Y_ict = α + β*FP2020_ct + δ_c + γ_t + ε_ict, where i is the individual, c is the country, and t is time. If anyone has any tips on doing this, please let me know!
Thanks so much!
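
A sketch of the two-way FE / event-study setup, assuming a hypothetical country-level variable adopt_year for the year the program started and an outcome contraceptive_need, and using reghdfe (from SSC) to absorb the country and year fixed effects. Event time is binned and shifted so that factor-variable notation can be used, with t = -1 as the omitted base period:

Code:
* ssc install reghdfe, replace                 // if not already installed
gen rel = year - adopt_year                    // event time relative to program start
replace rel = -5 if rel < -5                   // bin distant leads
replace rel =  5 if rel >  5                   // bin distant lags
gen rel_sh = rel + 5                           // shift to 0..10 so i./ib. notation works
* base level 4 of rel_sh corresponds to event time -1
reghdfe contraceptive_need ib4.rel_sh, absorb(country year) vce(cluster country)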

Create an index

I need to combine 3 variables to build an index in Stata. Each variable refers to a question in a questionnaire. The problem is that they all have different scales: one variable runs from 1 to 7 (1 = never, 7 = always), another from 1 = low to 3 = high, and the third from 1 to 2 (a yes/no question). How do I build an index with these three variables?
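
One common, simple approach, sketched with q1 q2 q3 as hypothetical names for the three items: standardize each item so the different scales become comparable, then average the z-scores (reversing any item first if a high value means the opposite of the others):

Code:
foreach v of varlist q1 q2 q3 {
    egen z_`v' = std(`v')                // mean 0, standard deviation 1
}
egen index = rowmean(z_q1 z_q2 z_q3)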

Interpreting results: 1-standard-deviation increase in an explanatory variable

This table provides results of an analysis of the role of tax avoidance in returns. The dependent variables are cumulative abnormal returns.

The authors state: "Economically, a 1-standard-deviation increase in tax avoidance is associated with a 0.75% (=26.2%*2.859%) more negative firm value response in Column 1".

My question is where "26.2%" is coming from. The figures in parentheses are t-statistics, and the authors do not report standard deviations, but I assume -2.859 corresponds to one standard deviation in tax avoidance ("Tax variable"). Why, then, do the authors multiply the coefficient -2.859 by "26.2%"?

I'd really appreciate any help in interpreting this result.



Paper: The Value of Offshore Secrets: Evidence from the Panama Papers
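
A hedged guess at the arithmetic, assuming 26.2% is the sample standard deviation of the tax-avoidance variable (it is not visible in the excerpt, so it should be checked against the paper's summary statistics): the usual "economic magnitude" calculation multiplies the regression coefficient by one standard deviation of the regressor, i.e. 0.262 x 2.859% ≈ 0.75%, which matches the number the authors report. On that reading, -2.859 is the coefficient (in percent), not a standard deviation or a t-statistic.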


Interpreting impulse response with log percentage as dependent variable and first difference of share as independent one

Hello everyone,

I estimate a VAR model in which the dependent variable is a log-transformed percentage (0-100) and the independent variable is the first difference of a share (0-1); the independent variable is not log-transformed.

I have seen similar questions here, but none specifically about a log-transformed percentage and the first difference of a share.

Can you help me interpret the response of the dependent variable to a shock in the independent variable if the impulse response were, e.g., 0.6 after 5 years?

Best
Marius

error: convergence not achieved

Hello everyone

I am getting the error: convergence not achieved and I don't know why.

I am using this command:
Code:
local controls "fs ihs_lev rdi pcount"

xtnbreg ccit cvcie1 `controls' i.fyear, fe
and am getting this output:
Code:
. local controls "fs ihs_lev rdi pcount"

. 
. xtnbreg ccit cvcie1 `controls' i.fyear, fe
note: 5 groups (46 obs) dropped because of all zero outcomes

Iteration 0:   log likelihood =  -8277.601  (not concave)
Iteration 1:   log likelihood = -3025.1813  (not concave)
Iteration 2:   log likelihood = -2941.6164  (not concave)
Iteration 3:   log likelihood = -2936.7774  (not concave)
Iteration 4:   log likelihood = -2936.4605  (not concave)
Iteration 5:   log likelihood = -2932.1437  (not concave)
Iteration 6:   log likelihood = -2928.3599  (not concave)
Iteration 7:   log likelihood = -2927.5556  (not concave)
Iteration 8:   log likelihood = -2926.8491  (not concave)
Iteration 9:   log likelihood = -1447.2455  
Iteration 10:  log likelihood = -1178.6375  (not concave)
Iteration 11:  log likelihood = -1079.6991  (not concave)
Iteration 12:  log likelihood = -1078.2154  (not concave)
               (iterations 13-298 omitted: the log likelihood creeps from -1078.2013 down to -1078.1344, every step flagged "not concave")
Iteration 299: log likelihood = -1078.1343  (not concave)
Iteration 300: log likelihood = -1078.1341  (not concave)
convergence not achieved

Conditional FE negative binomial regression            Number of obs    =  309
Group variable: gvkey                                  Number of groups =   29

                                                       Obs per group:
                                                                    min =    7
                                                                    avg = 10.7
                                                                    max =   11

                                                       Wald chi2(14)    =    .
Log likelihood = -1078.1341                            Prob > chi2      =    .

------------------------------------------------------------------------------
        ccit | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      cvcie1 |   8.12e-09   3.26e-09     2.49   0.013     1.73e-09    1.45e-08
          fs |  -.0315867   .0828908    -0.38   0.703    -.1940498    .1308763
     ihs_lev |  -1.036429   .1918913    -5.40   0.000    -1.412529   -.6603292
         rdi |  -2.303226   1.817969    -1.27   0.205     -5.86638    1.259928
      pcount |    .001566   .0002635     5.94   0.000     .0010496    .0020825
             |
       fyear |
       2010  |  -1.116491   .2103392    -5.31   0.000    -1.528748   -.7042336
       2011  |  -1.504335          .        .       .            .           .
       2012  |  -1.557152   .2233488    -6.97   0.000    -1.994908   -1.119397
       2013  |  -1.860322   .2126817    -8.75   0.000     -2.27717   -1.443473
       2014  |  -1.916959   .2182443    -8.78   0.000     -2.34471   -1.489208
       2015  |   -1.76991   .1889673    -9.37   0.000    -2.140279    -1.39954
       2016  |   -1.84402   .1852485    -9.95   0.000      -2.2071   -1.480939
       2017  |  -3.255445   .3514149    -9.26   0.000    -3.944205   -2.566684
       2018  |   -4.21537   .5266363    -8.00   0.000    -5.247558   -3.183182
       2019  |  -3.860314   .4762395    -8.11   0.000    -4.793726   -2.926902
             |
       _cons |   2.105214   .8609846     2.45   0.014     .4177149    3.792712
------------------------------------------------------------------------------
convergence not achieved
r(430);

end of do-file
Note:
I get this error only when I add "pcount" to my controls,
but it makes sense to include it in my analysis.
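
Two things that are often tried in this situation, sketched with the same variable names as above (no guarantee either resolves the problem; persistent non-concavity can also mean the conditional FE negative binomial model is poorly identified with this regressor set):

Code:
local controls "fs ihs_lev rdi pcount"          // as defined above

* give the maximizer different stepping options
xtnbreg ccit cvcie1 `controls' i.fyear, fe difficult iterate(500)

* or switch to fixed-effects Poisson, which is consistent under overdispersion
* when paired with robust (clustered) standard errors
xtpoisson ccit cvcie1 `controls' i.fyear, fe vce(robust)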

I hope you can help me
Thank you

Kind regards,
Jana