Specialized in data processing, data management implementation plans, data collection tools (electronic and paper based), data cleaning specifications, data extraction, data transformation, data loading, analytical datasets, and data analysis. BJ Data Tech Solutions teaches the design and development of electronic data collection tools using CSPro, and Stata commands for data manipulation, as well as setting up data management systems using modern data technologies such as relational databases, C#, PHP, and Android.
Monday, May 31, 2021
Is it correct to interpret the sign of a statistically non-significant coefficient?
Dear Members
This question may seem somewhat foolish or trivial to many in this forum, but I think a clear-cut answer, if one is at all possible, from members of this forum can remove the doubt in my mind.
Question 1: Can we interpret the "SIGN" of a non-significant coefficient?
Answer 1:
"If a coefficient's t-statistic is not significant, don't interpret it at all. You can't be sure that the value of the corresponding parameter in the underlying regression model isn't really zero."
DeVeaux, Velleman, and Bock (2012), Stats: Data and Models, 3rd edition, Addison-Wesley
p. 801 (in Chapter 10: Multiple Regression, under the heading "What Can Go Wrong?")
Source: https://web.ma.utexas.edu/users/mks/...ioncoeffs.html
Answer 2: Some authors interpret at least the sign, stating that the nature of the relationship is negative (or positive) but not statistically significant.
Which one of the above is more correct? Faced with an insignificant coefficient, should we ignore it completely and move on, or should we stop and interpret the sign?
I have read a similar post on this forum but could not find the answer to my doubt:
https://www.statalist.org/forums/for...t-coefficients
Bar graph of proportion over the years
I have two questions to clarify.
My first question is how to show a proportion using graph bar, if the only statistics available in the stat() option are mean, median, p1, p2, ..., p99, sum, count, percent, min, and max. Do I use percent or count to show the proportion?
My next question: the years on the x-axis are clumped together in my graph; is there a solution to remedy this?
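A minimal sketch of one possible approach, not from the original thread (myindicator and year are placeholder names): for a 0/1 indicator the mean is the proportion, so graph bar (mean) works, and the label() suboption of over() can unclutter the year labels on the x-axis.
Code:
graph bar (mean) myindicator, over(year, label(angle(45) labsize(small))) ytitle("Proportion")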
Skewed-Normal GARCH Estimation Using the ml Command
Dear Statalist,
I'm trying to generate conditional skewness and kurtosis by using Gram-Charlier Expansion Series. Essentially, I'm trying to estimate the following system:
h_t = b0 + b1*eta_{t-1}^2 + b2*h_{t-1}
s_t = c0 + c1*eta_{t-1}^3 + c2*s_{t-1}
k_t = g0 + g1*eta_{t-1}^4 + g2*k_{t-1}
where eta_t = (ret_t - mu)/sqrt(h_t), as implemented in the code below.
where my likelihood function (the per-observation log likelihood, as implemented in the code below) is:
l_t = -0.5*[ln(h_t) + (ret_t - mu)^2/h_t] + ln(psi_t^2) - ln(Gamma_t)
psi_t = 1 + (s_t*eta_t^3 - 3*eta_t)/6 + k_t*(eta_t^4 - 6*eta_t^2 + 3)/24
Gamma_t = 1 + s_t^2/6 + k_t^2/24
The following is my code:
//////////////////////////////////////////
clear
set more off
set fredkey 0c79529a0ad2485bee772c948e68374e, permanently
import fred DEXJPUS, daterange(1990-01-02 2002-05-03) aggregate(daily,eop) clear
drop if missing(DEXJPUS)
tsset daten
format daten %td
gen ret = ln(DEXJPUS)-ln(DEXJPUS[_n-1])
drop if _n==1
export excel using "jpnusfx9102daily", firstrow(variables) replace
sum ret, detail
global h0=r(Var)
global s0=r(skewness)
global k0=r(kurtosis)
* Own maximum likelihood program *
capture program drop garchsktry
program garchsktry
args lnf mu b0 b1 b2 c0 c1 c2 g0 g1 g2
tempvar et ht st kt nt psi gamma
qui gen double `et'=$ML_y1-`mu'
qui gen double `ht'=1
qui gen double `nt'=`et'/sqrt(`ht')
qui gen double `st'=`nt'^3
qui gen double `kt'=`nt'^4
//qui gen double `ht'=$h0
//qui gen double `st'=$s0
//qui gen double `kt'=$k0
//qui gen double `nt'=`et'/sqrt(`ht')
qui replace `ht'=`b0'+`b1'*`nt'[_n-1]^2 + `b2'*`ht'[_n-1] if _n>1
qui replace `st'=`c0'+`c1'*`nt'[_n-1]^3 + `c2'*`st'[_n-1] if _n>1
qui replace `kt'=`g0'+`g1'*`nt'[_n-1]^4 + `g2'*`kt'[_n-1] if _n>1
qui gen double `psi'= log((1 + (`st'*(`nt')^3 - 3*`nt')/6 + (`kt'*(((`nt')^4-6*(`nt')^2+3))/24))^2)
qui gen double `gamma'= log(1+ (`st'^2)/6 + (`kt'^2)/24)
qui replace `lnf'= -0.5*(log(`ht')+(`et'^2)/`ht') + `psi'-`gamma'
end
ml model lf garchsktry (mu: ret=) /lnf /b0 /b1 /b2 /c0 /c1 /c2 /g0 /g1 /g2
ml init /b0=0.0061 /b1=0.0309 /b2=0.9537 /c0=-0.0494 /c1=0.0018 /c2=0.3414 /g0=1.2365 /g1=0.0014 /g2=0.6464
ml search
ml max, difficult
ml report
///////////////////////////////////////////////////////////
This keeps giving me the following message:
"could not calculate numerical derivatives -- flat or discontinuous region encountered
r(430);"
I'm thinking that it might be because the starting values are wrong. I tried using the unconditional moments, but I keep getting the same error message. Is there a way to tell Stata not to rely too much on the starting values? Or can anyone tell me what is wrong with my program? I have run out of good ideas.
I have been stuck on this for a really long time, and any help is really appreciated.
Sarah
P.S.: I forgot to mention that I'm using Stata 17, if that makes any difference.
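A hedged diagnostic, not from the original post, that sometimes helps localize r(430) before touching the starting values: ml check asks ml to verify that the evaluator returns valid results.
Code:
* run after the ml model and ml init statements above, before ml max
ml check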
Split a string variable into character and numeric parts
Hi,
I have a string variable (profile) in my dataset whose values consist of a prefix of H, W, or B and a suffix number ranging from 1 to 1000 (H1, H2, H3, ..., H1000, W1, W2, W3, ..., W1000, etc.). How can I split the prefix characters and the suffix numbers into two different variables? I have tried the code below, which generates the suffix numbers but not the character part.
Thanks,
NM
gen iteration = regexs(0) if regexm(profile, "[0-9]*$")
destring iteration, replace
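A minimal sketch extending the posted code, not from the original thread: capture groups pull out the leading letters and the trailing digits separately.
Code:
gen prefix = regexs(1) if regexm(profile, "^([A-Za-z]+)")
gen iteration = regexs(1) if regexm(profile, "([0-9]+)$")
destring iteration, replace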
Change reference category of dependent variable
Hi,
I am running the following code
mlogit cq25 i.group [pweight = cq25pp], rrr
The dependent variable has three levels. It all works fine except Stata is choosing level 2 as the reference level for the dependent variable when I want level 1 to be the reference level.
I think Stata is doing this as level 1 has n = 80 and level 2 has n = 201 - a default for the larger n I guess.
I know independent variable reference levels can be changed with ib# but how can the dependent variable's reference level be chosen?
Don
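A hedged sketch, not from the original thread: mlogit's baseoutcome() option sets the base (reference) category of the dependent variable.
Code:
mlogit cq25 i.group [pweight = cq25pp], rrr baseoutcome(1)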
How to identify the oldest child with the highest education?
Dear all,
I have household data and I want to generate a variable indicating the year of birth of the child with the highest educational level (condition 1) and, if two or more children have the same educational level, choose the oldest one (condition 2).
For example, take a look at pid=183 (pid is the id of the parents); this individual has three children. Here, I want to create a binary variable identifying the one born in 1953 (e.g., =1, and 0 or missing otherwise) because she is older, even though her educational level is the same as that of the one born in 1962.
Note: I think this question deserves a new thread so I created this one. My previous post can be found at: https://www.statalist.org/forums/for...r-parents-data
Thank you.
Code:
clear input long qid byte(csex relationship) int cyob byte cedu 11 2 4 1970 7 13 1 3 1975 7 16 1 3 1971 5 110 1 3 1959 7 111 1 3 1977 7 112 1 3 1977 6 123 1 3 1988 7 125 1 3 1982 7 129 2 4 1985 6 134 1 3 1975 5 136 1 3 1963 4 136 2 4 1967 4 137 1 3 1976 6 137 2 3 1980 7 137 1 3 1983 6 138 2 4 1969 5 139 1 3 1955 5 140 1 3 1978 4 141 1 3 1970 5 142 2 4 1959 4 146 1 3 1976 5 146 1 3 1978 5 147 1 3 1957 4 148 1 3 1982 5 151 1 3 1975 3 152 1 3 1986 3 152 1 3 1992 3 153 1 3 1955 4 154 1 3 1977 4 156 1 3 1957 4 159 1 3 1962 4 161 1 4 1968 5 161 1 3 1963 6 163 1 3 1956 4 164 1 3 1973 7 164 1 3 1979 7 167 2 4 1978 9 168 1 3 1973 7 168 2 4 1975 7 169 1 3 1959 4 171 1 3 1962 5 173 1 3 1970 5 175 1 3 1980 8 176 1 3 1979 8 178 1 3 1974 6 180 1 3 1990 5 181 1 3 1985 7 182 1 3 1957 4 182 2 4 1962 7 183 2 4 1953 4 183 2 4 1949 1 183 1 3 1962 4 186 1 3 1955 4 188 1 3 1964 4 190 1 3 1971 5 191 1 3 1984 7 192 1 3 1971 7 192 1 3 1977 5 193 1 3 1964 7 196 2 4 1956 4 197 1 3 1981 7 331 2 4 1993 4 332 1 3 1968 3 333 1 3 1973 3 336 1 3 1975 7 337 1 3 1965 3 338 2 4 1967 4 362 1 3 1963 5 363 2 4 1977 4 366 1 3 1960 5 369 1 3 1949 6 384 1 3 1975 7 387 1 3 1977 4 389 1 3 1975 7 463 1 3 1979 8 464 1 3 1973 7 465 1 3 1966 7 469 1 3 1981 7 491 2 4 1983 6 491 2 4 1991 5 493 2 4 1958 7 494 1 3 1982 7 496 1 3 1973 8 497 2 4 1983 6 end label values csex LABEL_B25 label def LABEL_B25 1 "Male", modify label def LABEL_B25 2 "Female", modify label values relationship relationship label def relationship 3 "Son", modify label def relationship 4 "Daughter", modify label values cedu cedu label def cedu 1 "No schooling", modify label def cedu 3 "Primary", modify label def cedu 4 "Lower secondary", modify label def cedu 5 " Upper secondary", modify label def cedu 6 "Prof secondary education", modify label def cedu 7 "Junior college/University", modify label def cedu 8 "Master", modify label def cedu 9 "PhD", modify
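A minimal sketch of one approach, not from the original thread (it assumes no missing cedu or cyob and that only children appear, as in the example data): sort within qid so the last observation in each group has the highest cedu and, among ties, the earliest cyob, then flag it.
Code:
gen int neg_cyob = -cyob
bysort qid (cedu neg_cyob): gen byte oldest_highest = (_n == _N)   // 1 for the oldest child with the highest education
drop neg_cyob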
Identifying partial name overlap in string variables
Hi. I have two datasets with the variable “Product”, the name of different pharma drugs. I am trying to merge the two datasets by name. However, the names are not standardized, as shown in the example below. After merging, I want to identify observations whose names partially overlap but which end up in “master only (1)” or “using only (2)”. In the example below, this would identify observations 3 to 7. I don’t have much familiarity working with strings and would appreciate any guidance. Thanks.
Code:
input str50 Product byte _merge
"A&D" 1
"A/B OTIC" 1
"ALLERX" 2
"ALLERX (AM/PM DOSE PACK 30)" 1
"ALLERX (AM/PM DOSE PACK)" 1
"ALLERX DF" 2
"ALLERX PE" 1
"ABILIFY" 1
"ACARBOSE" 1
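A minimal sketch of one simple heuristic, not from the original thread: compare on the first word of Product and flag first words that appear in both the master-only and using-only groups; in the example above this picks out observations 3 to 7.
Code:
gen firstword = word(Product, 1)
bysort firstword: egen byte in_master = max(_merge == 1)
bysort firstword: egen byte in_using = max(_merge == 2)
gen byte partial_overlap = in_master & in_using & inlist(_merge, 1, 2)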
Graphing a multilevel model with interaction effect
Hi all!
I'm looking for suggestions on how to graph a multilevel model I'm working on.
I'm working with some covariates, which I nest within one group: country. Specifically, I'm looking for suggestions on how to graph the interaction effect (i.soldicontatti##c.individualism) by country. My model looks like this:
Code:
melogit trust_d i.soldicontatti##c.individualism c.tightz i.health_d c.difficulties_o c.supportclosec i.vitasociale c.edu_c sex i.diffecon c.agepul c.agepul_q c.familymember || country:
I've looked around, but what I have found doesn't help much.
I'm using Stata 16.
Thanks in advance and let me know if you need me to be more specific about my dataset.
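One hedged possibility, not from the original thread: after fitting the melogit model above, margins and marginsplot can plot the predicted outcome for each soldicontatti level across chosen values of individualism (the at() values below are placeholders to adjust to the observed range; this shows the overall interaction rather than country-specific curves).
Code:
margins soldicontatti, at(individualism = (-1 0 1))
marginsplot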
Using the new table/collect commands for row percentages with multiple factor variables
Hi all,
I recently attended the webinar on customizable tables in Stata 17. One of the sample tables used in the webinar, shown below in slightly modified form, demonstrates the power of the new table/collect commands.
Code:
-------------------------------------------------------
Hypertension
No Yes Total
-------------------------------------------------------
Sex
Male 2,611 43.7% 2,304 52.7% 4,915 47.5%
Female 3,364 56.3% 2,072 47.3% 5,436 52.5%
Race
White 5,317 89.0% 3,748 85.6% 9,065 87.6%
Black 545 9.1% 541 12.4% 1,086 10.5%
Other 113 1.9% 87 2.0% 200 1.9%
-------------------------------------------------------
Equally common is the need to show row percentages across the column variable, in this instance the yes/no values of hypertension. Alternatively, in a slightly different arrangement, a table that combines the results from multiple tabstat runs, showing the total case count, the count of cases with the yes value, and the percent with yes, all properly formatted, would be very helpful as well. I searched without success for any postings related to this type of table using the new Stata 17 table/collect commands. Shown below is the log from tabstat using the same dataset as the webinar.
Code:
. webuse nhanes2, clear
. tabstat highbp, by(sex) s(count sum mean)
Summary for variables: highbp
Group variable: sex (Sex)
sex | N Sum Mean
-------+------------------------------
Male | 4915 2304 .4687691
Female | 5436 2072 .3811626
-------+------------------------------
Total | 10351 4376 .4227611
--------------------------------------
. tabstat highbp, by(race) s(count sum mean)
Summary for variables: highbp
Group variable: race (Race)
race | N Sum Mean
-------+------------------------------
White | 9065 3748 .4134584
Black | 1086 541 .4981584
Other | 200 87 .435
-------+------------------------------
Total | 10351 4376 .4227611
--------------------------------------
Any help will be greatly appreciated.
Best,
Ron
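Not the table/collect layout asked about, just a hedged way to compute and check the row percentages themselves while the layout question stands: tabulate with the row option.
Code:
webuse nhanes2, clear
tabulate sex highbp, row nofreq
tabulate race highbp, row nofreq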
Average log size in panel data
Hello,
I have a random effects model with clustered errors and I would like to add the log of size to the model. When I generate the log and run the regression, it remains insignificant. Is this because size is an average for each fund and so does not change over time?
Dropping string observations starting with a number
Hi. I have the following data and want to delete all observations where the Product starts with a number. The Product variable is a string variable. I’d appreciate any help. Thanks.
Code:
"1 ANTI-INFECTIVES" "1 ANTI-INFECTIVES" "1 ANTI-INFECTIVES" "105 MISCELLANEOUS AGENTS" "105 MISCELLANEOUS AGENTS" "105 MISCELLANEOUS AGENTS" "105 MISCELLANEOUS AGENTS" "105 MISCELLANEOUS AGENTS" "105 MISCELLANEOUS AGENTS" "105 MISCELLANEOUS AGENTS" "105 MISCELLANEOUS AGENTS" "105 MISCELLANEOUS AGENTS" "105 MISCELLANEOUS AGENTS" "115 NUTRITIONAL PRODUCTS" "115 NUTRITIONAL PRODUCTS" "122 RESPIRATORY AGENTS" "122 RESPIRATORY AGENTS" "122 RESPIRATORY AGENTS" "122 RESPIRATORY AGENTS" "20 ANTINEOPLASTICS" "20 ANTINEOPLASTICS" "ACCOLATE" "ABILITY" "ACCU-CHECK"
I got different results from xtreg and eventdd
Hello,
I have unbalanced panel data; the DV is shortage, and the independent variable is civica_status, which takes the value 0 before the event and 1 after the event happened. Not all the observations were treated.
I used the following two commands:
Code:
xi: xtreg shortage civica_status i.date lead4 lead3 lead2 lead1 lag1-lag3, fe vce(cluster NDCid)
Code:
eventdd shortage civica_status i.date, fe timevar(dif) ci(rcap) level(95) cluster(NDCid) accum lags(3) leads(4) graph_op(ytitle("Civica"))
but I got different results from each of them! Shouldn't the results be the same?
Thanks,
Using omega squared after regression
Dear Forum,
Can I use omega squared after regression to determine effect size if my independent variables of interest are quantitative?
Thank you!
Regression insignificant after adding industry fixed effects
Hello,
I am new to this forum and I am currently writing an MSc thesis in finance.
I am researching the effect of CEO overconfidence on seasoned equity offering (SEO) announcements and short-term stock returns.
My regression model looks like this.

Dependent variable: Cumulative abnormal returns (CAR) post- SEO announcement
Independent variable: Overconfidence dummy
Control variables: size, book-to-market, leverage, return on assets, issue size, underpricing, CEO age, CEO gender (all continuous except gender; size is expressed in logarithms).
I have computed the CARs for firms that announced SEOs during 2010-2020, so my data is cross-sectional.
My thesis supervisor suggested I could perhaps add firm/industry and/or year fixed effects.
Therefore, I have the following regression command: reg CAR Overconfidence MktCap BtM Lev Roa IssSize UndPr Age Male i.Industry i.Year, vce(robust)
However, this command returns very small coefficients (at most 0.05), all of the variables including the constant are statistically insignificant, and the overall regression significance (Prob > F) is returned as a dot.
I have tried experimenting with the regression, removing year fixed effects and/or industry fixed effects and removing the robust standard errors.
The most significance I get is when I add year fixed effects and robust standard errors: 3 of 8 variables become significant (Overconfidence, IssSize, UndPr), and the overall regression significance (Prob > F) becomes 0.000.
How can this be explained or interpreted?
Looking forward to your answers
Kind regards,
Darya
How to convert date variable to year group?
I am given a list of date values in dd/mm/yy format and I would like to group them by year. For example, I want all dates from 2010 to be grouped under a 2010 year value and all dates from 2011 under a 2011 year value.
How do I go about doing this?
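A minimal sketch, not from the original thread (datevar is a placeholder name; if the variable is already a Stata daily date, year(datevar) alone is enough): convert the dd/mm/yy string to a Stata date, using the third argument of daily() to resolve the two-digit years, then extract the year.
Code:
gen double date_num = daily(datevar, "DMY", 2050)   // e.g. "25/03/10" becomes 25mar2010
format date_num %td
gen int year = year(date_num)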
Need help with constraint dropped when testing for equality of coefficients and joint significance of two variables
Hi all,
Currently, I am running a multiple regression with totalcases as the dependent variable and six other variables (border, PM, health, popu, totalphy, GDP) as independent variables. GDP, totalphy (total physicians in a country), and health (health expenditure of a country) are calculated by generating new variables as population times the per-capita or per-1000 indicators (GDP per capita, health expenditure per capita, physicians per 1000).
The problem is that when I tried to test the equality of the coefficients of health and GDP, the result was "constraint 1 dropped" and there was no F value nor Prob > F. I also tested joint significance for 15 pairs of variables, and many pairs suffered from the same dropped constraint.
Can anyone provide me with a detailed explanation and solution? I appreciate every answer given to me.
Reshaping and keeping labels
Hello!
As context, I'm estimating several regressions with around 900 parameters. I need to save around 400 of them (the ones that are an interaction, so I can then merge them back into the data set and do a scatter plot). I'm using statsby to save the parameters to a dataset, however, statsby gives us something like:
. des _stat_619
Variable Storage Display Value
name type format label Variable label
----------------------------------------------------------------------------------------------------------------------------
_stat_619 float %9.0g _b[4672.cae4#c.educ]
When reshaping, the 619 means nothing to me. I actually only care about the 4672 in the label.
Practically, my problem can be explained in the following manner:
Code:
clear
input id x2007 x2008 x2009
1 12 16 18
end
foreach v of varlist x* {
label variable `v' "`=substr("`v'",1,1)' factor(`=substr("`v'",length("`v'")-3,4)')"
}
desc *
//This is what I'm getting (see list):
reshape long x, i(id) j(year)
list
//This is what I want (see list):
label define yearlbl 2007 "x factor(2007)" 2008 "x factor(2008)" 2009 "x factor(2009)" , replace
label values year yearlbl
list
Thanks for your time!
Hélder
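A minimal sketch of an alternative to typing the value label by hand, not from the original post: using the same toy data as above, build yearlbl from the variable labels before reshaping.
Code:
foreach v of varlist x* {
    local yr = substr("`v'", 2, .)
    label define yearlbl `yr' "`: variable label `v''", add
}
reshape long x, i(id) j(year)
label values year yearlbl
list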
Help needed: Pearson's r table for categorical independent variables
Hi All,
I am trying to create a Pearson's correlation table in Stata.
The experiment has an independent variable (respondents are randomised between 3 manipulations): none, negative non-verbal cues, and positive non-verbal cues.
The dependent variables cover investing decisions and attitudes.
At the moment my Stata sheet is cleaned, with a string variable Cues in the leftmost column containing the negative, positive, and none responses (see the attached photo).
Now I want to compare quite a few variables within the Pearson's table. How can I do this, though, given the categorical IV situation?
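A hedged sketch, not from the original thread (invest_decision and attitude are placeholder outcome names): one common workaround is to create indicator variables for the cue conditions and correlate those with the outcomes.
Code:
encode Cues, gen(cues_n)
tabulate cues_n, gen(cue_)        // creates cue_1, cue_2, cue_3 indicators
pwcorr cue_1 cue_2 cue_3 invest_decision attitude, sig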
Line graph (mean) with panel data
Hey Stata community,
I tried to plot something using a panel data set and the "line" command (sales over time from 20 different companies).
Unfortunately, Stata shows me a separate line for each company. I just want one line over time (the mean of sales across all companies).
To demonstrate, I added a graph.
How can I see the mean line instead of 20 lines in the plot? Is there a mean-command that works?
Many thanks and best regards!
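A minimal sketch, not from the original thread (sales and year are placeholder names): collapse to the mean across companies for each period and plot the single resulting series.
Code:
preserve
collapse (mean) sales, by(year)
line sales year
restore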
Non-parametric correlation test between a dichotomous variable and a continuous one
Dear Statalist.
I have a small sample of countries. I have two variables: X, which is continuous, and Y, which is dichotomous. The problem is that X is not normally distributed, so I cannot use the point-biserial test, and I am therefore looking for a non-parametric test. Reading the literature, I came across the "eta correlation test"; however, I do not know how to compute it in Stata. Any advice?
Thanks in advance,
Ibai
How to create a variable from values of other members of a group?
Dear all,
I am analysing an unbalanced panel dataset that is similar to the following example:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input byte(family id) str1 partnerid int(year inc) 1 1 "2" 2000 1250 1 1 "2" 2001 1250 1 1 "2" 2002 1300 1 1 "2" 2003 1300 1 1 "2" 2004 1380 1 1 "2" 2005 1400 1 2 "1" 2000 2000 1 2 "1" 2001 2120 1 2 "1" 2002 2120 1 2 "1" 2003 2120 1 2 "1" 2004 2250 1 2 "1" 2005 2250 2 3 "4" 2000 1300 2 3 "4" 2001 0 2 4 "3" 2000 1500 2 4 "3" 2001 1600 2 4 "." 2002 1600 2 4 "." 2003 1800 2 4 "5" 2004 1800 2 5 "4" 2004 1400 end
I want to create a variable, partner income (partnerinc), that shows the income of the partner in the corresponding year, if a partner was in the family. The solution should look like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input byte(family id) str1 partnerid int(year inc) str4 partnerinc 1 1 "2" 2000 1250 "2000" 1 1 "2" 2001 1250 "2120" 1 1 "2" 2002 1300 "2120" 1 1 "2" 2003 1300 "2120" 1 1 "2" 2004 1380 "2250" 1 1 "2" 2005 1400 "2250" 1 2 "1" 2000 2000 "1250" 1 2 "1" 2001 2120 "1250" 1 2 "1" 2002 2120 "1300" 1 2 "1" 2003 2120 "1300" 1 2 "1" 2004 2250 "1380" 1 2 "1" 2005 2250 "1400" 2 3 "4" 2000 1300 "1500" 2 3 "4" 2001 0 "1600" 2 4 "3" 2000 1500 "1300" 2 4 "3" 2001 1600 "0" 2 4 "." 2002 1600 "." 2 4 "." 2003 1800 "." 2 4 "5" 2004 1800 "1400" 2 5 "4" 2004 1400 "1800" end
I did it manually for these very few observations, but how do I create such a variable in general? I looked through other posts on the forum but could not find a related one or get the hang of it when I did.
I am using Stata 16.0 on Windows.
Thank you very much for your help in advance.
Best regards,
Marvin
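A minimal sketch of one approach, not from the original thread: merge the file onto itself so each person picks up the partner's income in the same year (this assumes partnerid identifies at most one partner per person-year).
Code:
gen long pid = real(partnerid)            // "." becomes numeric missing
preserve
keep id year inc
rename (id inc) (pid partnerinc)
tempfile partners
save `partners'
restore
merge m:1 pid year using `partners', keep(master match) nogenerate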
Predicted probability meaning
Hi All,
I apologize for asking this very basic question. As part of my PhD research, I calculated predicted probabilities by calling the mtable command in Stata and presented the results, following the interpretation in Regression Models for Categorical Dependent Variables Using Stata (Stata Press, 2014) by J. Scott Long and Jeremy Freese. Now I have been asked by my supervisor to define what a predicted probability is. I tried to find an answer to this question myself but ended up with nothing. So, kindly help me find the meaning of a predicted probability. I would be really grateful. Thanks in advance.
Regards
Karthick Veerapandian.
Kakwani index Standard errors
Hi,
I want to know how to obtain Kakwani index Standard errors, where Kakwani index is the concentration of health care payments minus the Gini coefficient.
Thanks,
Aarushi
Reshaping from wide to long and keeping the label
Hello everyone,
I want to reshape my dataset from wide to long format and keep the label information as a new variable.
This is what my data looks like in wide format:
Code:
CompanyName _v1 _v2 _v3 _v4 "GSW Immobilien AG" 0 5894403840 -1.88679245283019 5910893287.23354 "Deutsche Rohstoff AG" 4.28571428571428 74193506.2 9.80392156862745 71144458 "Draegerwerk AG & Co KGaA" -.134408602150549 1392782583.91451 -.626666666666659 1407410211.36748 "Tonkens Agrar AG" .442477876106205 7521354.29451648 0 7519657.36964375 "BHB Brauholding Bayern Mitte AG" -13.9240506329114 8432000 16.1764705882353 9796000 "Deutsche Konsum REIT AG" -.324675324675329 538890934.198159 -.323624595469249 542916002.194247 "BRAIN Biotech AG" 2.40700218818379 185902329.6 -1.72043010752688 181532830.4 "Senvion SA" 10.4972375690608 14623376.6 .555555555555556 13234155.823 "Lion E Mobility AG" -6.63265306122449 36719436.78 -3.20987654320987 39327921.36 "va Q tec AG" -6.35179153094462 376323182.5 -.486223662884934 401847711.4 "Bitcoin Group SE" -8.73180873180874 2.195e+08 -1.83673469387755 2.405e+08 "Shop Apotheke Europe NV" -.846354166666656 2731518928.3 -4.77371357718538 2754834585.6 "Medios AG" -2.77777777777778 709274685 0 729539676 end
_v1 is the return on day one (the label of the variable is the date, here 5/11/2021) and _v2 is the market cap of the same day (again, the label is 5/11/2021). I reshaped the data using the following command:
Code:
gen long obs_no = _n
reshape long _v, i(obs_no) j(_j)
gen which_var = "return" if mod(_j, 2) == 1, before(_v)
replace which_var = "market_cap" if missing(which_var)
gen firm_num = ceil(_j/2)
drop _j
reshape wide _v, i(firm_num obs_no) j(which_var) string
rename _v* *
drop obs_no
Unfortunately, I have not yet been able to create a new variable with the date. What I would like is a new variable holding the label information: for firm_num==1 it would be 5/11/2021, the same date carried by _v1 and _v2. This is what my data looks like after reshaping:
Code:
* Example generated by -dataex-. For more info, type help dataex clear firm_num market_cap return CompanyName 1 5894403840 0 "GSW Immobilien AG" 1 74193506.2 4.28571428571428 "Deutsche Rohstoff AG" 1 1392782583.91451 -.134408602150549 "Draegerwerk AG & Co KGaA" 1 7521354.29451648 .442477876106205 "Tonkens Agrar AG" 1 8432000 -13.9240506329114 "BHB Brauholding Bayern Mitte AG" 1 3931445305.56908 -2.66106442577032 "Stroeer SE & Co KGaA" 1 11003685830.4026 -1.27868852459017 "Uniper SE" 1 36719436.78 -6.63265306122449 "Lion E Mobility AG" 1 376323182.5 -6.35179153094462 "va Q tec AG" 1 2.195e+08 -8.73180873180874 "Bitcoin Group SE" 1 2731518928.3 -.846354166666656 "Shop Apotheke Europe NV" 1 709274685 -2.77777777777778 "Medios AG" end
Sunday, May 30, 2021
ITSA command - stationarity
Hello all,
I am using the -itsa- command for a time series regression with covariates. My time series are a mixture of I(0) and I(1); that is, some are stationary in levels and others are stationary in first differences. I have not been able to find anywhere how I should include both types of time series in the -itsa- command. The Stata Journal article on this command does not explain how to deal with such a situation either.
So should I supply only the original variables, and will the -itsa- command automatically do the calculations based on the level at which each time series is stationary? Or should I specify a mixed model with some variables in levels and others in first differences?
Any help would be appreciated.
Regards,
Mujahid
Time invariant variables in hybrid model
Dear all,
I am studying the determinants of labour productivity and I am using the hybrid model.
First of all, I chose the growth rate of log_LP as the (continuous) dependent variable, and the growth rate of log_LP of frontier firms, the lagged gap between the LP of frontier and laggard firms, and firm-level control variables (age, size, intangible assets, digital training, digital adoption, STRI indicators) as explanatory variables.
They are all time-variant variables except for digital adoption (Broadband, ERP, CRM, Cloud computing), digital training (ICT training, ICT specialists, etc.), and the STRI indicators, i.e. the time-invariant and sectoral-level variables.
Since those variables do not cover all years in the dataset, I have replaced the missing values with their sectoral-level averages. Therefore, the values vary across sectors, not over years.
And when I apply this model in Stata, it says that the variables below are omitted.
I am sure there are mistakes in my model as well as in my code, but I could not really identify the problem.
Could you please tell me what the problems are here?
Code:
note: d_Broadband omitted because of collinearity note: d_ERPusing omitted because of collinearity note: d_CRMusing omitted because of collinearity note: d_Cloudcomputing omitted because of collinearity note: d_ICTspecialist omitted because of collinearity note: d_ICTtraining omitted because of collinearity note: d_training_ICTspecialist omitted because of collinearity note: d_regulatory_transparency omitted because of collinearity note: m_CRMusing omitted because of collinearity note: m_Cloudcomputing omitted because of collinearity note: m_ICTspecialist omitted because of collinearity note: m_ICTtraining omitted because of collinearity note: m_training_ICTspecialist omitted because of collinearity Random-effects GLS regression Number of obs = 340 Group variable: id Number of groups = 192 R-sq: Obs per group: within = 0.3428 min = 1 between = 0.2131 avg = 1.8 overall = 0.2208 max = 6 Wald chi2(20) = . corr(u_i, X) = 0 (assumed) Prob > chi2 = . -------------------------------------------------------------------------------- dlog_lp_VA | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------------+---------------------------------------------------------------- Frontier_gro~h | .1618926 .0592357 2.73 0.006 .0457927 .2779926 Lagged_gap | .3425023 .0419372 8.17 0.000 .260307 .4246976 d_g_intangible | -.4667624 .3021783 -1.54 0.122 -1.059021 .1254962 d_share_inta~l | .0111534 .0900579 0.12 0.901 -.1653567 .1876636 d_age | -.0009252 .002784 -0.33 0.740 -.0063816 .0045313 d_Broadband | 0 (omitted) d_ERPusing | 0 (omitted) d_CRMusing | 0 (omitted) d_Cloudcompu~g | 0 (omitted) d_ICTspecial~t | 0 (omitted) d_ICTtraining | 0 (omitted) d_training_I~t | 0 (omitted) d_ind_STRI | -189.4641 733.7918 -0.26 0.796 -1627.67 1248.741 d_res_for | 196.2257 736.4029 0.27 0.790 -1247.097 1639.549 d_res_monvem~t | 203.6151 775.5217 0.26 0.793 -1316.379 1723.61 d_discrimina~e | 141.2516 550.3346 0.26 0.797 -937.3844 1219.888 d_barriers_c~n | 149.8324 604.2113 0.25 0.804 -1034.4 1334.065 d_regulatory~y | 0 (omitted) m_g_intangible | 3.318489 1.839872 1.80 0.071 -.287593 6.924571 m_share_inta~l | -1.182859 .6844581 -1.73 0.084 -2.524372 .1586545 m_age | -.162613 .0905383 -1.80 0.072 -.3400648 .0148387 m_Broadband | -4.683323 4.759522 -0.98 0.325 -14.01181 4.645168 m_ERPusing | -.1093101 .1632869 -0.67 0.503 -.4293465 .2107263 m_CRMusing | 0 (omitted) m_Cloudcompu~g | 0 (omitted) m_ICTspecial~t | 0 (omitted) m_ICTtraining | 0 (omitted) m_training_I~t | 0 (omitted) m_ind_STRI | 142.9161 138.9487 1.03 0.304 -129.4183 415.2505 m_res_for | 0 (omitted) m_res_monvem~t | -261.7917 266.8595 -0.98 0.327 -784.8267 261.2434 m_discrimina~e | -141.1574 215.5238 -0.65 0.512 -563.5762 281.2614 m_barriers_c~n | -371.0285 383.4101 -0.97 0.333 -1122.498 380.4414 m_regulatory~y | -579.394 576.4592 -1.01 0.315 -1709.233 550.4452 _cons | 479.3276 488.1428 0.98 0.326 -477.4147 1436.07 ---------------+---------------------------------------------------------------- sigma_u | 0 sigma_e | .45474182 rho | 0 (fraction of variance due to u_i) --------------------------------------------------------------------------------
Thanks,
Anne-Claire
Renaming every three variables
Hi,
I am trying to rename every three variables in my dataset such that the first three variables X1 X2 X3 become HH-W-X1 HH-B-X2 HH-H-X3. Also, I would like to rename the three subsequent variables as follows: X4 X5 X6 -> UH-X4 UX-X5 UX-X6. My data includes 42 variables, X1 X2 ... X42, and I would like to loop this over all 42 variables. Thanks for your help.
Best,
NM
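A minimal sketch, not from the original thread, with two loudly labeled assumptions: Stata variable names cannot contain hyphens, so underscores stand in for them, and the prefix pattern is assumed to repeat every six variables; adjust the prefixes list to the real scheme.
Code:
local prefixes HH_W HH_B HH_H UH UX UX
forvalues i = 1/42 {
    local p : word `= mod(`i' - 1, 6) + 1' of `prefixes'
    rename X`i' `p'_X`i'       // X1 -> HH_W_X1, X4 -> UH_X4, ...
}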
Creating an IF or Conditional command
Hi All,
It is my first time in this forum and I'm hoping you will be able to assist me.
My study is on adolescent obesity and I have a huge dataset that includes everyone, adults included. I would like to generate a variable called Mother_IBM. The condition is that the person should be female (coded 2) and have No_of_kids>0.
Thanks
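A minimal sketch, not from the original thread (sex and BMI are placeholder variable names to adjust to the dataset): copy BMI into Mother_IBM only for females with at least one child.
Code:
gen Mother_IBM = BMI if sex == 2 & No_of_kids > 0 & !missing(No_of_kids)   // the !missing() guard keeps missing No_of_kids out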
Lee bounds with multiple treatment group
Hi,
I'm working with a dataset with high attrition rate and am considering using Lee bounds to estimate the treatment effect. Below is the code I am using to determine the share of respondents I need to trim above/below to compute the Lee bounded treatment effect for a binary treatment variable.
Code:
quietly count if intervention==0 & time==0
local tot_control=`r(N)'
quietly count if intervention==1 & time==0
local tot_treatment=`r(N)'
quietly count if intervention==0 & time==1 & consent2==1
local found_control=`r(N)'
quietly count if intervention==1 & time==1 & consent2==1
local found_treatment=`r(N)'
local q_control=`found_control'/`tot_control'
local q_treatment=`found_treatment'/`tot_treatment'
if `q_treatment'>`q_control' {
local q1=(`q_treatment'-`q_control')/`q_treatment'
}
if `q_treatment'<`q_control' {
local q1=(`q_control'-`q_treatment')/`q_control'
}
I was wondering how I would proceed if I have three treatment groups (Treatment 1, Treatment 2, and Control).
Thanks
Merge panel data with multiple same-year observations
Hi All,
I am trying to merge two datasets. One is a firm ID-year panel (the master data). The other dataset (hereafter the ECHO dataset) contains the US Government's information on penalties charged to firms for environmental violations. So it can happen that a firm has years with no violations, but it can also happen that a firm has two or more violations in one year.
My objective is to merge some variables like "total yearly sanction" (Variable 9 below) into my master data.
I have generated some variables on the ECHO dataset that I want to merge with my panel master data. In particular, my question is to understand how I could merge the panel data with variable 9.
Code:
* 7) sum of fed penalty, compliance cost and SEP cost
gen tot_sanction = fed_pen + sep_cost + tot_compl_amt
* 8) total sanction per year
bysort f_id settle_year: gen tot_sanction_y = sum(tot_sanction)
* 9) keep one observation per firm-year storing the total sanction per year
bysort f_id settle_year: keep if _n == _N
Here is a copy of the ECHO dataset that I am using:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int f_id str10 settle_date long settle_year double(fed_pen sep_cost tot_compl_amt) float(tot_sanction tot_sanction_y) 2 "07/28/1978" 4 . . . . 0 2 "01/29/1987" 13 8500 0 0 8500 8500 2 "04/07/1988" 14 0 0 0 0 0 2 "09/19/1989" 15 0 0 0 0 0 2 "10/18/1990" 16 112500 0 0 112500 112500 2 "09/25/1990" 16 . . . . 112500 2 "08/30/1990" 16 0 0 0 0 112500 2 "08/26/1991" 17 4920 0 0 4920 4920 2 "02/15/1991" 17 7267 0 0 7267 12187 2 "05/06/1992" 18 0 0 0 0 0 2 "09/24/1992" 18 25000 0 0 25000 25000 2 "09/24/1992" 18 25000 0 0 25000 50000 2 "06/09/1992" 18 25000 0 0 25000 75000 2 "01/29/1992" 18 0 0 0 0 75000 2 "08/07/1992" 18 9750 0 0 9750 84750 2 "12/07/1993" 19 18381 0 0 18381 18381 2 "06/30/1993" 19 0 0 0 0 18381 2 "12/07/1993" 19 18381 0 0 18381 36762 2 "06/30/1993" 19 0 0 0 0 36762 2 "12/07/1993" 19 18381 0 0 18381 55143 2 "06/30/1993" 19 0 0 0 0 55143 2 "02/25/1993" 19 . . . . 55143 2 "10/29/1993" 19 30000 0 0 30000 85143 2 "05/04/1994" 20 0 0 0 0 0 2 "10/19/1995" 21 0 0 0 0 0 2 "02/14/1995" 21 4200 0 0 4200 4200 2 "05/12/1995" 21 50000 0 0 50000 54200 2 "01/04/1995" 21 4950 0 0 4950 59150 2 "06/12/1995" 21 0 0 0 0 59150 2 "10/03/1996" 22 . . . . 0 2 "04/18/1996" 22 0 0 70000000 7.00e+07 7.00e+07 2 "10/03/1996" 22 84000 0 0 84000 70084000 2 "12/02/1997" 23 17510 0 0 17510 17510 2 "06/18/1997" 23 22500 0 0 22500 40010 2 "09/30/1997" 23 0 0 0 0 40010 2 "07/07/1997" 23 213000 0 0 213000 253010 2 "09/29/1997" 23 238000 0 0 238000 491010 2 "02/20/1997" 23 75515 0 0 75515 566525 2 "09/29/1997" 23 238000 0 0 238000 804525 2 "07/07/1997" 23 213000 0 0 213000 1017525 2 "12/01/1997" 23 0 0 0 0 1017525 2 "10/02/1997" 23 0 0 1590000 1590000 2607525 2 "05/17/1998" 24 0 0 0 0 0 2 "06/30/1998" 24 46450 0 0 46450 46450 2 "11/18/1998" 24 0 0 1000000 1000000 1046450 2 "06/30/1998" 24 46450 0 0 46450 1092900 2 "06/30/1998" 24 46450 0 0 46450 1139350 2 "07/22/1998" 24 0 0 2000000 2000000 3139350 2 "06/30/1998" 24 46450 0 0 46450 3185800 2 "02/04/1999" 25 1000 0 0 1000 1000 2 "06/14/1999" 25 0 0 0 0 1000 2 "03/24/1999" 25 0 0 0 0 1000 2 "01/15/1999" 25 143800 0 0 143800 144800 2 "09/22/2000" 26 38596 0 0 38596 38596 2 "09/22/2000" 26 7150 0 0 7150 45746 2 "09/22/2000" 26 52340 0 0 52340 98086 2 "12/20/2000" 26 5000 0 0 5000 103086 2 "04/17/2000" 26 0 0 5000000 5000000 5103086 2 "09/27/2001" 27 0 0 850000 850000 850000 2 "07/05/2002" 28 0 0 300000 300000 300000 2 "09/26/2002" 28 2810 0 0 2810 302810 2 "06/24/2002" 28 0 0 43000000 4.30e+07 43302808 2 "08/01/2002" 28 0 0 0 0 43302808 2 "09/30/2003" 29 0 0 10000 10000 10000 2 "10/06/2003" 29 0 0 800000 800000 810000 2 "11/14/2003" 29 0 0 0 0 810000 2 "09/30/2003" 29 0 0 0 0 810000 2 "02/20/2003" 29 16170 62225 5000 83395 893395 2 "08/09/2004" 30 0 0 10000 10000 10000 2 "09/30/2004" 30 0 0 7500000 7500000 7510000 2 "09/28/2004" 30 27500 165000 10000 202500 7712500 2 "05/04/2004" 30 0 0 963000 963000 8675500 2 "05/26/2004" 30 0 0 0 0 8675500 2 "06/29/2005" 31 0 0 24500000 2.45e+07 2.45e+07 2 "08/31/2005" 31 0 0 0 0 2.45e+07 2 "01/14/2005" 31 0 0 0 0 2.45e+07 2 "04/25/2005" 31 1 0 0 1 2.45e+07 2 "09/29/2005" 31 0 0 0 0 2.45e+07 2 "09/07/2006" 32 3107 11673 100 14880 14880 2 "05/01/2006" 32 1521983 0 0 1521983 1536863 2 "04/12/2007" 33 0 0 100 100 100 2 "04/12/2007" 33 0 0 100 100 200 2 "02/21/2008" 34 0 0 0 0 0 2 "08/07/2008" 34 0 0 9382412 9382412 9382412 2 "05/07/2008" 34 0 0 18000000 1.80e+07 27382412 2 "06/02/2008" 34 30000 0 0 30000 27412412 2 "09/24/2008" 34 0 0 0 0 27412412 2 "08/21/2008" 34 0 
0 0 0 27412412 2 "10/27/2009" 35 0 0 27000000 2.70e+07 2.70e+07 2 "04/14/2009" 35 0 0 59000000 5.90e+07 8.60e+07 2 "12/24/2009" 35 1310 4913 100 6323 86006320 2 "09/01/2009" 35 0 0 100 100 86006424 2 "11/09/2010" 36 0 0 29980000 29980000 29980000 2 "10/07/2010" 36 0 0 5600000 5600000 35580000 2 "12/06/2011" 37 0 0 1800000 1800000 1800000 2 "02/09/2012" 38 0 0 7300000 7300000 7300000 2 "02/26/2013" 39 0 0 16390000 16390000 16390000 2 "09/25/2014" 40 65000 0 32400 97400 97400 2 "03/17/2014" 40 0 0 7830000 7830000 7927400 2 "05/05/2014" 40 0 0 2000000 2000000 9927400 2 "04/21/2014" 40 0 0 3400000 3400000 13327400 2 "11/22/2016" 42 40000 0 5500000 5540000 5540000 2 "09/26/2017" 43 0 0 7000 7000 7000 2 "01/10/2017" 43 0 0 0 0 7000 2 "04/10/2017" 43 8251 0 0 8251 15251 2 "09/29/2017" 43 50000 0 0 50000 65251 2 "09/30/2019" 45 0 0 11000000 1.10e+07 1.10e+07 2 "02/05/2020" 46 74360 0 6000 80360 80360 2 "" . . . . . 0 2 "" . . . . . 0 2 "" . 4550 0 0 4550 4550 2 "" . . . . . 4550 3 "10/09/1985" 11 4000000 0 0 4000000 4000000 3 "09/30/1990" 16 1 0 0 1 1 3 "07/16/1990" 16 . . . . 1 3 "12/20/1991" 17 0 0 0 0 0 3 "09/28/1992" 18 0 0 0 0 0 3 "08/30/1993" 19 0 0 0 0 0 3 "12/19/1994" 20 28000 0 0 28000 28000 3 "10/31/1994" 20 28000 0 0 28000 56000 3 "09/29/1995" 21 0 0 0 0 0 3 "10/02/1995" 21 182654 0 0 182654 182654 3 "06/12/1995" 21 0 0 0 0 182654 3 "04/18/1996" 22 0 0 0 0 0 3 "09/03/1997" 23 7200 0 0 7200 7200 3 "03/31/1998" 24 0 0 10000005 10000005 10000005 3 "06/08/1999" 25 0 0 0 0 0 3 "09/29/2000" 26 5000 0 0 5000 5000 3 "06/28/2002" 28 0 0 100000 100000 100000 3 "11/01/2002" 28 0 0 35000 35000 135000 3 "10/17/2002" 28 500 0 18750 19250 154250 3 "08/12/2002" 28 3500 0 0 3500 157750 3 "03/31/2003" 29 0 0 0 0 0 3 "04/10/2007" 33 0 0 0 0 0 3 "09/18/2008" 34 25033 0 0 25033 25033 3 "09/30/2009" 35 0 0 0 0 0 3 "02/09/2012" 38 0 0 7300000 7300000 7300000 3 "05/16/2014" 40 0 0 1500 1500 1500 3 "" . . . . . 
0 4 "08/12/2004" 30 0 0 430000 430000 430000 4 "08/16/2004" 30 17903 144692 0 162595 592595 4 "03/30/2006" 32 57372 418300 714000 1189672 1189672 4 "09/30/2011" 37 0 0 0 0 0 4 "12/14/2011" 37 10155 125601 0 135756 135756 4 "06/03/2013" 39 0 0 1500000 1500000 1500000 4 "09/08/2014" 40 825 0 2000 2825 2825 5 "08/03/2009" 35 30000 0 0 30000 30000 5 "08/03/2009" 35 30000 0 0 30000 60000 5 "01/11/2010" 36 30000 0 110000 140000 140000 6 "09/21/2009" 35 0 0 0 0 0 7 "03/02/1994" 20 0 0 0 0 0 7 "08/25/1998" 24 400 0 0 400 400 7 "05/15/1998" 24 0 0 0 0 400 7 "06/25/1999" 25 0 0 0 0 0 7 "04/13/2000" 26 0 0 0 0 0 7 "11/02/2001" 27 0 0 0 0 0 7 "09/28/2001" 27 2640 0 0 2640 2640 7 "09/30/2003" 29 0 0 0 0 0 7 "02/28/2003" 29 5500 0 0 5500 5500 7 "10/14/2003" 29 0 0 0 0 5500 7 "09/30/2004" 30 0 0 7500000 7500000 7500000 7 "04/13/2004" 30 11880 0 0 11880 7511880 7 "01/23/2006" 32 0 0 15 15 15 7 "07/07/2006" 32 0 0 0 0 15 7 "01/23/2006" 32 0 0 15 15 30 7 "11/09/2006" 32 7000 0 0 7000 7030 7 "08/22/2006" 32 0 0 0 0 7030 7 "01/23/2006" 32 0 0 15 15 7045 7 "02/27/2007" 33 3000 0 100 3100 3100 7 "05/08/2007" 33 0 0 37000000 3.70e+07 37003100 7 "07/02/2007" 33 0 0 1400000 1400000 38403100 7 "05/07/2008" 34 0 0 18000000 1.80e+07 1.80e+07 7 "01/30/2009" 35 800 0 100 900 900 7 "05/04/2010" 36 2790 0 100 2890 2890 7 "06/27/2011" 37 0 0 0 0 0 7 "07/15/2015" 41 4080 0 0 4080 4080 7 "08/21/2015" 41 0 0 0 0 4080 7 "11/10/2016" 42 0 0 0 0 0 7 "11/10/2016" 42 0 0 0 0 0 7 "10/28/2016" 42 0 0 0 0 0 7 "06/14/2017" 43 1785 0 2500 4285 4285 7 "05/25/2017" 43 1785 0 2500 4285 8570 7 "04/12/2018" 44 800 0 0 800 800 7 "04/11/2019" 45 0 0 0 0 0 8 "09/28/2015" 41 30800 0 3000 33800 33800 10 "06/13/1985" 11 50000 0 0 50000 50000 11 "03/19/1999" 25 0 0 0 0 0 12 "07/24/2006" 32 142500 0 500000 642500 642500 14 "10/28/1991" 17 15000 0 0 15000 15000 14 "10/04/2004" 30 0 0 1000000 1000000 1000000 15 "07/25/1994" 20 . . . . 0 15 "09/25/2002" 28 0 0 0 0 0 15 "09/24/2002" 28 750 0 100 850 850 15 "08/05/2004" 30 20619 0 51513 72132 72132 15 "06/22/2010" 36 3600 0 5000 8600 8600 16 "09/24/2004" 30 0 0 0 0 0 16 "10/17/2018" 44 0 0 0 0 0 16 "01/18/2018" 44 0 0 0 0 0 20 "04/15/1992" 18 45000 0 0 45000 45000 20 "11/15/1995" 21 0 0 0 0 0 end
Note: I am not sure why settle_year comes out as a 2-digit output here, because in Stata it shows as years like 1980, 1981, ...
Thank you.
Deb
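A minimal sketch of one way to do the merge, not from the original thread (echo_data and panel_master are placeholder file names, and the panel is assumed to identify observations by f_id and a year variable renamed to settle_year): collapse ECHO to one record per firm-year, then merge.
Code:
use echo_data, clear
collapse (sum) tot_sanction_y = tot_sanction, by(f_id settle_year)
tempfile echo_yearly
save `echo_yearly'
use panel_master, clear
merge 1:1 f_id settle_year using `echo_yearly', keep(master match) nogenerate
replace tot_sanction_y = 0 if missing(tot_sanction_y)   // firm-years with no recorded violations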
Interpretation of a linear (percentage) - log regression model
To provide context, I am running a fixed effects regression model assessing the relationship between the percentage an organization spends on overhead and how much it produces, in terms of the number of houses built as well as how much revenue it generates.
My independent variable is the overhead ratio, which is between 0 and 1. My dependent variable is the log of the total number of houses. The coefficient is .88 and significant. So, I take the exponent of .88, which I believe is 2.41, subtract 1 and multiply times 100, which gives me 141%. For the interpretation, I'm saying that a one-percentage-point increase in the overhead ratio equates to a 141% increase in the total number of houses built. However, this seems way too high.
I also use a second dependent variable - the log of total revenue. For this, I get a significant coefficient of .32, which again, I take the exponent of .32 and get 1.38. I then subtract 1 and multiply times 100, which gives me 37.17%. I interpret this as a one-percentage-point increase in the overhead ratio equates to a 37.17% increase in total houses built. Again, this seems way too high to me.
Am I doing the calculations and interpretations correctly? Many thanks in advance!
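For reference, a minimal sketch of the two conversions involved when the outcome is in logs and the regressor is a 0-1 ratio, using the .88 coefficient from the post (arithmetic only; which one applies depends on whether the ratio changes by a full unit or by 0.01):
Code:
* Arithmetic sketch only, using the coefficient b = .88 reported above.
display 100*(exp(.88) - 1)       // % change for a one-unit (0 to 1) change in the ratio
display 100*(exp(.88*0.01) - 1)  // % change for a one-percentage-point (0.01) change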
Problem with Survey data analysis, non-response, selection bias, use of paradata
Hi Statalisters, Greetings from India
I am analysing survey data and I am relatively new to this commonly used study design. I am seeking help on the topics of survey weighting, selection bias, and paradata.
Survey data setup
The survey was of doctors (registered under a particular program in the country) on the impact of the pandemic on health services. The sampling frame consists of 23,900 doctors covered by 3 agencies (Agency A, B and C). Under A there are 13,400 doctors, under B 6,000 doctors, and under C 4,500 doctors. Among these, 700, 1,000 and 1,100 doctors were randomly sampled from Agency A, B and C respectively (total sample = 2,800). From the survey conducted on these 2,800 doctors across the 3 agencies, responses were received from 400 doctors from Agency A, 800 from Agency B and 700 from Agency C.
As per the above, I have assumed that this survey used a stratified random sampling at the agency level. The dataset I have (Data respondents) is on these 1900 doctors. Data is available on about 200 variables from the 1900 responders.
Data available on non-respondents and paradata
The central concern is non-responders and how to account for the ensuing bias, as described below. The challenge I am facing with non-response analysis is that the data I have on the 900 non-responders are minimal. In the data set with the full 2,800 doctors (Data full), the data I have in common across responders and non-responders are only their (1) agency (A, B or C), (2) qualification (3-category variable: bachelors, specialization, super specialization), and (3) province (5-category variable). Additionally, I also have paradata on the ‘number of attempts’ to contact the doctors (Attempt 1, 2 and 3) – Var 4. The reason for non-response among the 900 doctors is also recorded (reasons fall under 10 categories).
Analysis will involve estimating frequencies and proportions, and a few regression models giving crude odds ratio estimates. What is the practical way to analyse this survey data accounting for selection bias?
I give below a sample data set with 30 observations and a few variables produced by -dataex-. The data structure below is that of respondents only.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input byte id int dateofsurvey str1 agency str2 province str19 qualification byte numberofattemptstocontact str23 age byte opdload_ct str14 opdload_hilo str64 servicesb4c19 byte(services_tests_b4c19 services_meds_b4c20 services_ehealth_b4c21) 1 22494 "B" "P2" "Superspecialization" 1 "30 - 45 years old" 10 "Same as before" "Testing, Providing medication" 1 1 0 2 22494 "B" "P2" "Superspecialization" 1 "30 - 45 years old" 10 "Higher" "Testing, Providing medication" 1 1 0 3 22494 "C" "P1" "Superspecialization" 1 "Older than 61 years old" 20 "Lower" "Testing, Providing medication" 1 1 0 4 22494 "C" "P1" "Superspecialization" 1 "30 - 45 years old" 20 "Lower" "Testing, Providing medication" 1 1 0 5 22494 "C" "P1" "Superspecialization" 1 "30 - 45 years old" 40 "Same as before" "Testing, Providing medication" 1 1 0 6 22494 "B" "P2" "Superspecialization" 2 "46 - 60 years old" 30 "Same as before" "Testing, Providing medication, Home consultation" 1 1 0 7 22494 "B" "P3" "Superspecialization" 1 "46 - 60 years old" 15 "Same as before" "Testing " 1 0 0 8 22494 "B" "P3" "Specialization" 1 "30 - 45 years old" 20 "Higher" "Testing " 1 0 0 9 22494 "B" "P3" "Superspecialization" 2 "46 - 60 years old" 25 "Lower" "Other, please specify" 0 0 0 10 22494 "B" "P3" "Superspecialization" 1 "30 - 45 years old" 60 "Lower" "Testing, Other, please specify" 1 0 0 11 22494 "B" "P3" "Superspecialization" 2 "30 - 45 years old" 25 "Lower" "Testing, Providing medication" 1 1 0 12 22525 "C" "P1" "Superspecialization" 1 "30 - 45 years old" 25 "Same as before" "Providing medication, testing" 1 1 0 13 22525 "C" "P1" "Superspecialization" 1 "46 - 60 years old" 30 "Lower" "Testing, Providing medication" 1 1 0 14 22525 "C" "P1" "Superspecialization" 2 "30 - 45 years old" 3 "Lower" "Providing medication, testing, Other, please specify" 1 1 0 15 22525 "C" "P1" "Superspecialization" 1 "30 - 45 years old" 10 "Lower" "Providing medication, testing" 1 1 0 16 22525 "C" "P1" "Superspecialization" 1 "46 - 60 years old" 40 "Lower" "Providing medication, testing" 1 1 0 17 22525 "C" "P1" "Superspecialization" 2 "46 - 60 years old" 10 "Lower" "Providing medication " 0 1 0 18 22525 "C" "P1" "Superspecialization" 1 "30 - 45 years old" 10 "Lower" "Testing, Providing medication" 1 1 0 19 22525 "C" "P1" "Superspecialization" 1 "46 - 60 years old" 50 "Lower" "Testing, Providing medication" 1 1 0 20 22555 "B" "P2" "Superspecialization" 3 "30 - 45 years old" 30 "Same as before" "Testing, Providing medication, Home consultation" 1 1 0 21 22555 "B" "P2" "Superspecialization" 1 "30 - 45 years old" 15 "Lower" "Testing, Providing medication" 1 1 0 22 22555 "B" "P2" "Superspecialization" 3 "30 - 45 years old" 20 "Lower" "Testing, Other, please specify Providing medication, E-health " 1 1 1 23 22555 "B" "P2" "Superspecialization" 1 "Less than 30 years old" 20 "Lower" "Testing, Providing medication" 1 1 0 24 22555 "A" "P5" "Superspecialization" 1 "Less than 30 years old" 20 "Same as before" "Providing medication, testing, Other, please specify" 1 1 0 25 22555 "A" "P4" "Bachelors" 1 "30 - 45 years old" 20 "Higher" "Testing " 1 0 0 26 22555 "A" "P4" "Superspecialization" 3 "Less than 30 years old" 20 "Lower" "Testing " 1 0 0 27 22555 "A" "P4" "Specialization" 1 "Less than 30 years old" 100 "Lower" "E-health " 0 0 1 28 22555 "A" "P4" "Superspecialization" 3 "30 - 45 years old" 5 "Lower" "Providing medication " 0 1 0 29 22494 "B" "P2" "Superspecialization" 1 "30 - 45 years old" 60 "Lower" 
"Testing, Other, please specify" 1 0 0 30 22555 "A" "P5" "Superspecialization" 1 "30 - 45 years old" 30 "Same as before" "Testing, Providing medication, Home consultation" 1 1 0 end format %tdnn/dd/CCYY dateofsurvey
The following are the codes I have started with (I am using StataMP 13 on Windows 10):
Code:
gen wt_strat=13400/400
replace wt_strat=6000/800 if agency=="B"
replace wt_strat=4500/700 if agency=="C"
gen fpc_strat=1/wt_strat
Following this I ran the survey set command, where ‘id’ is a variable specific to each doctor in the list:
Code:
svyset id [pweight=wt_strat], strata(agency) fpc(fpc_strat)
Code:
pweight: wt_strat
VCE: linearized
Single unit: missing
Strata 1: agency
SU 1: id
FPC 1: fpc_strat
Please correct me if I have gone wrong in the above steps assuming stratified random sampling. Or should strategies to account for non-response be incorporated in the above command lines?
Accounting for non-response
To account for non-response, I read about post stratification (in previous threads in Statalist, and literature) but I have data on only 3 variables across non-responders and responders. I also read that paradata can be used to account for non-response analysis (Kreuter F, Olson K. Paradata for Nonresponse Error Investigation. 2013). I have 1 paradata variable, "number of attempts to contact" specifying the number of times (maximum 3 attempts) a particular doctor was contacted to get a successful interview. But I do not know how to use this variable and in Stata or whether this variable is enough to account for bias.
Requesting your insights on the aforementioned.
Thank you!
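A minimal sketch of one common way non-response is folded into the weights (a weighting-class adjustment within strata). This assumes the full file of 2,800 sampled doctors is available with a hypothetical 0/1 indicator responded, and starts from design weights based on the sampled counts:
Code:
* Sketch only; 'responded' is a hypothetical 0/1 indicator in the full sample file.
gen double wt_base = 13400/700 if agency=="A"
replace    wt_base = 6000/1000 if agency=="B"
replace    wt_base = 4500/1100 if agency=="C"
bysort agency: egen double resp_rate = mean(responded)   // within-stratum response rate
gen double wt_nr = wt_base/resp_rate                     // non-response-adjusted weight
svyset id [pweight=wt_nr], strata(agency)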
Cross section Panel data for exam - elections across 23 countries
Hi all. I appear to be in deep, so hoping anyone can help.
I'm doing an exam in quantitative methods. For this exam I want to study the effect of the number of veto actors on election turnout, with data from 23 different countries between 1950 and 2017. The id variable is then country and the time variable is year. My dependent variable (election turnout) and independent variable (veto actors) are observed in different years across the different countries, as elections take place irregularly from country to country. Some of my control variables are measured yearly, while others are measured at regular intervals and others have missing values in some years, depending on the country. So when my x and y observations don't take place in the same year in every country, and when my control variables are measured with such variation, can I still run a meaningful regression? And what should I be aware of when I do so?
Kind regards from a rather stressed student.
Assigning Values of Variable for All Panels
Hi all,
I have tried finding a post already discussing this issue but have not found any; apologies in advance if one exists
Here is my data:
input float(SecId1 dm) double WeeklyReturn float tb3ms
4 705 .5516592499999999 2.25
4 706 .606598 2.33
4 707 -.4112093999999999 2.37
4 708 1.64722825 2.37
4 709 .33371999999999996 2.39
4 710 1.1331326000000002 2.4
4 711 .09498449999999997 2.38
4 712 -1.9586029999999999 2.35
4 713 1.67149 2.17
4 714 -1.3865355 2.1
4 715 -.34597675000000006 1.95
4 716 .21807700000000008 1.89
4 717 1.6438857500000001 1.65
4 718 -.02667074999999998 1.54
4 719 1.3116976 1.54
4 720 -1.0276205 1.52
4 721 -.9711949999999999 1.52
4 722 -3.6885015999999995 .29
4 723 2.11304575 .14
4 724 2.6366712 .13
4 725 .945435 .16
4 726 1.3224075 .13
4 727 .5792539999999999 .1
4 728 -.39374025 .11
4 729 .061644750000000026 .1
4 730 2.6141734000000003 .09
4 731 .96156325 .09
4 732 1.7286028000000002 .08
4 733 -.7715675000000002 .04
4 734 -.0966455 .03
4 735 .10510433333333336 .02
5 705 -.95295275 .
5 706 -.019057249999999915 .
5 707 -1.358661 .
5 708 1.3026425 .
5 709 .6011622499999999 .
5 710 .17128120000000008 .
5 711 -.054412 .
5 712 -2.04645475 .
5 713 1.3629257999999997 .
5 714 -.81613775 .
5 715 -.5691437500000001 .
5 716 .5598118000000001 .
5 717 1.3654247499999999 .
5 718 .3868075 .
5 719 .4214298 .
5 720 -.84145575 .
5 721 -1.54410075 .
5 722 -2.7440836 .
5 723 1.583352 .
5 724 1.9272106 .
5 725 -.7555624999999999 .
5 726 .09120900000000004 .
5 727 .8842686000000001 .
5 728 -.53119275 .
5 729 -.513367 .
5 730 2.970973 .
5 731 .19980999999999996 .
5 732 .5054336 .
5 733 .5529535 .
5 734 .6751975 .
5 735 .5922396666666666 .
SecId1 is the panel variable and dm the monthly variable.
tb3ms is the three month Treasury bill rate that I downloaded separately from the dataset and then pasted onto it. I would like to compute the excess return of each security i.e. WeeklyReturn - tb3ms.
For this, I would need to assign the same 31 values for tb3ms (one for each month) to each panel. How could this be achieved? I have tried using expand, which failed. I also tried using David Kantor's "carryforward" command, combined with tsfill, but this also failed.
I would greatly appreciate any help!
Best regards,
Maxence
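Not necessarily the only approach, but one sketch of spreading the monthly tb3ms values across panels: within each month dm, copy the non-missing value (observed here for SecId1 == 4) to every security, then compute the excess return.
Code:
* Sketch: within each month, missing values sort last, so tb3ms[1] is the
* non-missing monthly value; copy it to every panel in that month.
bysort dm (tb3ms): replace tb3ms = tb3ms[1] if missing(tb3ms)
gen double excess_ret = WeeklyReturn - tb3ms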
Putexcel with row and column name labels
I am trying to export tables from Stata to Excel using putexcel, where the row and column variables have value labels. However, after using the matlist command, the frequencies are not reported correctly.
I have used the following commands to proceed towards using putexcel to transfer the results to Excel.
As shown in the output below, the tabulate frequencies are not rendered correctly after I use the matlist code. The string values for sect and edulevel are ordered alphabetically rather than in the order of the initial table, so the frequencies are jumbled up. For instance, in the first row, for Agriculture, tabulate shows that the frequency for the primary educated is 1,383, while after the matlist command it shows the frequency for the higher educated as 1,383 instead.
I require some help regarding correct alignment of the data in the tables. Any help would be really appreciated.
Code:
. tab sect edulevel, matcell(cellcounts)
| edulevel
sect | Below Pri Primary middle | Total
----------------------+---------------------------------+----------
Agriculture | 4,213 1,383 1,545 | 8,555
Forestry and Fishing | 84 77 104 | 403
Mining | 118 64 131 | 551
Food manuf | 207 144 292 | 1,125
Textile and leather m | 350 386 618 | 2,359
Wood Manuf | 107 71 178 | 523
Media Printing and Re | 6 19 58 | 199
Chemicals Manuf | 91 82 203 | 1,078
Non-metal Manuf | 193 77 168 | 693
Basic Metal Manuf | 65 64 119 | 661
Machinery manuf | 87 91 171 | 640
Electronics Manuf | 5 7 19 | 148
Equipt Manuf | 68 102 235 | 1,197
Furniture Manuf | 59 75 150 | 432
Manuf Others | 47 77 149 | 458
Repair & install mach | 25 38 103 | 354
Elec & Gas Supply | 38 29 115 | 594
Sanitation Services | 61 44 88 | 338
Construction | 2,877 1,612 2,544 | 9,253
Civil Engineering | 219 148 199 | 940
Special Construction | 246 268 570 | 1,676
Wholesale | 180 177 420 | 1,877
Retail | 253 329 1,014 | 3,689
Land & Pipeline Trans | 456 470 1,109 | 3,495
Transportation & post | 62 46 144 | 742
Tourism | 240 208 378 | 1,418
Media, Telecom & IT | 15 10 73 | 1,290
Finance Legal & Mktg | 33 59 277 | 3,081
Real Estate | 7 3 12 | 69
Architect. & Engineer | 0 2 1 | 95
R&D | 3 2 4 | 49
Veterinary | 0 2 5 | 53
Employment | 6 3 15 | 63
Security and Building | 92 72 201 | 677
Public Admin | 181 167 617 | 4,111
Education | 188 180 421 | 6,591
Health svs | 69 55 251 | 1,843
Resid & social Worker | 48 37 75 | 444
Art & entertain. | 32 26 58 | 226
Organizations | 33 32 82 | 335
Repair and personal s | 221 114 159 | 802
Domestic Personnel | 816 335 460 | 1,966
----------------------+---------------------------------+----------
Total | 12,101 7,187 13,535 | 65,093
| edulevel
sect | secondary Higher Se Higher | Total
----------------------+---------------------------------+----------
Agriculture | 875 391 148 | 8,555
Forestry and Fishing | 62 47 29 | 403
Mining | 81 64 93 | 551
Food manuf | 195 140 147 | 1,125
Textile and leather m | 499 281 225 | 2,359
Wood Manuf | 74 46 47 | 523
Media Printing and Re | 48 17 51 | 199
Chemicals Manuf | 182 179 341 | 1,078
Non-metal Manuf | 99 68 88 | 693
Basic Metal Manuf | 135 110 168 | 661
Machinery manuf | 121 78 92 | 640
Electronics Manuf | 16 19 82 | 148
Equipt Manuf | 191 184 417 | 1,197
Furniture Manuf | 94 41 13 | 432
Manuf Others | 89 56 40 | 458
Repair & install mach | 80 45 63 | 354
Elec & Gas Supply | 106 104 202 | 594
Sanitation Services | 61 39 45 | 338
Construction | 1,395 571 254 | 9,253
Civil Engineering | 137 66 171 | 940
Special Construction | 315 159 118 | 1,676
Wholesale | 317 284 499 | 1,877
Retail | 772 658 663 | 3,689
Land & Pipeline Trans | 737 387 336 | 3,495
Transportation & post | 135 156 199 | 742
Tourism | 225 167 200 | 1,418
Media, Telecom & IT | 89 120 983 | 1,290
Finance Legal & Mktg | 303 425 1,984 | 3,081
Real Estate | 8 11 28 | 69
Architect. & Engineer | 2 2 88 | 95
R&D | 4 2 34 | 49
Veterinary | 10 7 29 | 53
Employment | 5 8 26 | 63
Security and Building | 138 98 76 | 677
Public Admin | 716 890 1,540 | 4,111
Education | 537 844 4,421 | 6,591
Health svs | 241 328 899 | 1,843
Resid & social Worker | 97 78 109 | 444
Art & entertain. | 29 27 54 | 226
Organizations | 60 46 82 | 335
Repair and personal s | 133 97 78 | 802
Domestic Personnel | 184 86 85 | 1,966
----------------------+---------------------------------+----------
Total | 9,597 7,426 15,247 | 65,093
Code:
decode sect, g(sect_s)
levelsof sect_s, local(namesec)
"Agriculture"' `"Architect. & Engineer"' `"Art & entertain."' `"Basic Metal Manuf"' `"Chemicals Manuf"' `"Civil Engineering"' `"Construction"' `"Domestic Personnel"' `"Education"' `"Elec & Gas Supply"' `"Electronics Manuf"' `"Employment"' `"Equipt Manuf"' `"Finance Legal & Mktg Corp Svs"' `"Food manuf"' `"Forestry and Fishing"' `"Furniture Manuf"' `"Health svs"' `"Land & Pipeline Transport"' `"Machinery manuf"' `"Manuf Others"' `"Media Printing and Records"' `"Media, Telecom & IT"' `"Mining"' `"Non-metal Manuf"' `"Organizations "' `"Public Admin"' `"R&D"' `"Real Estate"' `"Repair & install machinery"' "Repair and personal svs"' `"Resid & social Workers"' `"Retail"' `"Sanitation Services"' `"Security and Building svs"' `"Special Construction"' `"Textile and leather manuf"' `"Tourism"' `"Transportation & postal"' `"Veterinary"' `"Wholesale"' `"Wood Manuf"'
matrix rownames cellcounts = `namesec'
. decode edulevel, g(edu_s)
. levelsof edu_s, local(edu)
`"Below Primary"' `"Higher"' `"Higher Secondary"' `"Primary"' `"middle"' `"sec
> ondary"'
. matrix colnames cellcounts = `edu'
. matlist cellcounts
| Below P~y Higher Higher ~y Primary middle
-------------+-------------------------------------------------------
Agriculture | 4213 1383 1545 875 391
Architect.~r | 84 77 104 62 47
Art & ente~. | 118 64 131 81 64
Basic Meta~f | 207 144 292 195 140
Chemicals ~f | 350 386 618 499 281
Civil Engi~g | 107 71 178 74 46
Construction | 6 19 58 48 17
Domestic P~l | 91 82 203 182 179
Education | 193 77 168 99 68
Elec & Gas~y | 65 64 119 135 110
Electronic~f | 87 91 171 121 78
Employment | 5 7 19 16 19
Equipt Manuf | 68 102 235 191 184
Finance Le~s | 59 75 150 94 41
Food manuf | 47 77 149 89 56
Forestry a~g | 25 38 103 80 45
Furniture ~f | 38 29 115 106 104
Health svs | 61 44 88 61 39
Land & Pip~t | 2877 1612 2544 1395 571
Machinery ~f | 219 148 199 137 66
Manuf Others | 246 268 570 315 159
Media Prin~s | 180 177 420 317 284
Media, Tel~T | 253 329 1014 772 658
Mining | 456 470 1109 737 387
Non-metal ~f | 62 46 144 135 156
Organizati~s | 240 208 378 225 167
Public Admin | 15 10 73 89 120
R&D | 33 59 277 303 425
Real Estate | 7 3 12 8 11
Repair & i~y | 0 2 1 2 2
Repair and~s | 3 2 4 4 2
Resid & so~s | 0 2 5 10 7
Retail | 6 3 15 5 8
Sanitation~s | 92 72 201 138 98
Security a~s | 181 167 617 716 890
Special Co~n | 188 180 421 537 844
Textile an~f | 69 55 251 241 328
Tourism | 48 37 75 97 78
Transporta~l | 32 26 58 29 27
Veterinary | 33 32 82 60 46
Wholesale | 221 114 159 133 97
Wood Manuf | 816 335 460 184 86
| secondary
-------------+-----------
Agriculture | 148
Architect.~r | 29
Art & ente~. | 93
Basic Meta~f | 147
Chemicals ~f | 225
Civil Engi~g | 47
Construction | 51
Domestic P~l | 341
Education | 88
Elec & Gas~y | 168
Electronic~f | 92
Employment | 82
Equipt Manuf | 417
Finance Le~s | 13
Food manuf | 40
Forestry a~g | 63
Furniture ~f | 202
Health svs | 45
Land & Pip~t | 254
Machinery ~f | 171
Manuf Others | 118
Media Prin~s | 499
Media, Tel~T | 663
Mining | 336
Non-metal ~f | 199
Organizati~s | 200
Public Admin | 983
R&D | 1984
Real Estate | 28
Repair & i~y | 88
Repair and~s | 34
Resid & so~s | 29
Retail | 26
Sanitation~s | 76
Security a~s | 1540
Special Co~n | 4421
Textile an~f | 899
Tourism | 109
Transporta~l | 54
Veterinary | 82
Wholesale | 78
Wood Manuf | 85
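One possible way to keep the names aligned with the matcell() ordering (a sketch, assuming sect and edulevel are value-labeled numeric variables): build the name lists from the numeric codes, whose order matches the tabulate rows, rather than from the alphabetically sorted decoded strings.
Code:
* Sketch: loop over the numeric codes (tabulate's row order) and pull each label.
levelsof sect, local(sectcodes)
local rnames
foreach c of local sectcodes {
    local lab : label (sect) `c'
    local rnames `"`rnames' `"`lab'"'"'
}
matrix rownames cellcounts = `rnames'

levelsof edulevel, local(educodes)
local cnames
foreach c of local educodes {
    local lab : label (edulevel) `c'
    local cnames `"`cnames' `"`lab'"'"'
}
matrix colnames cellcounts = `cnames'
matlist cellcounts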
"No observations" problem
Hello,
I have a problem when generating new variables out of existing one. I have a variable "pclass" having 1310 observations where some have value 1, some value 2 and some value 3.
My task was to generate pclass1, pclass2 and pclass3. But when I did, the output was "no observations"
Does anybody know how they would solve this?
Thanks
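For reference, a minimal sketch of two standard ways such indicators are usually created (assuming pclass is numeric with values 1, 2 and 3, and that any earlier attempts at pclass1-pclass3 have been dropped):
Code:
* Option 1: let tabulate create the indicators pclass1, pclass2, pclass3.
tabulate pclass, generate(pclass)
* Option 2 (an alternative to option 1): create an indicator by hand,
* keeping observations with missing pclass as missing.
generate byte pclass1 = (pclass == 1) if !missing(pclass)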
Analysis of companies' accounts
I am trying to analyse the accounts of all Danish companies for a number of years. Has anyone made and published an accounting analysis for their own country, and do you use open-source do-files for the accounting work? As the IFRS 9 standards are broadly the same, it is probably the same kind of system you are gathering information from, and you may have some ideas too. Just contact me if you have suggestions or links to articles and open sources. Thanks in advance.
Defining varlists for temporary variables
I'm stumped on how to create a varlist vt that contains the temporary variables that are defined in the following code.
The foreach loop I use at the bottom accomplishes this but I'm assuming there must be a one-line local definition using a * wildcard. However I can't seem to get the single/double quote nesting correct to accomplish this. (The real application has lots of temporary variables, thus the desire to use a wildcard.)
Can anyone advise? I'm happy to use the foreach loop but this seems unnecessarily clunky. Also in the real example the variables aren't indexed by a simple `j' thus the desire to use y* as the varlist in my current foreach loop.
Thanks in advance.
Code:
`_ty1' `_ty2' `_ty3'
Code:
cap preserve
cap drop _all
set obs 100
forval j=1/3 {
tempvar _ty`j'
gen y`j'=10*uniform()
gen `_ty`j''=floor(y`j')
}
local vt=" "
foreach y of varlist y* {
local vt="`vt'"+"`_t`y'' "
}
sum `vt'
cap restore
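A wildcard is awkward here because temporary variables receive internal names like __000000 rather than _ty1. One sketch that avoids the wildcard altogether is to accumulate the tempvar names in the local at the moment they are created:
Code:
* Sketch: build the varlist while creating the tempvars, so no wildcard is needed.
clear
set obs 100
local vt
forvalues j = 1/3 {
    tempvar ty`j'
    generate double `ty`j'' = floor(10*runiform())
    local vt `vt' `ty`j''
}
summarize `vt'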
ivreghdfe runs a long time after convergence
Hello everyone.
I am performing an IV-Regression with multiple fixed effects. My dataset contains about 20 million observations, with a size of around 3gb.
For this I am running the code:
Code:
ivreghdfe dlunit_price (dlquantity = dlduty_percent), absorb(ct ht cs) cluster(hs6 iso2)
In general my code works and the regression converges after a short amount of time (5-10 minutes). But after it shows that the regression converged, it takes a long time to display the result. If I use a bigger dataset, it doesn't display the result even after 12 hours.
So my question is, why does it take so long to show the results? Or which processes happen after the convergence?
I also checked my computer's performance while running the regression, but it does not seem to reach its limit.
Power calculation Cox regression one cohort multiple vars
Hi there!
I am doing a cohort study (1 group) with cancer patients for which I made a Cox model with 6 significant predictor variables (a continuous variable of interest and 5 category variables that are known predictors of death) to predict survival (death/alive).
In total I have n=795 patients of which only 35% had the event (death) and follow-up time differed for patients between 1 and 4 years.
I have been asked to explain whether the study is adequately powered, given N=795, relatively few events, and 6 predictor variables, some of which have multiple levels (e.g. the variable "stage" has values of 1, 2 and 3).
At the moment, I can only find examples with binary variables of interest or comparisons of survival between 2 groups. Given that I have only 1 group, I am struggling with what analysis to do. I have tried the power cox command, but am not sure what values to put in (e.g. how to interpret the SD of the tested covariate properties).
Could someone help me in the right direction?
Kind regards Jessy
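For what it's worth, a rough sketch of how a power cox call for the continuous covariate might look, assuming the standard power cox options: hratio() gives the effect size for a one-SD change, sd() the covariate's standard deviation, failprob() the overall event probability (roughly 0.35 here), and r2() the squared correlation of the covariate with the other five predictors. All numbers below are placeholders, not values from the study.
Code:
* Placeholder values only; replace with estimates from the data or the literature.
power cox, hratio(1.3) sd(1) failprob(0.35) r2(0.2) n(795)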
Variable gets omitted
Dear Forum,
I have the following command:
Code:
reg birthrate ibn.incomegroup, noconstant
Now I want to add an interaction to the regression:
Code:
reg birthrate c.GDP##ibn.incomegroup, noconstant
However, for one group of the independent variable "income group" I always get the result omitted.
Am I doing something wrong? Is there a way to show all the results?
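For reference, the omission is likely a collinearity issue: c.GDP##ibn.incomegroup expands to c.GDP plus the group indicators plus the group-specific GDP terms, and the common c.GDP term is the sum of the group-specific ones, so with noconstant Stata drops one of them. A sketch of an alternative parameterization that reports a separate intercept and GDP slope for every income group:
Code:
* Sketch: cell-means style parameterization, one intercept and one GDP slope per group.
reg birthrate ibn.incomegroup ibn.incomegroup#c.GDP, noconstant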
Identifying the predictors of cataract progression
Dear Researchers,
I have asked this question before, but I did not get an answer, perhaps because the way I asked the question was not clear, so I will ask it in another way.
I have three questions, and I need you kindly to answer the ones that you know, please.
I have cross-sectional data for the Diabetes group. I am trying to test a model that predicts a specific complication (i.e., dependent variable) by other health complications (i.e., independent variables).
Where:
The dependent variable is called Cataract and it is an ordinal variable.
The independent variables are nominal, ordinal, and scale variables.
So, I think the code at first will be:
Code:
ge id= _n
encode patient, gen(PATIENT)
The first question:
I have an independent variable that consists of two groups in the same column, where the first group is named type 1 and the second group is named type 2. In this case, I need to test whether there are significant differences between the two groups or not, so I think I should create another dummy variable coded 1 for the first group and 2 for the second group, and then use the independent-samples t-test. Am I correct?
The second question:
Let us assume that the model I want to use is the following, and please remember that my main aim is to predict a specific complication, "Cataract". So, the model is:
Cataract = Gender + Age + Diabetes duration
where Cataract is an ordinal variable,
Gender is a nominal variable,
Age is ordinal, and
Diabetes duration is scale.
So, I think that I should use ordinal logistic regression, and the code for this will be:
Code:
ologit Cataract Gender Age Diabetes_duration, r
But I need to see the relationship between each of the independent variables and the dependent variable, for instance the effect of Gender on all levels of Cataract, so I think the code will be:
Code:
margins, dydx(Gender)
marginsplot
So could you please correct me if any of the above codes are not correct?
The third question:
I am very interested in finding out whether the components of gender, I mean male and female, have different predictions for the dependent variable or not, so could you please tell me what the code would be to do that?
Thanks very much in advance.
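On the third question, a rough sketch of one way such a comparison is often made (assuming Gender is a value-labeled numeric variable, so it can be used as a factor variable): after the ordered logit, compare predicted probabilities of each Cataract level between males and females.
Code:
* Sketch only; outcome(#1) refers to the first level of Cataract, repeat for the others.
ologit Cataract i.Gender Age Diabetes_duration, vce(robust)
margins Gender, predict(outcome(#1))
marginsplot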
Drop embedded observations
Hello,
I have a household panel dataset with some observations being fully captured by other observations. Households can split off over time and are then reassigned their original household ID. Thus, a household ID can be assigned to several observations (where household ID is only the wave specific ID). I reshaped the panel and kept only the relevant variables which I hope illustrates the case.
For instance, obs 3, 6 or 10 are redundant. Ideally I would like to keep observations that are included in others but only as split-offs, though. For instance, obs 53 is captured by obs 51 and 52, but both of them are split-offs in the third period, for which obs 53 does not have information. In such cases it might make more sense to keep it as an "original" household.
The "rule" derives from the examples: Observations should be dropped if 1) their sequence of hhids appears in another observation, 2) (ideally but I do not know how feasible it is) this other observation is an original household in the period following the one for which the to be dropped observation does not contain a value anymore.
I tried to use collapse, to create duplicates or to count missing values, but none really works. I would appreciate any suggestion. And I already contacted the data provider about the panel composition, but did not receive any answer.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input double obs str16(hhid1 hhid2) double split_off2 str16 hhid3 double split_off3 str16 hhid4 double split_off4 1 "01010140020171" "0101014002017101" 1 "0001-001" 1 "0001-001" 1 2 "01010140020171" "0101014002017101" 1 "0001-001" 1 "0001-004" 2 3 "01010140020171" "0101014002017101" 1 "" . "" . 4 "01010140020284" "0101014002028401" 1 "0002-001" 1 "0002-001" 1 5 "01010140020297" "0101014002029701" 1 "0003-001" 1 "0003-001" 1 6 "01010140020297" "0101014002029701" 1 "" . "" . 7 "01010140020297" "0101014002029704" 2 "" . "" . 8 "01010140020409" "0101014002040901" 1 "0005-001" 1 "0005-001" 1 9 "01010140020471" "0101014002047101" 1 "0006-001" 1 "" . 10 "01010140020471" "" . "" . "" . 11 "01010140020551" "0101014002055101" 1 "0007-001" 1 "0007-001" 1 12 "01010140020761" "0101014002076101" 1 "0008-001" 1 "0008-001" 1 13 "01010140020762" "0101014002076201" 1 "0009-001" 1 "0009-001" 1 14 "01020030030004" "0102003003000401" 1 "0010-001" 1 "0010-001" 1 15 "01020030030022" "0102003003002201" 1 "0011-001" 1 "0012-001" 1 16 "01020030030022" "0102003003002201" 1 "0011-001" 1 "0012-003" 2 17 "01020030030022" "0102003003002201" 1 "0011-004" 2 "" . 18 "01020030030140" "0102003003014001" 1 "0012-001" 1 "0013-001" 1 19 "01020030030161" "0102003003016101" 1 "0013-001" 1 "0014-001" 1 20 "01020030030174" "0102003003017401" 1 "0014-001" 1 "0015-001" 1 21 "01020030030174" "0102003003017407" 2 "0015-001" 1 "0017-001" 1 22 "01020030030200" "0102003003020001" 1 "0016-001" 1 "0018-001" 1 23 "01020030030430" "0102003003043001" 1 "0017-001" 1 "0019-001" 1 24 "01020030030430" "0102003003043001" 1 "" . "" . 25 "01020030030479" "0102003003047901" 1 "0018-001" 1 "0020-001" 1 26 "01020170030001" "0102017003000101" 1 "0019-001" 1 "" . 27 "01020170030001" "0102017003000101" 1 "0019-003" 2 "" . 28 "01020170030001" "0102017003000104" 2 "0020-001" 1 "" . 29 "01020170030017" "0102017003001701" 1 "0021-001" 1 "" . 30 "01020170030022" "0102017003002201" 1 "0022-001" 1 "" . 31 "01020170030022" "0102017003002201" 1 "" . "" . 32 "01020170030048" "0102017003004801" 1 "0023-001" 1 "" . 33 "01020170030100" "0102017003010001" 1 "0024-001" 1 "" . 34 "01020170030209" "0102017003020901" 2 "0025-001" 1 "" . 35 "01020170030209" "" . "0025-001" 1 "" . 36 "01020170030241" "0102017003024101" 1 "0026-001" 1 "" . 37 "01020170030241" "0102017003024101" 1 "" . "" . 38 "01020170030246" "0102017003024601" 1 "0027-001" 1 "" . 39 "01030130040161" "0103013004016101" 1 "0028-001" 1 "" . 40 "01030130040219" "0103013004021901" 1 "0029-001" 1 "" . 41 "01030130040259" "0103013004025901" 1 "0030-001" 1 "" . 42 "01030130040346" "0103013004034601" 1 "0031-001" 1 "" . 43 "01030130040468" "0103013004046801" 1 "0032-001" 1 "" . 44 "01030130040685" "0103013004068501" 1 "0033-001" 1 "" . 45 "01030130040739" "0103013004073901" 1 "0034-001" 1 "" . 46 "01030130040739" "0103013004073901" 1 "0034-003" 2 "" . 47 "01030130040739" "0103013004073901" 1 "" . "" . 48 "01030130040745" "0103013004074501" 1 "0035-001" 1 "" . 49 "01030133010068" "0103013301006801" 1 "0036-001" 1 "" . 50 "01030133010092" "0103013301009201" 1 "0037-001" 1 "" . 51 "01030133010175" "0103013301017501" 1 "0038-001" 2 "" . 52 "01030133010175" "0103013301017501" 1 "0038-002" 2 "" . 53 "01030133010175" "0103013301017501" 1 "" . "" . 54 "01030133010188" "0103013301018801" 1 "0039-001" 1 "" . 55 "01030133010188" "0103013301018801" 1 "0039-004" 2 "" . 56 "01030133010188" "0103013301018801" 1 "" . "" . 
57 "01030133010188" "0103013301018803" 2 "0040-001" 1 "" . 58 "01030133010300" "0103013301030001" 1 "0041-002" 1 "" . 59 "01030133010300" "0103013301030001" 1 "0041-006" 2 "" . 60 "01030133010300" "0103013301030001" 1 "" . "" . 61 "01030133010322" "0103013301032201" 1 "0042-001" 1 "" . 62 "01030133010411" "0103013301041101" 1 "0043-001" 1 "" . 63 "01030133010411" "0103013301041101" 1 "0043-002" 2 "" . 64 "01030133010411" "0103013301041102" 2 "0044-001" 1 "" . 65 "01030133010652" "0103013301065201" 1 "0045-001" 1 "" . 66 "01040173040004" "0104017304000401" 1 "0046-001" 1 "" . 67 "01040173040004" "0104017304000401" 1 "0046-002" 2 "" . 68 "01040173040004" "0104017304000401" 1 "" . "" . 69 "01040173040017" "0104017304001701" 1 "0047-001" 1 "" . 70 "01040173040022" "0104017304002201" 1 "0048-001" 1 "" . 71 "01040173040022" "0104017304002201" 1 "0048-002" 2 "" . 72 "01040173040022" "" . "0048-001" 1 "" . 73 "01040173040034" "0104017304003401" 1 "0049-001" 1 "" . 74 "01040173040034" "0104017304003406" 2 "" . "" . 75 "01040173040034" "0104017304003407" 2 "0051-002" 2 "" . 76 "01040173040041" "0104017304004102" 2 "0052-001" 1 "" . 77 "01040173040041" "" . "" . "" . 78 "01040173040086" "0104017304008601" 1 "0053-001" 1 "" . 79 "01040173040086" "" . "0053-001" 1 "" . 80 "01040173040092" "0104017304009201" 1 "0054-001" 1 "" . 81 "01040173040094" "0104017304009401" 1 "0055-001" 1 "" . 82 "01040173040094" "0104017304009402" 2 "0056-001" 1 "" . 83 "01040310010030" "0104031001003001" 1 "0057-001" 1 "" . 84 "01040310010102" "0104031001010201" 1 "0058-001" 1 "" . 85 "01040310010174" "0104031001017402" 1 "0059-001" 1 "" . 86 "01040310010174" "0104031001017402" 1 "0059-002" 2 "" . 87 "01040310010174" "0104031001017403" 2 "0060-001" 1 "" . 88 "01040310010174" "" . "" . "" . 89 "01040310010180" "0104031001018001" 1 "0061-001" 1 "" . 90 "01040310010462" "0104031001046201" 1 "0062-001" 1 "" . 91 "01040310010482" "0104031001048201" 1 "0063-001" 1 "" . 92 "01040310010482" "" . "" . "" . 93 "01040310010745" "0104031001074501" 1 "0064-001" 1 "" . 94 "01040310010745" "0104031001074502" 2 "0065-001" 1 "" . 95 "01040310010745" "" . "0064-001" 1 "" . 96 "01040310011128" "0104031001112801" 1 "0066-001" 1 "" . 97 "01040310011128" "0104031001112801" 1 "" . "" . 98 "01040310011128" "0104031001112804" 2 "0067-001" 1 "" . 99 "01040380030347" "0104038003034701" 1 "0068-001" 1 "" . 100 "01040380030396" "0104038003039601" 1 "0069-001" 1 "" . end label values split_off2 ha_10 label values split_off3 ha_10 label values split_off4 ha_10 label def ha_10 1 "ORIGINAL HOUSEHOLD", modify label def ha_10 2 "SPLIT-OFF HOUSEHOLD", modify
Clarification on merge command
I am trying to merge two datasets: one dta file consists of 8,576 unique observations, and the other consists of 1,187 unique ids and 1,220 observations. When I merge using the merge 1:m command, the result is 1,189 matched, with 7,415 from the master and 31 from the using file not matched. What I make out is that 1,189 and 31 make up the 1,220 observations, so the remaining observations should be 8,576 - 1,220 = 7,356. But the unmatched count from the master is 7,415, which is 59 more. How should I read this? It makes my observations more than the actual number of cases surveyed. Please guide. A dataex example of the merged file is attached herewith.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double id long u_id float umpce_class byte(sector district) float(gender age_group est_pop blk_id age weight) byte(blk6_q4 blk6_q5 blk6_q6) long(blk7_q11 blk7_q14) float(eblk7_q11 eblk7_q14) byte blk7_q16 float ail_cat byte(x _merge) 57100110101 571001101 1 2 3 1 4 15.225 . . . . . . . . . . . . . 1 57100110102 571001101 1 2 3 2 3 15.225 1 27 15.225 87 1 1 1070 1500 16290.75 22837.5 1 12 1 3 57100110103 571001101 1 2 3 1 1 15.225 . . . . . . . . . . . . . 1 57100110104 571001101 1 2 3 2 1 15.225 . . . . . . . . . . . . . 1 57100110201 571001102 1 2 3 1 3 15.225 . . . . . . . . . . . . . 1 end label values _merge _merge label def _merge 1 "master only (1)", modify label def _merge 3 "matched (3)", modify
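For what it's worth, a sketch of checks that are often used to reconcile such counts (file names below are placeholders): with merge 1:m, the matched count refers to observations in the result, not to unique master ids, so it helps to count how many distinct ids actually matched.
Code:
* Placeholder file names; the key is assumed to be id.
use master_file, clear
duplicates report id                 // id should be unique in the master for a 1:m merge
merge 1:m id using using_file
tab _merge
bysort id: gen byte first = (_n == 1)
count if _merge == 3 & first         // distinct master ids that found a match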
How to analyze data for repeated measures?
Hi everyone
Could you please share your recommendations, regarding the questions at the end of this post?
Research objectives:
1. What is the association between tourists' and residents' aesthetic experiences and destination aesthetic features?
Note: Aesthetic experiences are categorized into 6 types of experiences. For example, the experience of the beautiful and experience of the ugly and 4 more types of experiences.
Destination aesthetic features: a distinguished 7-point Likert semantic differential scale with 18 items. For example item 1 reads like:
I would say that the place was............ not crowded 1 2 3 4 5 6 7 crowded.
2. How often the six types of aesthetic experiences occur? (7 points Likert scale for frequency)
Variables:
Dependent variables:
1. Comprehensive descriptions of six types of aesthetic experiences. For example, the experience of the beautiful reads like "You feel you are lucky that you have the chance to enjoy and acknowledge the appealing moment of experiencing the beauty. You feel thankful, fascinated, happy, and very pleased......"
2. The frequency of occurrence of the experiences
Independent variables:
1. Residents' district of living in the destination (5 districts) / Tourists' city of residence (9 cities)
2. Tourists' and residents' demographic profile (Age, Gender, Education)
3. Tourists' length of stay at the destination during their current travel and residents' length of residency at the same destination
4. Travel frequency of tourists and residents during last year
5. Tourists' purpose of the trip (leisure, business, visiting friends and family)
6. The individual's evaluation of the Destination Aesthetic Features
I have the following research design.
Multilevel analysis: experiences are nested in individuals
Repeated measure mixed model design
Level 1: repeated measurement of the association between aesthetic experiences and destination aesthetic features
Level 2: tourists and residents
Study setting: A specific city (tourism destination)
Sample: Two groups of people (300 Tourists who travel to that specific city and 300 Residents who live in that city)
Repeated Measurement: A semantic differential scale of 18 items (18 features of a city that may make the city to be perceived as beautiful or ugly)
At the occurrence of 6 types of a specific kind of experience (e.g., the experience of beauty, the experience of ugliness, ...)
Time: Note 1: A cross-sectional survey is currently being distributed among the target population (at a single point in time).
Note 2: Participants will answer the survey considering the occurrence of the 6 types of experiences during a specific period of time. This means the residents will consider the occurrence of those experiences during the time they have been residing in that city (e.g., some years). By the same token, tourists will consider it during the time they have been staying in the city on their current travel (e.g., some days).
Subject factor: Both within-subject factor and between-subject factor
Note A: As no study has been conducted to link the above-mentioned features of a city to those specific types of experiences, our study is exploratory and does not pose hypotheses.
Note B: The repeated measures are to be compared among the mentioned six experiences, following the principles of a within-subject design.
Note C: A cross-level interaction term (group × features of the city) will also be entered and estimated in order to compare the evaluations between tourists and residents. Please see an example of my dataset here:
Note D: Descriptions of 6 experiences and the frequency of occurrence of those experiences are dependent variables and other variables are independent.
May I sincerely ask your recommendations on:
1) How to analyze the data?
2) How should I conduct power analysis for calculating the sample size? For now, I considered collecting data from 300 tourists and 300 residents but I am not sure whether it is necessary to recruit overall 600 people or not. (Note: Some people may experience all 6 types of experiences, some may have 5 to 2 experiences and few people may have only 1 experience)
3) How should I treat missing data?
Many thanks in advance,
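Purely as an illustration of the nesting described above (all variable names are hypothetical), a two-level model of this kind is often written along these lines in Stata, with the 18 feature ratings nested in persons and a group-by-feature cross-level interaction:
Code:
* Hypothetical variable names: rating (1-7 item response), feature (1-18 item id),
* group (tourist vs resident), person_id (respondent identifier).
mixed rating i.group##i.feature age i.gender i.education || person_id:, reml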