Sunday, January 31, 2021

gmatch and psmatch2

I am trying to conduct a logistic regression after propensity score matching. I tried both gmatch and psmatch2, and I am wondering whether I should expect similar results from both. For example:

webuse cattaneo2, clear
rename order orderb

* Using GMATCH
logistic mbsmoke i.mrace i.foreign mage medu fage fedu prenatal orderb i.frace i.mmarried i.fbaby
predict ps
gmatch mbsmoke ps, maxc(1) set(set1) diff(diff1)
logistic deadkids i.mbsmoke i.mrace i.foreign mage medu fage fedu prenatal orderb i.frace i.mmarried i.fbaby if set1 < .
**

* Using PSMATCH2
psmatch2 mbsmoke i.mrace i.foreign mage medu fage fedu prenatal orderb i.frace i.mmarried i.fbaby, outcome(deadkids) neighbor(1)
logistic deadkids i.mbsmoke i.mrace i.foreign mage medu fage fedu prenatal orderb i.frace i.mmarried i.fbaby [fweight=_weight]

Should I expect similar results from the logistic regressions after running the above commands?
Actually, in this case the results were close.
**************************************************
However, I am doing a similar analysis using another dataset. I run the commands as below:

use mydata, clear
* Using GMATCH
logistic disability_child i.healthCond_cat2 i.affectLife i.symptom i.ageCat3 i.workingHourPerWeek_cat4 i.singleFather_n
predict ps
gmatch disability_child ps, maxc(1) set(set1) diff(diff1)
logistic k6cat2v2 i.disability_child i.healthCond_cat2 i.affectLife i.symptom i.ageCat3 i.workingHourPerWeek_cat4 i.singleFather_n if set1 < .
**

* Using PSMATCH2
psmatch2 disability_child i.healthCond_cat2 i.affectLife i.symptom i.ageCat3 i.workingHourPerWeek_cat4 i.singleFather_n, outcome(k6cat2v2) neighbor(1)
logistic k6cat2v2 i.disability_child i.healthCond_cat2 i.affectLife i.symptom i.ageCat3 i.workingHourPerWeek_cat4 i.singleFather_n [fweight=_weight]

However, the results are far from similar: some coefficients even flip sign between the two approaches. Why is that so?

I want to use psmatch2. Also, I want to produce a descriptive table of the matched cases and controls after psmatch2. How can I do so?
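A sketch of one way to get such a table, using the cattaneo2 variable names from the worked example above: pstest ships with the psmatch2 package and compares covariate means between treated and controls before and after matching, while _weight (missing for unmatched observations) restricts plain descriptives to the matched sample:

Code:
* covariate balance, raw and matched samples:
pstest mrace foreign mage medu fage fedu, both
* simple descriptive table for the matched sample only:
tabstat mage medu fage fedu [aweight=_weight] if _weight < ., ///
    by(mbsmoke) statistics(mean sd n)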

Add Optional t-test to Stata Program

I have a Stata program that displays a twoway bar graph, and I want to add a t-test to the program, but the t-test has to be optional. I am a bit confused because I was under the impression that the program's syntax statement applied to the command being used, in this case -twoway bar-. Can I put the t-test into the syntax statement with [] to specify that it is optional, or is there some other mechanism farther into the code that I need to use?
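For reference, a minimal sketch of the pattern, with hypothetical variable roles: the brackets around an option name in the -syntax- statement are exactly what makes the option optional, and the program then checks whether the corresponding local macro is empty:

Code:
capture program drop bartest
program define bartest
    * [TTest] makes the option optional; `ttest' is empty unless it was typed
    syntax varlist(numeric min=2 max=2) [if] [in], by(varname) [TTest]
    tokenize `varlist'
    twoway bar `1' `2' `if' `in'
    if "`ttest'" != "" {
        ttest `1' `if' `in', by(`by')
    }
end

* usage:
* bartest price mpg, by(foreign)          // graph only
* bartest price mpg, by(foreign) ttest    // graph plus the optional t-test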

esttab

Why is it that, when I enter the following command:

Code:
forvalues i = 1/5 {
    regress edge percentFRL ipr_se_variation popdensity if schoolsize == `i'
    eststo schoolsize`i'
}
esttab _all using schoolsize.csv

I am given a spreadsheet that includes a column (the first column) for a regression using all the observations instead of just the five schoolsize groups? Is there a way to prevent that? It's not a big deal, since I can just delete the column, but I don't remember it doing this in the past.
                       (1)         (2)         (3)         (4)         (5)         (6)
                      edge        edge        edge        edge        edge        edge
percentFRL           0.618***    0.385***    0.591***    0.636***    0.675***    0.578***
                   (277.02)     (48.98)     (90.98)    (162.07)    (183.46)     (84.89)
ipr_se_variation     0.449***    0.663***    0.581***    0.444***    0.304***    0.463***
                    (81.62)     (35.99)     (39.44)     (46.75)     (31.67)     (28.72)
popdensity        0.000000196 0.000000485 0.00000102*** 0.00000141*** -0.00000122***
                     (0.24)      (1.82)      (5.90)      (6.14)      (-3.56)
_cons               -0.184***   -0.108***   -0.184***   -0.192***   -0.188***   -0.181***
                  (-109.15)    (-17.07)    (-39.19)    (-65.53)    (-66.27)    (-38.12)
N                   173507       14666       24155       60282       58238       16139

t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
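The stray first column is usually a leftover stored estimate (or the active estimation set) that _all picks up. A sketch of a defensive version, clearing stored results first and naming the stored models explicitly:

Code:
eststo clear
forvalues i = 1/5 {
    eststo schoolsize`i': regress edge percentFRL ipr_se_variation popdensity ///
        if schoolsize == `i'
}
esttab schoolsize* using schoolsize.csv, replace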

xtivreg2: instrumental variables with an endogenous variable involved in interaction terms

Hi all,

I'm conducting analysis in the form below:

Code:
y = x1 + x2 + x3 + x1*x2
in which x1 is the variable of interest and is endogenous. Say I have a valid instrument for x1, namely z1. The data are in panel structure, so xtivreg2 is used.

without including the interaction term, I can run the code below successfully:

Code:
xtivreg2 y x2 x3 (x1=z1), fe
But with the interaction term,
Code:
xtivreg2 y x2 x3 (x1 c.x1#c.x2 = z1 c.z1#c.x2), fe
throws an error:
Code:
cx1#c:  operator invalid
I searched online and found a potential remedy; what it does is:
Code:
gen x1x2 = x1*x2
gen z1x2 = z1*x2
xtivreg2 y x2 x3 (x1 x1x2 = z1 z1x2), fe
This works indeed, but I want to know whether this affects the consistency and efficiency of the coefficients.
Or, more broadly, what is the shortcoming of this method compared to one in which I could just run:
Code:
xtivreg2 y x2 x3 (x1 c.x1#c.x2 = z1 c.z1#c.x2), fe
Thanks in advance for all your time and knowledge

reshape data long to wide with several IDs

I could not reshape my data from my previous posting. Please help me solve this problem.

This is a part of my data.

Code:
clear
input year str6 state value str6 type
2010 GA  1 A
2011 GA  2 A
2012 GA  3 A
2010 GA  4 B
2011 GA  5 B
2012 GA  6 B
2010 NY  7 A
2011 NY  8 A
2012 NY  9 A
2010 NY 10 B
2011 NY 11 B
2012 NY 12 B
end

I want my data to look like this:
year state value_A value_B
2010 GA 1 4
2011 GA 2 5
2012 GA 3 6
2010 NY 7 10
2011 NY 8 11
2012 NY 9 12
Thanks!
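A sketch of the reshape for this layout, using the variable names above: with i(year state) and the string variable type as j(), reshape wide creates valueA and valueB, which can then be renamed:

Code:
reshape wide value, i(year state) j(type) string
rename (valueA valueB) (value_A value_B)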

Count return clients

Dear All,

I would like to count the number of return clients in 2020 (clients whose first occurrence/visit was in 2018 or 2019, as shown in the data below).
I have generated a first-occurrence variable, restricted to 2018 or 2019 and not 2020 ("first_visit_not2020"), and another variable, "tag", that just counts the number of clients in each year. I have tried to link these two variables together in a "bys" command:

Code:
egen tag2020 = tag(id yr) if tag1==1 & first_visit_not2020==1
here is example of the data

Code:
 
id yr tag tag2020 return_visit first_visit_not2020
1 2018 1 0 1 1
1 2018 0 0 2 0
1 2018 0 0 3 0
1 2019 1 0 4 0
1 2019 0 0 5 0
1 2019 0 0 6 0
1 2019 0 0 7 0
1 2019 0 0 8 0
1 2019 0 0 9 0
1 2019 0 0 10 0
1 0 0 11 0
2 2019 1 0 1 1
2 2019 0 0 2 0
2 2019 0 0 3 0
2 2019 0 0 4 0
2 2020 1 1 5 0
2 2020 0 0 6 0
2 2020 0 0 7 0
3 2020 1 1 1 0
3 2020 0 0 2 0
3 2020 0 0 3 0
3 2020 0 0 4 0
3 2020 0 0 5 0
3 2020 0 0 6 0
3 2020 0 0 7 0
3 2020 0 0 8 0
3 0 0 9 0
Also, I would like to be able to count the number of visits by "id" in 2020 (the return year).
Thanks for your help in advance!
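A sketch of one approach, assuming one row per visit with variables id and yr as above:

Code:
bysort id (yr): gen int first_yr = yr[1]        // year of first visit
egen byte seen2020 = max(yr == 2020), by(id)    // client visited in 2020
gen byte returner = inlist(first_yr, 2018, 2019) & seen2020
egen byte one = tag(id)
count if one & returner                         // number of return clients
egen visits2020 = total(yr == 2020), by(id)     // visits per id in 2020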

Reshape data long to wide but not normal wide

Hello,

I want to reshape my data long to wide.

So, I googled it and found this web page: https://stats.idre.ucla.edu/stata/mo...-long-to-wide/

This is a sample data from the web page.
famid birth age
1 1 9
1 2 6
1 3 3
2 1 8
2 2 6
2 3 2
3 1 6
3 2 4
3 3 2
I want to reshape my data like this:
age1 age2 age3
9 8 6
6 6 4
3 2 2
Each column has the age values of one famid. I don't care about birth. So column age1 has the age values from famid = 1, column age2 has the age values from famid = 2, and so on. Thank you for your help.
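A sketch of that layout with the sample data: birth serves as the row identifier and famid becomes the column suffix:

Code:
reshape wide age, i(birth) j(famid)
drop birth        // keeps only age1 age2 age3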

Test to show a regressor is non stochastic?

Other than producing and eyeballing a scatter diagram, is there a standard way to show that an independent variable is non-stochastic, so that it complies with the assumptions of simple OLS?

Thank you,

Peter J.

Graphing one product versus an average of all other products

Hi All,

I am having trouble working out how to graph an average of two products ("Coca-Cola" and "tonic water") as "Soft Drinks" within the prod variable, along with an average price line for all other goods in the sample over time. I tried graphing many individual lines together, but it becomes quite messy and you cannot see exactly what is being graphed where. Instead, I just want one line for "Soft Drinks" and one line for everything else. I think a -twoway- line should work, but I am not sure how to get the averages for "Soft Drinks" and for the remaining products (ideally without creating a new "total" observation, which would change the results of any regressions that I run). A snippet of my data is attached below. Thanks so much for any ideas!

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double abs_price str23 combined str66 prod float store_type int year str3 currency str7 country str10 city float tax
 2.99 "2014IrelandDublinEUR" "Apples (1 kg)"  0 2014 "EUR" "Ireland" "Dublin"     0
 2.24 "2014IrelandDublinEUR" "Apples (1 kg)"  1 2014 "EUR" "Ireland" "Dublin"     0
    3 "2015IrelandDublinEUR" "Apples (1 kg)"  0 2015 "EUR" "Ireland" "Dublin"     0
 2.48 "2015IrelandDublinEUR" "Apples (1 kg)"  1 2015 "EUR" "Ireland" "Dublin"     0
    3 "2016IrelandDublinEUR" "Apples (1 kg)"  0 2016 "EUR" "Ireland" "Dublin"     1
 2.34 "2016IrelandDublinEUR" "Apples (1 kg)"  1 2016 "EUR" "Ireland" "Dublin"     1
  3.2 "2017IrelandDublinEUR" "Apples (1 kg)"  0 2017 "EUR" "Ireland" "Dublin"     1
 2.15 "2017IrelandDublinEUR" "Apples (1 kg)"  1 2017 "EUR" "Ireland" "Dublin"     1
 2.99 "2018IrelandDublinEUR" "Apples (1 kg)"  0 2018 "EUR" "Ireland" "Dublin"     1
 2.19 "2018IrelandDublinEUR" "Apples (1 kg)"  1 2018 "EUR" "Ireland" "Dublin"     1
 2.54 "2019IrelandDublinEUR" "Apples (1 kg)"  0 2019 "EUR" "Ireland" "Dublin"     1
  2.4 "2019IrelandDublinEUR" "Apples (1 kg)"  1 2019 "EUR" "Ireland" "Dublin"     1
 2.61 "2020IrelandDublinEUR" "Apples (1 kg)"  0 2020 "EUR" "Ireland" "Dublin"     1
 2.25 "2020IrelandDublinEUR" "Apples (1 kg)"  1 2020 "EUR" "Ireland" "Dublin"     1
 2.52 "2014UKManchesterGBP"  "Apples (1 kg)"  0 2014 "GBP" "UK"      "Manchester" 0
 1.95 "2014UKLondonGBP"      "Apples (1 kg)"  0 2014 "GBP" "UK"      "London"     0
 1.86 "2014UKLondonGBP"      "Apples (1 kg)"  1 2014 "GBP" "UK"      "London"     0
 1.93 "2014UKManchesterGBP"  "Apples (1 kg)"  1 2014 "GBP" "UK"      "Manchester" 0
 1.85 "2015UKLondonGBP"      "Apples (1 kg)"  0 2015 "GBP" "UK"      "London"     0
  2.2 "2015UKManchesterGBP"  "Apples (1 kg)"  0 2015 "GBP" "UK"      "Manchester" 0
 1.75 "2015UKManchesterGBP"  "Apples (1 kg)"  1 2015 "GBP" "UK"      "Manchester" 0
 1.75 "2015UKLondonGBP"      "Apples (1 kg)"  1 2015 "GBP" "UK"      "London"     0
  2.1 "2016UKManchesterGBP"  "Apples (1 kg)"  0 2016 "GBP" "UK"      "Manchester" 1
    2 "2016UKLondonGBP"      "Apples (1 kg)"  0 2016 "GBP" "UK"      "London"     1
    2 "2016UKLondonGBP"      "Apples (1 kg)"  1 2016 "GBP" "UK"      "London"     1
 1.75 "2016UKManchesterGBP"  "Apples (1 kg)"  1 2016 "GBP" "UK"      "Manchester" 1
    2 "2017UKLondonGBP"      "Apples (1 kg)"  0 2017 "GBP" "UK"      "London"     1
  2.2 "2017UKManchesterGBP"  "Apples (1 kg)"  0 2017 "GBP" "UK"      "Manchester" 1
 1.75 "2017UKManchesterGBP"  "Apples (1 kg)"  1 2017 "GBP" "UK"      "Manchester" 1
 1.98 "2017UKLondonGBP"      "Apples (1 kg)"  1 2017 "GBP" "UK"      "London"     1
  2.3 "2018UKManchesterGBP"  "Apples (1 kg)"  0 2018 "GBP" "UK"      "Manchester" 1
 2.12 "2018UKLondonGBP"      "Apples (1 kg)"  0 2018 "GBP" "UK"      "London"     1
    2 "2018UKLondonGBP"      "Apples (1 kg)"  1 2018 "GBP" "UK"      "London"     1
 1.97 "2018UKManchesterGBP"  "Apples (1 kg)"  1 2018 "GBP" "UK"      "Manchester" 1
 2.04 "2019UKLondonGBP"      "Apples (1 kg)"  0 2019 "GBP" "UK"      "London"     1
 2.16 "2019UKManchesterGBP"  "Apples (1 kg)"  0 2019 "GBP" "UK"      "Manchester" 1
 1.79 "2019UKLondonGBP"      "Apples (1 kg)"  1 2019 "GBP" "UK"      "London"     1
 2.01 "2019UKManchesterGBP"  "Apples (1 kg)"  1 2019 "GBP" "UK"      "Manchester" 1
 2.14 "2020UKManchesterGBP"  "Apples (1 kg)"  0 2020 "GBP" "UK"      "Manchester" 1
  2.1 "2020UKLondonGBP"      "Apples (1 kg)"  0 2020 "GBP" "UK"      "London"     1
 1.85 "2020UKLondonGBP"      "Apples (1 kg)"  1 2020 "GBP" "UK"      "London"     1
 1.87 "2020UKManchesterGBP"  "Apples (1 kg)"  1 2020 "GBP" "UK"      "Manchester" 1
12.45 "2014IrelandDublinEUR" "Bacon (1 kg)"   0 2014 "EUR" "Ireland" "Dublin"     0
 8.77 "2014IrelandDublinEUR" "Bacon (1 kg)"   1 2014 "EUR" "Ireland" "Dublin"     0
10.14 "2015IrelandDublinEUR" "Bacon (1 kg)"   0 2015 "EUR" "Ireland" "Dublin"     0
 7.44 "2015IrelandDublinEUR" "Bacon (1 kg)"   1 2015 "EUR" "Ireland" "Dublin"     0
 9.99 "2016IrelandDublinEUR" "Bacon (1 kg)"   0 2016 "EUR" "Ireland" "Dublin"     1
 7.08 "2016IrelandDublinEUR" "Bacon (1 kg)"   1 2016 "EUR" "Ireland" "Dublin"     1
 8.64 "2017IrelandDublinEUR" "Bacon (1 kg)"   0 2017 "EUR" "Ireland" "Dublin"     1
    6 "2017IrelandDublinEUR" "Bacon (1 kg)"   1 2017 "EUR" "Ireland" "Dublin"     1
 6.82 "2018IrelandDublinEUR" "Bacon (1 kg)"   0 2018 "EUR" "Ireland" "Dublin"     1
 6.32 "2018IrelandDublinEUR" "Bacon (1 kg)"   1 2018 "EUR" "Ireland" "Dublin"     1
 6.54 "2019IrelandDublinEUR" "Bacon (1 kg)"   0 2019 "EUR" "Ireland" "Dublin"     1
 4.92 "2019IrelandDublinEUR" "Bacon (1 kg)"   1 2019 "EUR" "Ireland" "Dublin"     1
  5.9 "2020IrelandDublinEUR" "Bacon (1 kg)"   0 2020 "EUR" "Ireland" "Dublin"     1
 4.57 "2020IrelandDublinEUR" "Bacon (1 kg)"   1 2020 "EUR" "Ireland" "Dublin"     1
   10 "2014UKManchesterGBP"  "Bacon (1 kg)"   0 2014 "GBP" "UK"      "Manchester" 0
13.96 "2014UKLondonGBP"      "Bacon (1 kg)"   0 2014 "GBP" "UK"      "London"     0
  9.5 "2014UKManchesterGBP"  "Bacon (1 kg)"   1 2014 "GBP" "UK"      "Manchester" 0
    5 "2014UKLondonGBP"      "Bacon (1 kg)"   1 2014 "GBP" "UK"      "London"     0
12.36 "2015UKLondonGBP"      "Bacon (1 kg)"   0 2015 "GBP" "UK"      "London"     0
  9.5 "2015UKManchesterGBP"  "Bacon (1 kg)"   0 2015 "GBP" "UK"      "Manchester" 0
 6.67 "2015UKLondonGBP"      "Bacon (1 kg)"   1 2015 "GBP" "UK"      "London"     0
 8.32 "2015UKManchesterGBP"  "Bacon (1 kg)"   1 2015 "GBP" "UK"      "Manchester" 0
11.96 "2016UKLondonGBP"      "Bacon (1 kg)"   0 2016 "GBP" "UK"      "London"     1
 9.33 "2016UKManchesterGBP"  "Bacon (1 kg)"   0 2016 "GBP" "UK"      "Manchester" 1
 7.67 "2016UKManchesterGBP"  "Bacon (1 kg)"   1 2016 "GBP" "UK"      "Manchester" 1
 6.67 "2016UKLondonGBP"      "Bacon (1 kg)"   1 2016 "GBP" "UK"      "London"     1
10.84 "2017UKLondonGBP"      "Bacon (1 kg)"   0 2017 "GBP" "UK"      "London"     1
 8.14 "2017UKManchesterGBP"  "Bacon (1 kg)"   0 2017 "GBP" "UK"      "Manchester" 1
 6.18 "2017UKManchesterGBP"  "Bacon (1 kg)"   1 2017 "GBP" "UK"      "Manchester" 1
 6.67 "2017UKLondonGBP"      "Bacon (1 kg)"   1 2017 "GBP" "UK"      "London"     1
 9.09 "2018UKLondonGBP"      "Bacon (1 kg)"   0 2018 "GBP" "UK"      "London"     1
 8.04 "2018UKManchesterGBP"  "Bacon (1 kg)"   0 2018 "GBP" "UK"      "Manchester" 1
 7.48 "2018UKLondonGBP"      "Bacon (1 kg)"   1 2018 "GBP" "UK"      "London"     1
  6.4 "2018UKManchesterGBP"  "Bacon (1 kg)"   1 2018 "GBP" "UK"      "Manchester" 1
 8.33 "2019UKManchesterGBP"  "Bacon (1 kg)"   0 2019 "GBP" "UK"      "Manchester" 1
10.93 "2019UKLondonGBP"      "Bacon (1 kg)"   0 2019 "GBP" "UK"      "London"     1
 7.33 "2019UKLondonGBP"      "Bacon (1 kg)"   1 2019 "GBP" "UK"      "London"     1
 6.67 "2019UKManchesterGBP"  "Bacon (1 kg)"   1 2019 "GBP" "UK"      "Manchester" 1
  8.5 "2020UKManchesterGBP"  "Bacon (1 kg)"   0 2020 "GBP" "UK"      "Manchester" 1
 10.9 "2020UKLondonGBP"      "Bacon (1 kg)"   0 2020 "GBP" "UK"      "London"     1
  7.8 "2020UKLondonGBP"      "Bacon (1 kg)"   1 2020 "GBP" "UK"      "London"     1
 7.09 "2020UKManchesterGBP"  "Bacon (1 kg)"   1 2020 "GBP" "UK"      "Manchester" 1
 1.25 "2014IrelandDublinEUR" "Bananas (1 kg)" 0 2014 "EUR" "Ireland" "Dublin"     0
 1.25 "2014IrelandDublinEUR" "Bananas (1 kg)" 1 2014 "EUR" "Ireland" "Dublin"     0
 1.25 "2015IrelandDublinEUR" "Bananas (1 kg)" 0 2015 "EUR" "Ireland" "Dublin"     0
 1.25 "2015IrelandDublinEUR" "Bananas (1 kg)" 1 2015 "EUR" "Ireland" "Dublin"     0
 1.39 "2016IrelandDublinEUR" "Bananas (1 kg)" 0 2016 "EUR" "Ireland" "Dublin"     1
 1.25 "2016IrelandDublinEUR" "Bananas (1 kg)" 1 2016 "EUR" "Ireland" "Dublin"     1
 1.89 "2017IrelandDublinEUR" "Bananas (1 kg)" 0 2017 "EUR" "Ireland" "Dublin"     1
 1.59 "2017IrelandDublinEUR" "Bananas (1 kg)" 1 2017 "EUR" "Ireland" "Dublin"     1
 1.59 "2018IrelandDublinEUR" "Bananas (1 kg)" 0 2018 "EUR" "Ireland" "Dublin"     1
 1.39 "2018IrelandDublinEUR" "Bananas (1 kg)" 1 2018 "EUR" "Ireland" "Dublin"     1
 1.59 "2019IrelandDublinEUR" "Bananas (1 kg)" 0 2019 "EUR" "Ireland" "Dublin"     1
 1.19 "2019IrelandDublinEUR" "Bananas (1 kg)" 1 2019 "EUR" "Ireland" "Dublin"     1
 1.46 "2020IrelandDublinEUR" "Bananas (1 kg)" 0 2020 "EUR" "Ireland" "Dublin"     1
 1.19 "2020IrelandDublinEUR" "Bananas (1 kg)" 1 2020 "EUR" "Ireland" "Dublin"     1
  .68 "2014UKLondonGBP"      "Bananas (1 kg)" 0 2014 "GBP" "UK"      "London"     0
  1.2 "2014UKManchesterGBP"  "Bananas (1 kg)" 0 2014 "GBP" "UK"      "Manchester" 0
end
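A sketch of one way, computing the group averages as extra variables rather than extra observations (so regressions on the raw data are unaffected); the exact prod spellings for the two soft drinks are assumptions, replace them with those in your data:

Code:
gen byte soft = inlist(prod, "Coca-Cola (0.33 l)", "Tonic water (0.33 l)")
egen soft_avg  = mean(cond(soft,  abs_price, .)), by(year)
egen other_avg = mean(cond(!soft, abs_price, .)), by(year)
twoway (line soft_avg year, sort) (line other_avg year, sort), ///
    legend(order(1 "Soft Drinks" 2 "All other products")) ytitle("Average price")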

How to make a Difference in Differences Graph

I want to make a graph showing parallel trends for the following data. The variable REGION codes Northeast regions as 1 and South regions as 3. The variable totalvisits is the total number of visits. I want to create a graph showing the trend of totalvisits for each REGION by year.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double REGION float totalvisits double VYEAR
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
1 1502 2008
end
label values REGION REGIONF
label def REGIONF 1 "Northeast", modify
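A sketch: average the visits by region-year, then draw one line per region:

Code:
preserve
collapse (mean) totalvisits, by(REGION VYEAR)
twoway (line totalvisits VYEAR if REGION == 1) ///
       (line totalvisits VYEAR if REGION == 3), ///
    legend(order(1 "Northeast" 2 "South")) ytitle("Mean total visits")
restore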

How to change the look of a subset of bars using specific criteria - bar graphs

Dear all,

I am working with a dataset that contains the number of candidates for each call of a study abroad program and I prepared the following graph:

Code:
    #delimit ;
        graph bar (asis) number_of_candidates_per_call,
            over(call_number, sort(1) descending label(labsize(*0.4)))
            nofill
            linetype(line) lines(lw(vthin) lc(gs12))
            by(call_year, title("Number of candidates per call", placement(center)))
            subtitle("",  nobox  fcolor(none) nobexpand pos(9))
            ysc(alt)  
            horiz
            yla(, nogrid format(%10.0fc) labsize(*0.5))
            blabel(bar, format(%9.0fc) size(*0.4))
        scheme(s1color)
    ;
    #delimit cr

I would like to

(i) change the color (to red) of bars associated with a subset of four different countries: 'USA', 'UK', 'POR' and 'CAN'.
Note: there are 21 different countries in the dataset. The variable with the destination-country acronym is called 'dest_country_acronym' in the data below.

(ii) place a label containing the destination-country acronym just to the right of the bar, only for the aforementioned subset of bars (those with values 'USA', 'UK', 'POR' and 'CAN' of the variable 'dest_country_acronym').

(iii) include a title for each panel according to the variable 'call_year' (it takes the values 2011, 2012, 2013, and 2014) so that one can tell which year each panel refers to.

You can find below the code to import the dataset.

Each observation is a call and each call is linked to one country.

Can you help me with that?

Thank you very much.

Any help is greatly appreciated.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long call_number int call_year float number_of_candidates_per_call str3 dest_country_acronym
19 2013 180 "USA"
 9 2012 141 "POR"
31 2013 119 "USA"
 7 2012 101 "UK"
26 2013  98 "UK"
42 2013  90 "UK"
51 2014  88 "USA"
64 2014  73 "UK"
25 2013  60 "CAN"
 2 2012  53 "USA"
32 2013  38 "GER"
22 2013  36 "HGR"
 4 2012  31 "AUS"
 8 2012  31 "SPN"
36 2013  31 "IRE"
59 2014  31 "CAN"
20 2013  30 "GER"
 3 2012  29 "GER"
61 2014  28 "SPN"
49 2013  28 "SPN"
24 2013  25 "AUS"
56 2014  21 "AUS"
67 2014  20 "IRE"
55 2014  19 "AUS"
52 2014  19 "GER"
41 2013  16 "CAN"
33 2013  15 "FRA"
34 2013  14 "ITL"
28 2013  14 "AUS"
 6 2012  14 "NL"
18 2013  13 "IRE"
23 2013  13 "CAN"
30 2013  12 "NEZ"
62 2014  11 "NL"
38 2013  11 "HGR"
54 2014  11 "ITL"
58 2014  11 "BEL"
68 2014  10 "HGR"
10 2012  10 "FRA"
48 2013   9 "NL"
27 2013   9 "CAN"
60 2014   9 "CAN"
40 2013   9 "AUS"
44 2013   7 "AUS"
43 2013   7 "CAN"
11 2012   7 "HGR"
12 2012   6 "ITL"
29 2013   5 "FIN"
57 2014   4 "BEL"
35 2013   4 "NRW"
 1 2011   4 "USA"
53 2014   4 "FRA"
69 2014   2 "JPN"
50 2014   2 "POL"
17 2013   2 "CHI"
37 2013   2 "CHI"
45 2013   2 "FIN"
46 2013   2 "NEZ"
47 2013   2 "BEL"
21 2013   2 "JPN"
65 2014   1 "SWE"
39 2013   1 "JPN"
63 2014   1 "NEZ"
14 2012   1 "SWE"
 5 2012   1 "SKO"
66 2014   1 "NRW"
13 2012   1 "USA"
15 2012   1 "NRW"
16 2012   1 "JPN"
end
label values call_number no_chamada2
label def no_chamada2 1 "101", modify
label def no_chamada2 2 "117", modify
label def no_chamada2 3 "118", modify
label def no_chamada2 4 "119", modify
label def no_chamada2 5 "121", modify
label def no_chamada2 6 "122", modify
label def no_chamada2 7 "123", modify
label def no_chamada2 8 "126", modify
label def no_chamada2 9 "127", modify
label def no_chamada2 10 "128", modify
label def no_chamada2 11 "129", modify
label def no_chamada2 12 "130", modify
label def no_chamada2 13 "132", modify
label def no_chamada2 14 "133", modify
label def no_chamada2 15 "134", modify
label def no_chamada2 16 "135", modify
label def no_chamada2 17 "136", modify
label def no_chamada2 18 "138", modify
label def no_chamada2 19 "143", modify
label def no_chamada2 20 "144", modify
label def no_chamada2 21 "145", modify
label def no_chamada2 22 "146", modify
label def no_chamada2 23 "147", modify
label def no_chamada2 24 "148", modify
label def no_chamada2 25 "149", modify
label def no_chamada2 26 "151", modify
label def no_chamada2 27 "152", modify
label def no_chamada2 28 "153", modify
label def no_chamada2 29 "154", modify
label def no_chamada2 30 "155", modify
label def no_chamada2 31 "156", modify
label def no_chamada2 32 "157", modify
label def no_chamada2 33 "158", modify
label def no_chamada2 34 "159", modify
label def no_chamada2 35 "161", modify
label def no_chamada2 36 "162", modify
label def no_chamada2 37 "163", modify
label def no_chamada2 38 "164", modify
label def no_chamada2 39 "165", modify
label def no_chamada2 40 "167", modify
label def no_chamada2 41 "168", modify
label def no_chamada2 42 "170", modify
label def no_chamada2 43 "171", modify
label def no_chamada2 44 "172", modify
label def no_chamada2 45 "173", modify
label def no_chamada2 46 "174", modify
label def no_chamada2 47 "176", modify
label def no_chamada2 48 "177", modify
label def no_chamada2 49 "178", modify
label def no_chamada2 50 "179", modify
label def no_chamada2 51 "180", modify
label def no_chamada2 52 "181", modify
label def no_chamada2 53 "182", modify
label def no_chamada2 54 "183", modify
label def no_chamada2 55 "184", modify
label def no_chamada2 56 "185", modify
label def no_chamada2 57 "186", modify
label def no_chamada2 58 "187", modify
label def no_chamada2 59 "188", modify
label def no_chamada2 60 "189", modify
label def no_chamada2 61 "191", modify
label def no_chamada2 62 "193", modify
label def no_chamada2 63 "194", modify
label def no_chamada2 64 "195", modify
label def no_chamada2 65 "197", modify
label def no_chamada2 66 "198", modify
label def no_chamada2 67 "199", modify
label def no_chamada2 68 "201", modify
label def no_chamada2 69 "202", modify
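A sketch for (i), using the variable names from the dataex: two overlaid series that are never non-missing for the same call draw each bar exactly once, in one of two colors. Point (iii) comes for free, since by(call_year) already titles each panel with the year; point (ii) would likely need a twoway-based construction with mlabel() instead of graph bar:

Code:
gen byte red = inlist(dest_country_acronym, "USA", "UK", "POR", "CAN")
gen n_red   = number_of_candidates_per_call if red
gen n_other = number_of_candidates_per_call if !red
graph hbar (asis) n_other n_red, ///
    over(call_number, sort(number_of_candidates_per_call) descending ///
        label(labsize(*0.4))) nofill ///
    by(call_year, title("Number of candidates per call", placement(center))) ///
    bar(2, color(red)) legend(off)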

need help with import excel

Sorry, I have tried this using import excel, copying the data directly, and using Stat Transfer, but everything I do gives a value for the 2017 std. error for case 1 (Denmark) of 1.77 rather than the correct value of 2.75. Any help with converting it would be appreciated. File attached. Thanks for any help.

Confusing behavior by -more-

I'm trying to use -more- to force a pause between display commands in a loop. Here is the code:

Code:
set more on
forvalues x = 1/5{
    di `x'
    more
}
When I run this from the command line, it works as I would expect: each number is displayed, followed by --more--. Upon pressing any key, the next number is displayed, followed by --more--, and so on.

When run from a do-file, however, all 5 numbers are displayed instantly, without --more-- breaks or pauses for user input. I checked whether -more- was still set to on throughout the loop using this code:

Code:
set more on
forvalues x = 1/5{
    di `x'
    di `"`c(more)'"'
    more
}
which outputs:

Code:
1
on
2
on
3
on
4
on
5
on
But again, without any --more-- messages or waiting for user input.

Am I misunderstanding how the command is supposed to work or misusing it somehow?

I'm running Stata/SE 16.1 on Windows 10.

linear Regression with treatment and control variables

Dear all,

I am interested in studying the impact of monetary policy changes on stock market returns over the last 20 years.

The list of my variables for the linear regression in an event-study model is: date, event date, DAX index, Conventional surprise pre-crisis, Expected change crisis, Expected change pre-crisis, Conventional surprise crisis, MSCI world index, crisis. A few sample rows:

1990/01/01  1  100  0
...         0  101  0
2020/12/30  0  102  1

1- "return of the DAX index" as my dependent variable;
2- "event dates or policy announcement dates" as the treatment variable which takes 1 when the event occurs in total 272 events;
3- "MSCI world index" and "crisis dummy" as The vector of control variables;
4- "Conventional surprise pre-crisis", "Expected change pre-crisis", Conventional surprise crisis", "Expected change crisis" as my independent variables.

my model:

return of the DAX index = constant + Conventional surprise pre-crisis + Expected change pre-crisis + Conventional surprise crisis + Expected change crisis + vector of control variables + residual

I was recommended to use the bacondecomp module, which shows a Bacon decomposition of difference-in-differences estimation with variation in treatment timing. The two-way fixed effects DD model is a weighted average of all possible two-group/two-period DD estimators. The command generates a scatterplot of 2x2 difference-in-difference estimates and their associated weights. The data must be xtset, and the variable list must include an outcome as the first item and a treatment that can only turn from zero to one during the time period examined as its second item. https://ideas.repec.org/c/boc/bocode/s458676.html

My question is: how can I xtset my data? I don't have a panel variable with different categories (like company1, company2, company3, ...). My data are time series for all of my variables, and my dependent variable is just the return of the DAX index (a time series).

can someone please help me with this question?

thank you so much,
Shahrzad

Calculating a gender wage gap

I have a data set with university degree subjects, salary bands, and the number of men and women in each salary band. How would I go about calculating a gender wage gap for each degree subject? I have started by generating a dummy variable for each degree, but as there are multiple people in each salary band, I am unsure how to calculate the gap.

TIA

calculating how percentiles performed in a panel dataset

Dear Statalist,

My panel dataset is as follows


year firm size sales

2000 firm1 size1 sales1
2000 firm2 size2 sales2
...
2001
2001
...

2020
There are about 200-300 firms in each year.
I want to calculate the mean sales for each year within each size quartile (top 25%, 25%-50%, 50%-75%, 75%-100%), based on firm size. For example, in the year 2000, what were the mean sales of the largest 25% of firms, of the next 25%-50%, and so on?
I will use the results for time-series plots and tables.
Appreciate your help.
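A sketch, assuming variables year, size, and sales; xtile has no by() option, so the quartiles are computed year by year:

Code:
gen byte size_q = .
levelsof year, local(years)
foreach y of local years {
    xtile tmp = size if year == `y', nq(4)
    replace size_q = tmp if year == `y'
    drop tmp
}
preserve
collapse (mean) sales, by(year size_q)   // mean sales per year and quartile
list, sepby(year)
restore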

Saturday, January 30, 2021

"0 (empty)" during logistic regression

I was measuring the odds ratio for systolic hypertension using BMI as a variable, but when I do so, the 4th category gives me "0 (empty)". What does that mean? Could it have to do with the fact that there are only 2 people within that category (category 4)? If so, it didn't give me "0 (empty)" when I used the same variable to find the odds ratio for diastolic hypertension. I don't know what to do to fix this zero or what it means. Can you please help me out? TIA!

_bmi |
   2 |   .7326816   .6527876     1.12   0.262    -.5467586    2.012122
   3 |   .7566587   .8385862     0.90   0.367    -.8869401    2.400257
   4 |          0  (empty)

Graph of Parallel Trend Lines

Hello, I have the data set below: total numbers of visits to the emergency room by region in the US. I want to implement a DiD method to see whether the Medicare expansion in 2014 lowered the number of emergency hospital visits in the Northeast region, but I am having trouble generating a graph showing the parallel trends. Northeast is coded as 1 and South as 3. I want to generate a graph plotting totalvisits by region through time: x-axis years, y-axis totalvisits, and two lines, one for REGION=1 and one for REGION=3.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float totalvisits double(VMONTH VYEAR REGION) float treatment
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
1502 1 2008 1 0
end
label values VMONTH VMONTHF
label def VMONTHF 1 "January", modify
label values REGION REGIONF
label def REGIONF 1 "Northeast", modify
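A sketch that avoids collapsing the data, using the variable names above:

Code:
egen mean_visits = mean(totalvisits), by(REGION VYEAR)
twoway (connected mean_visits VYEAR if REGION == 1, sort) ///
       (connected mean_visits VYEAR if REGION == 3, sort), ///
    xline(2014) legend(order(1 "Northeast" 2 "South")) ///
    xtitle("Year") ytitle("Mean total visits")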

Filling missing observations with successors' and predecessors' values

Dear Stata Users,
Can you please help me with the following issue? I have companies ("gvkey") and person IDs ("pers_ID") associated with a starting date ("start") and a final date ("final") at work. There are some missing values of "start" and "final". What I need is to replace a missing "start" with the "final" of the person who preceded the new person, and, likewise, to fill a missing "final" with the "start" of the successor. For instance, for firm 006856 there is a missing "start" for pers_ID 1386, but there is a non-missing date for her predecessor (pers_ID 1385), so the "start" of pers_ID 1386 should be filled with 01nov1993. Similarly, for firm 010390 there is a missing "final" for pers_ID 193 in year 2009, but there is a non-missing date for her successor (pers_ID 14), so the "final" of pers_ID 193 should be filled with 01jan2011.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str24 gvkey double(fyear pers_ID) long(start final)
"001164" 1993  2627  9222     .
"001164" 1994  2627  9222     .
"001164" 1995  2627  9222     .
"001164" 1996  2627  9222     .
"001164" 1997  2627  9222     .
"001164" 1998  2627  9222     .
"001164" 1999  2627  9222     .
"001164" 2000  2627  9222     .
"001164" 2001  2627  9222     .
"001164" 2004 29310 15675     .
"004503" 1992   869     . 12171
"004503" 1993   870 12171 16801
"004503" 1994   870 12171 16801
"004503" 1995   870 12171 16801
"004503" 1996   870 12171 16801
"006856" 1992  1385 10897 12358
"006856" 1993  1385 10897 12358
"006856" 1994  1386     . 12751
"010374" 2007 19397 14245 18352
"010374" 2008 19397 14245 18352
"010374" 2009 19397 14245 18352
"010374" 2010 14379     . 18628
"010390" 2007   193 14245 18352
"010390" 2008   193 14245 18352
"010390" 2009   193 14245     .
"010390" 2010    14 18628     .
end
format %td start
format %td final
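A sketch of the fill, assuming rows are firm-years as in the example; adjacent persons within a firm are handled first, then values are spread within each person's rows:

Code:
sort gvkey fyear
by gvkey: replace start = final[_n-1] ///
    if missing(start) & pers_ID != pers_ID[_n-1]
by gvkey pers_ID (fyear), sort: replace start = start[1] if missing(start)

gsort gvkey -fyear
by gvkey: replace final = start[_n-1] ///
    if missing(final) & pers_ID != pers_ID[_n-1]
by gvkey pers_ID (fyear), sort: replace final = final[_N] if missing(final)
sort gvkey fyear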

Regarding use Matchit to fuzzy merge two different data set

Hello everyone,

I am trying to merge two datasets by their unique string variables. However, the two datasets do not share the same format for their string values.

For example,

Data set 1:

Year  CEO NAME     FIRM NAME
1995  Jonny Dan    AAA Company
1995  Tommy Z      BBB Company
1995  Jacky Young  Ccc Company
1995  Bill Morgan  ddd Company

Data set 2:

Year  CEO NAME        FIRM NAME
1995  Jonh Daniel     AAA
1995  Tom Zeus        BBB Firm
1995  JACKY YOUNG     Ccc Corporation
1995  William Morgan  DDD Company

So if I want to merge these two datasets by CEO name and firm name, how should I write the code?

Thanks in advance
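One route is matchit (ssc install matchit), scoring the similarity of a combined name string; the dataset and variable names below are hypothetical:

Code:
use ceos2, clear
gen key = lower(ceo_name) + " " + lower(firm_name)
gen long id2 = _n
tempfile f2
save `f2'

use ceos1, clear
gen key = lower(ceo_name) + " " + lower(firm_name)
gen long id1 = _n
matchit id1 key using `f2', idusing(id2) txtusing(key) threshold(0.7)
gsort -similscore        // inspect the best candidate pairs first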

Identifying observations of var1 that have different values for var2

I have a dataset in which the data are identified by state, county, and ZIP code - each of these is a different variable.

My problem is that some ZIP codes span more than one county, so I need to be able to identify which observations of the variable "ZIP" have different values on the variable "county". For example, below, I need to be able to identify that 00112 is a ZIP that has different values (1 and 2) on the variable "county", because it spans two counties:

County - ZIP
1 - 00111
1 - 00112
2 - 00112
2 - 00113
2 - 00114

What code would allow me to do that for the entire dataset? Thank you in advance!
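A sketch, assuming the variables are named zip and county:

Code:
egen byte pair  = tag(zip county)          // one row per distinct pair
egen ncounties  = total(pair), by(zip)     // counties spanned by each ZIP
list zip county if ncounties > 1, sepby(zip)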

Reshaping data with three dimensions

Hello, I am having trouble reshaping my data as it will require something beyond the usual long to wide format. My dataset is a relationship matrix for each person in a household across 50 years. More specifically, there is a row for every person-to-person relationship in each year. However, I would like it to be in person-year format with variables for the ID of all relationships in that year on the same row.

Note that "altid" refers to the ID of the other person in that specific relationship. For example, ID 124 may be ID 123's father, and IDs 125 and 126 may be ID 123's siblings. Furthermore, you will notice that the "rel" variable (i.e., relationship to the respondent) is 55 for each row. That is intentional and due to subsetting.

I have subsetted the data as needed and have about 555,500 rows. Please note that not everyone has the same number of relationships and that the same person may have a different number of relationships at a different year. There are no missing data.

My data looks as follows:

Code:
clear 
input int(id year altid rel)
123 1968 124 55
123 1968 125 55
123 1968 126 55
123 1968 127 55
345 1968 346 55
345 1968 347 55
345 1968 348 55
567 1968 568 55
567 1968 569 55
123 1969 124 55
123 1969 125 55
123 1969 127 55
345 1969 346 55
345 1969 348 55
345 1969 349 55
567 1969 568 55
567 1969 569 55
567 1969 570 55
567 1969 571 55
567 1969 572 55
567 1969 573 55
end

I would like the data to look like this:

Code:
input int(id year altid1 altid2 altid3 altid4 altid5 altid6 rel)
123 1968 124 125 126 127 . . 55
123 1969 124 125 127 . . . 55
345 1968 346 347 348 . . . 55
345 1969 346 348 349 . . . 55
567 1968 568 569 . . . . 55
567 1969 568 569 570 571 572 573 55
end
To be clear, I have looked at a number of threads with similar circumstances, but none seem to quite match my situation. Accordingly, I am at a bit of a loss. Any advice would be sincerely appreciated!
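A sketch using a within-group counter as the new column index; rel is constant within id-year, so it survives the reshape:

Code:
bysort id year (altid): gen j = _n
reshape wide altid, i(id year) j(j)
order id year altid* rel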

Creating a matrix with Euclidean distances between variables.

Hello everybody,

I use Stata 13.1 and I am working with a dataset that contains 25 numerical variables (var1-var25) and around 400 observations. I want to create a 25x25 matrix A, in which each cell reports the Euclidean distance between the corresponding pair of variables: a_jk = sqrt( sum_i (var_j,i - var_k,i)^2 ), where the subscript i refers to observation i (from 1 to 400); so, for example, a_21 = sqrt( sum_i (var2_i - var1_i)^2 ). Obviously, A will be a square, symmetric matrix with zeros on the diagonal. This is some sort of dissimilarity matrix. I have tried -matrix dissimilarity- to achieve this, but it creates a 400x400 matrix in which each cell reports the distance between observations instead of between variables. Is there any simple way to create this matrix?

Thanks,

David Puig
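If I recall correctly, official -matrix dissimilarity- also has a variables option that compares variables rather than observations, which may do this directly. Failing that, a short Mata sketch, assuming var1-var25 with no missing values:

Code:
unab vlist : var1-var25
mata:
    X = st_data(., tokens(st_local("vlist")))   // 400 x 25 data matrix
    k = cols(X)
    A = J(k, k, 0)
    for (i = 1; i <= k; i++) {
        for (j = i + 1; j <= k; j++) {
            A[i, j] = sqrt(sum((X[., i] :- X[., j]):^2))
            A[j, i] = A[i, j]
        }
    }
    st_matrix("A", A)        // export to Stata as matrix A
end
matrix list A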

Reshaping wide error values of variable date not unique within id

Hi,
I'm fairly new to Stata, and I know this question has been answered before; however, I could not make it work for me.

This is the type of data I'm working with:

id date r i
1 2002_11 1.307071 .87494
1 2002_12 1.403008 1.019082
1 2003_01 1.570926 1.152942
1 2003_02 1.894784 1.307366
1 2003_03 1.798847 1.235295

This is the command I'm using:

reshape wide r i, i(id) j(date) str

This is the error:
values of variable date not unique within id
Your data are currently long. You are performing a reshape wide. You specified i(id) and j(date). There are observations
within i(id) with the same value of j(date). In the long data, variables i() and j() together must uniquely identify the
observations.

What do I need to change or add?

Thanks for all replies.

Best,
Emerson
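A sketch to diagnose the blocking duplicates before reshaping:

Code:
duplicates report id date
duplicates tag id date, gen(dup)
list id date r i if dup > 0, sepby(id date)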


Developing person weights from other variables

I have to "develop person weights from the age and sex variables using an iterative proportional fitting process with the variables of age and sex".

I never did this before, so any suggestion will be highly appreciated!


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte age str6 sex int systolicbp 
49 "male"   134
41 "male"   158
46 "female" 158
63 "male"   157
24 "female" 157
23 "female" 157
46 "male"   157
67 "female" 118
59 "male"   160
46 "female" 157
19 "female"   .
59 "female" 168
62 "male"   131
53 "female" 164
49 "male"   118
37 "female" 172
31 "male"     .
66 "male"   136
54 "female" 169
27 "male"     .
23 "male"   142
35 "male"   102
32 "male"   134
61 "male"   137
31 "male"   160
63 "female" 164
25 "female" 112
66 "female" 142
69 "male"     .
21 "male"   148
43 "male"   159
39 "male"   142
37 "female"   .
45 "male"   161
62 "male"   119
34 "male"   153
60 "male"     .
42 "female"   .
48 "female" 119
60 "female" 176
53 "female" 163
22 "male"   147
46 "male"   175
60 "female" 148
68 "male"   146
63 "female" 162
59 "female" 151
64 "female" 106
71 "female"   .
48 "female" 139
55 "male"   132
25 "male"   151
32 "male"   131
56 "female" 150
19 "male"   147
60 "female" 108
32 "female" 119
21 "female" 108
60 "female" 117
71 "male"   151
69 "female" 139
24 "female"   .
26 "male"   116
30 "male"   112
61 "female" 170
67 "female" 112
53 "male"   165
44 "female" 170
48 "female" 153
43 "male"   120
22 "female" 102
57 "female" 115
22 "female" 164
38 "male"   150
50 "female" 161
60 "male"   103
54 "female"   .
29 "male"   147
31 "female" 124
55 "female" 127
31 "female" 166
54 "male"   162
68 "male"   161
32 "female" 159
23 "male"   159
37 "female" 141
72 "female" 119
58 "male"   115
64 "male"   176
37 "female" 129
35 "female" 104
47 "female" 104
34 "male"   116
27 "female" 165
61 "male"     .
46 "male"   138
47 "male"   160
72 "male"   126
47 "female" 172
61 "female" 140
end
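A minimal manual raking (IPF) sketch; the sex and age-group targets below are hypothetical placeholders to be replaced with the real population margins:

Code:
egen agegrp = cut(age), at(18 35 55 75) icodes   // age bands coded 0,1,2
gen double w = 1
quietly forvalues iter = 1/50 {
    foreach s in male female {                   // rake to the sex margin
        summarize w, meanonly
        local tot = r(sum)
        summarize w if sex == "`s'", meanonly
        replace w = w * (0.5 * `tot') / r(sum) if sex == "`s'"
    }
    forvalues g = 0/2 {                          // rake to the age margins
        local tgt : word `= `g' + 1' of 0.3 0.4 0.3
        summarize w, meanonly
        local tot = r(sum)
        summarize w if agegrp == `g', meanonly
        replace w = w * (`tgt' * `tot') / r(sum) if agegrp == `g'
    }
}
summarize w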

How to utilise sample weights in Stata? - QUICK QUESTIONS

Hi Everybody,


I am running a regression analysis using the HLFS in Stata. I have been told that I need to use sample weights to obtain reliable estimates. The HLFS data has a "factor weight" column for each observation. My questions are as follows:


1) Does it mean that the HLFS data already provides me with the sample weights, so I don't need to calculate them?

2) If the answer is yes to (1), how do I use this on Stata?

I am writing the command below, but I am not quite sure whether I am weighting twice. In [pweight=weights], weights is the factor weight column from the HLFS data.

oaxaca LnWage var1 var2 var3 var4 var5 [pweight=weights], by(Gender) pooled

3) If the answer to (1) is no, then how can I calculate the sample weights, given that I don't know the total population of a sector?



Any suggestion would be much appreciated!

Thank you very much for your time and effort.

frequency in the X & Y axes in histograms and scatter plots

Hi all,
Sorry to bother you, I have another question, regarding histograms and scatter plots. I would like to be able to control the tick intervals on the X and Y axes. I found how to set the interval on the Y axis, but I am still not able to set it on the X axis; I am using the histogram dialog box but cannot find the option. When I generate a histogram, the X axis is labeled automatically in intervals of 500.
Many thanks
Many thanks
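For reference, a sketch with a hypothetical variable name and ranges: width() controls the bin width, while xlabel() and ylabel() control the tick intervals on each axis:

Code:
histogram myvar, frequency width(500) xlabel(0(500)3000) ylabel(0(10)50)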

frequency distributions with intervals

Hi all,

I have data from a study with two variables, and I would like to generate a table of frequency distributions with intervals of 200 cells/mm3 (0-199, 200-399, ...). I ask because I have spent a lot of time investigating this but haven't found a positive answer.
Many thanks,
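A sketch, assuming the count variable is called cd4:

Code:
gen band = 200 * floor(cd4/200)      // lower bound: 0, 200, 400, ...
tabulate band
* or with integer-coded intervals:
egen band2 = cut(cd4), at(0(200)2000) icodes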

How to convert a variable in the format "day, month, year" to another variable in the format "month, year"?

Hello,

I would like to convert a "date" variable that is in "day, month, year" format to another variable that gives me the information in "month, year" format. Can you help me with this issue?
I send the code below.
Thank you in advance.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float date str25 country float risk
21915 "afghanistan"                   0
21915 "albania"                       0
21915 "algeria"                       0
21915 "andorra"                       0
21915 "angola"                        0
21915 "argentina"                     0
21915 "aruba"                         0
21915 "australia"                     0
21915 "austria"                       0
21915 "azerbaijan"                    0
21915 "bahamas"                       0
21915 "bahrain"                       0
21915 "bangladesh"                    0
21915 "barbados"                      0
21915 "belarus"                       0
21915 "belgium"                       0
21915 "belize"                        0
21915 "benin"                         0
21915 "bermuda"                       0
21915 "bhutan"                        0
21915 "bolivia"                   62.04
21915 "bosniaandherzegovina"          0
21915 "botswana"                   5.56
21915 "brazil"                        0
21915 "brunei"                        0
21915 "bulgaria"                      0
21915 "burkinafaso"                   0
21915 "burundi"                       0
21915 "cambodia"                      0
21915 "cameroon"                      0
21915 "canada"                        0
21915 "capeverde"                     0
21915 "centralafricanrepublic"        0
21915 "chad"                          0
21915 "chile"                         0
21915 "china"                         0
21915 "colombia"                      0
21915 "comoros"                       .
21915 "congo"                         0
21915 "costarica"                     0
21915 "cotedivoire"                   0
21915 "croatia"                       0
21915 "cuba"                          0
21915 "cyprus"                        0
21915 "czechrepublic"                 0
21915 "democraticrepublicofcongo"     0
21915 "denmark"                       0
21915 "djibouti"                      0
21915 "dominica"                      0
21915 "dominicanrepublic"             0
21915 "ecuador"                       0
21915 "egypt"                         0
21915 "elsalvador"                    0
21915 "eritrea"                       0
21915 "estonia"                       0
21915 "eswatini"                      0
21915 "ethiopia"                      0
21915 "faeroeislands"                 0
21915 "fiji"                          0
21915 "finland"                       0
21915 "france"                        0
21915 "gabon"                         0
21915 "gambia"                        0
21915 "georgia"                       0
21915 "germany"                       0
21915 "ghana"                         0
21915 "greece"                        0
21915 "greenland"                     0
21915 "guam"                          0
21915 "guatemala"                     0
21915 "guinea"                        0
21915 "guyana"                        0
21915 "haiti"                         0
21915 "honduras"                      0
21915 "hongkong"                  13.89
21915 "hungary"                       0
21915 "iceland"                       0
21915 "india"                         0
21915 "indonesia"                     0
21915 "iran"                          0
21915 "iraq"                          0
21915 "ireland"                       0
21915 "israel"                        0
21915 "italy"                         0
21915 "jamaica"                       0
21915 "japan"                         0
21915 "jordan"                        0
21915 "kazakhstan"                    0
21915 "kenya"                         0
21915 "kiribati"                      0
21915 "kosovo"                        0
21915 "kuwait"                        0
21915 "kyrgyzrepublic"                0
21915 "laos"                          0
21915 "latvia"                        0
21915 "lebanon"                       0
21915 "lesotho"                       0
21915 "liberia"                       0
21915 "libya"                         0
21915 "lithuania"                     0
end
format %td date
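Since date is already a daily (%td) variable, mofd() gives the corresponding monthly date:

Code:
gen mdate = mofd(date)
format mdate %tm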

matcell option in a table with weights

While running
table son father [pw=pn], center format(%9.0f) matcell(PQ)

I am getting an error message "option matcell() not allowed"

How can I save table output in the form of the matrix while using weights?
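A sketch: -tabulate- (unlike -table-) does support matcell(); it does not take pweights, but importance weights reproduce the weighted cell totals:

Code:
tabulate son father [iweight=pn], matcell(PQ)
matrix list PQ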

New package: xtbalance2 - Create a balanced subsample from unbalanced panel data.

Thanks to Kit Baum, a new package called xtbalance2 is available on SSC.

xtbalance2 creates an indicator variable to identify a balanced subsample from an unbalanced dataset. The program tries to maximise the number of observations with respect to either the time dimension or the number of cross-sections/groups.

Example:
Code:
use http://www.stata-journal.com/software/sj12-1/st0246/manu_prod, clear
xtbalance2 , generate(balanceN) optimisation(N)
xtbalance2 lO lL lY, generate(balanceT) optimisation(T)
More examples and a detailed description are available in the help file.

How to install:
xtbalance2 can be installed via SSC
Code:
ssc install xtbalance2
or via my Github page, which will be frequently updated:
Code:
net from https://github.com/JanDitzen/xtbalance2
xtbalance2 is a new package and errors might occur. I would be grateful for any reports of bugs or problems here, by mail to me, or via my Github page (https://github.com/JanDitzen/xtbalance2).

Thanks!

Difference in Difference with propensity score matching

Hello Statalist team, may I ask you the following question?

I would like to use the DID with propensity score matching in the following settings:
1) Pre treatment period = 2017, After = 2018
2) I have the panel datasets both in 2017 and 2018 for bank level.
3) I would like to measure the average treatment effect of a policy that occurred in 2018 on bank deposits.

I use the following STATA code:
"teffects psmatch (Delta_Deposit) (treatment var1 var2 time) if fiscal_year == 2018"

treatment = 2018*treated dummy (bank level)

In this case, do I need to add the 2017 regression period to the code?
So the correct code would be:
"teffects psmatch (Delta_Deposit) (treatment var1 var2 time) if fiscal_year == 2018 | fiscal_year == 2017"

In fact, I tried both but there is not a big difference between "if fiscal_year == 2018" and "if fiscal_year == 2018 | fiscal_year == 2017".

If I use only the year 2018, is the matched sample selected based on 2018 alone?
And if I use both 2017 and 2018, is the matched sample selected based on both years?

In theory, propensity score matching should select the matched sample based on the treated sample, meaning that using only 2018 could be the right answer.
Is this understanding correct?

Thank you for your advice in advance.

Best regards;


How to give different color for all the scatter plots in a graph?

Hello everyone. I am using Stata 16.0. The data are in wide format with 29 observations.
It is national data with state-level variables.
I would like to create a twoway scatter plot of the case positivity rate (cpr) against the case fatality rate (cfr) of covid for all Indian states, with the death rate as weight.
I have used the following code. I would like to colour the 29 states in the data with different colors. How can I do that?
Code:
twoway (scatter cpr cfr, mfcolor(blue) msymbol(none) mlabel(dummy) mlabposition(6) mlabgap(4) mlabsize(vsmall) mlabangle(90) mlabcolor(black)) (scatter cpr cfr [aweight=death])
Thanks in advance
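A sketch using official -separate-, assuming a numeric state identifier coded 1-29: it creates one cpr variable per state, and -scatter- then cycles through different colors automatically:

Code:
separate cpr, by(state) shortlabel
twoway scatter cpr1-cpr29 cfr [aweight=death], legend(cols(4) size(tiny))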

Tobit regression with "outreg2" output?

Dear All, I run this Tobit model
Code:
sysuse auto, clear
gen wgt=weight/1000
// Censored from below
tobit mpg wgt foreign, ll(17)
outreg2 using "tobit", word excel dec(4) replace
and use the outreg2 command (ssc install outreg2) to collect the output, shown in the attached screenshot.

I wonder if the content in red/green areas can be removed. Any suggestions are appreciated.

Lagged dependent variable

Hi dear,
I have a question. In my dataset, the dependent variable refers to 2020, but the independent ones, like GDP per capita and the unemployment rate, refer to 2019. How can I handle this issue?
Could lagging the variables be a solution?

How to convert to date when data is saved as long format (%tdD_m_Y); for example, 15 Oct 13

How do I convert to a date when the data is saved as a long with display format %tdD_m_Y; for example, 15 Oct 13?
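A sketch covering both readings of the question:

Code:
* if the variable is a string such as "15 Oct 13":
gen d = date(strvar, "DM20Y")    // strvar is a hypothetical name
format d %td
* if it is already a daily date, only the display format changes:
format datevar %tdD_m_Y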

How to report the differences in a nice table/graph?

Hi all,

I have self-reported and actual household income from a national register in 2010. I have compared them (see below), but I am not sure how to present the differences nicely in a table or a graph.
All suggestions are much appreciated.
All suggestions is much appreciated.

gen diff_income = self2010 - reg2010
tab diff_income

diff_income |   Freq.
------------+--------
         -3 |     466
         -2 |     231
         -1 |   1,419
          0 |  19,996
          2 |     386
          3 |     189
------------+--------
      Total |  22,687


-Kate
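One simple option, a sketch: a discrete histogram with the counts printed on the bars:

Code:
histogram diff_income, discrete frequency addlabels ///
    xlabel(-3(1)3) ytitle("Number of respondents")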

Friday, January 29, 2021

xtsur with restriction

Dear All, I use xtsur for unbalanced panel data. Is there a way to restrict the coefficients of the two equations to be equal? This is allowed in the sureg command, but it is not available with xtsur. Please suggest a way out.

lag regression: Not sorted error

Hi, I have tried to run lagged regressions, but the error "not sorted" keeps coming up. I have sorted the data several times, e.g.

sort country_a neighbor_a year
reg F.manu_pc ny_gdp_totl_rt_zs nb_ny_gdp_totl_rt_zs

or

bysort country_a year neighbor_a : reg F.manu_pc ny_gdp_totl_rt_zs nb_ny_gdp_totl_rt_zs

What am I doing wrong, please?

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(country_aa neighbor_aa) int year float manu_pc double(ny_gdp_totl_rt_zs nb_ny_gdp_totl_rt_zs)
47 20 1996         .                  .                  .
47 12 1996         .                  .   37.7621783106709
20 47 1996         .                  .                  .
47  9 1996         .                  .   7.94340249688005
48 47 1996  77.68823   .168458779650815                  .
52 47 1996 13.384378   12.0422551725059                  .
47 48 1996         .                  .   .168458779650815
47 52 1996         .                  .   12.0422551725059
 9 47 1996         .   7.94340249688005                  .
47 26 1996         .                  .   4.52832385442244
12 47 1996 279.69852   37.7621783106709                  .
48 47 1997  47.09065 .00110285976658886                  .
47 52 1997         .                  .   14.3710414453306
47  9 1997         .                  .   10.7686925346104
47 20 1997         .                  .                  .
47 12 1997         .                  .                  .
52 47 1997  14.78843   14.3710414453306                  .
20 47 1997         .                  .                  .
47 26 1997         .                  .                  .
 9 47 1997         .   10.7686925346104                  .
47 48 1997         .                  . .00110285976658886
12 47 1997         .                  .                  .
48 47 1998  73.36557                  0                  .
47 26 1998  78.98443                  .   4.12050021800339
 9 47 1998         .   10.0161330915123                  .
52 47 1998  5.140973   13.3573025593913                  .
47 52 1998  78.98443                  .   13.3573025593913
20 47 1998  9.819602                  .                  .
47  9 1998  78.98443                  .   10.0161330915123
47 20 1998  78.98443                  .                  .
47 48 1998  78.98443                  .                  0
12 47 1998  175.9033   10.5527217315912                  .
47 12 1998  78.98443                  .   10.5527217315912
 9 47 1999         .   9.21127663740654                  .
47 26 1999         .                  .    2.8619646885156
47 48 1999         .                  .                  .
47  9 1999         .                  .   9.21127663740654
47 12 1999         .                  .   6.72560250739514
20 47 1999 110.95158   31.6313618038599                  .
48 47 1999  8.153259                  .                  .
52 47 1999 17.125616    12.459792694739                  .
47 52 1999         .                  .    12.459792694739
12 47 1999         .   6.72560250739514                  .
47 20 1999         .                  .   31.6313618038599
47 12 2000         .   36.5175863901134   17.6864480170047
47 20 2000         .   36.5175863901134                  .
52 47 2000 135.63635                  .   36.5175863901134
48 47 2000  42.94725                  0   36.5175863901134
47 26 2000         .   36.5175863901134                  .
47 48 2000         .   36.5175863901134                  0
47  9 2000         .   36.5175863901134                  .
12 47 2000         .   17.6864480170047   36.5175863901134
 9 47 2000         .                  .   36.5175863901134
47 52 2000         .   36.5175863901134                  .
20 47 2000         .                  .   36.5175863901134
48 47 2001  53.64817    12.057854706049                  .
 9 47 2001         .   12.9636234796805                  .
47 52 2001         .                  .   7.16171062839588
47  9 2001         .                  .   12.9636234796805
47 20 2001         .                  .   12.5192073767766
52 47 2001  508.2706   7.16171062839588                  .
47 26 2001         .                  .                  .
47 48 2001         .                  .    12.057854706049
47 12 2001         .                  .   7.43382630864952
12 47 2001         .   7.43382630864952                  .
20 47 2001  8.356758   12.5192073767766                  .
47 52 2002  52.62327                  .                  .
52 47 2002  5.360959                  .                  .
 9 47 2002         .   6.87088478357729                  .
20 47 2002  36.49562   21.9944655131802                  .
47 26 2002  52.62327                  .   3.30414443954597
48 47 2002 177.85097                  .                  .
12 47 2002  44.08522   29.1270470489371                  .
47 20 2002  52.62327                  .   21.9944655131802
47 48 2002  52.62327                  .                  .
47 12 2002  52.62327                  .   29.1270470489371
47  9 2002  52.62327                  .   6.87088478357729
47 26 2003         .                  .   5.19874652321157
52 47 2003  30.24151    9.2324782172425                  .
12 47 2003  228.6589                  .                  .
47 20 2003         .                  .   14.4108160774123
20 47 2003 100.67827   14.4108160774123                  .
47 12 2003         .                  .                  .
47 52 2003         .                  .    9.2324782172425
 9 47 2003  165.0732   13.4576521096668                  .
47  9 2003         .                  .   13.4576521096668
48 47 2003  160.2715   .123883750247631                  .
47 48 2003         .                  .   .123883750247631
52 47 2004  728.3743   15.6936079046794                  .
48 47 2004  54.56564   7.26590982073566                  .
12 47 2004         .   32.3167409500178                  .
47 12 2004         .                  .   32.3167409500178
47 20 2004         .                  .   18.9035621146269
 9 47 2004         .    6.3621830473943                  .
20 47 2004         .   18.9035621146269                  .
47  9 2004         .                  .    6.3621830473943
47 52 2004         .                  .   15.6936079046794
47 48 2004         .                  .   7.26590982073566
47 26 2004         .                  .   2.97168353810198
52 47 2005 1.3103304   14.1507408463472                  .
end
label values country_aa country
label values neighbor_aa country
label def country 9 "Central African Republic", modify
label def country 12 "Congo, Dem. Rep.", modify
label def country 20 "Ethiopia", modify
label def country 47 "South Sudan", modify
label def country 48 "Sudan", modify
label def country 52 "Uganda", modify
label def country 26 "Kenya", modify

Repeated time values within panel help

Hi All,

I am having some trouble understanding the xtset command, even after reading some past posts and the help file. I want to use leads and lags in a regression, but I am having trouble understanding how the panel and time variables should be set. I thought that, by encoding a variable for all of the different products I have, I could use that as the panelvar and then use the year variable as the timevar. Unfortunately, xtset fails with the error "repeated time values within panel". My data are posted below; please let me know if you have any suggestions for fixing this issue so that I can use the lead/lag functionality with an assigned timevar.

Many thanks,
Jim


Code:
xtset id_test5 year
repeated time values within panel
r(451);

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 country str43 prod int year byte type float(percent after2016_soft_drink) long id_test5 byte(_Iyear_2015 _Iyear_2016 _Iyear_2017 _Iyear_2018 _Iyear_2019 _Iyear_2020)
"UK"      "Asian Speciality Drinks" 2014 0         100 0 1 0 0 0 0 0 0
"UK"      "Asian Speciality Drinks" 2015 0         100 0 1 1 0 0 0 0 0
"UK"      "Asian Speciality Drinks" 2016 0         100 0 1 0 1 0 0 0 0
"UK"      "Asian Speciality Drinks" 2017 0         100 0 1 0 0 1 0 0 0
"UK"      "Asian Speciality Drinks" 2018 0         100 0 1 0 0 0 1 0 0
"UK"      "Asian Speciality Drinks" 2019 0         100 0 1 0 0 0 0 1 0
"UK"      "Asian Speciality Drinks" 2020 0         100 0 1 0 0 0 0 0 1
"UK"      "Baby Food"               2014 0    90.39353 0 2 0 0 0 0 0 0
"UK"      "Baby Food"               2014 1    9.606463 0 2 0 0 0 0 0 0
"Ireland" "Baby Food"               2014 0    88.76344 0 2 0 0 0 0 0 0
"Ireland" "Baby Food"               2014 1    11.23656 0 2 0 0 0 0 0 0
"Ireland" "Baby Food"               2015 1   11.146546 0 2 1 0 0 0 0 0
"UK"      "Baby Food"               2015 1    9.811398 0 2 1 0 0 0 0 0
"Ireland" "Baby Food"               2015 0    88.85345 0 2 1 0 0 0 0 0
"UK"      "Baby Food"               2015 0     90.1886 0 2 1 0 0 0 0 0
"UK"      "Baby Food"               2016 1    9.562346 0 2 0 1 0 0 0 0
"Ireland" "Baby Food"               2016 0    88.91432 0 2 0 1 0 0 0 0
"UK"      "Baby Food"               2016 0    90.43765 0 2 0 1 0 0 0 0
"Ireland" "Baby Food"               2016 1   11.085676 0 2 0 1 0 0 0 0
"UK"      "Baby Food"               2017 0    90.41187 0 2 0 0 1 0 0 0
"UK"      "Baby Food"               2017 1    9.588136 0 2 0 0 1 0 0 0
"Ireland" "Baby Food"               2017 0    88.97702 0 2 0 0 1 0 0 0
"Ireland" "Baby Food"               2017 1   11.022975 0 2 0 0 1 0 0 0
"UK"      "Baby Food"               2018 0    90.35996 0 2 0 0 0 1 0 0
"UK"      "Baby Food"               2018 1    9.640036 0 2 0 0 0 1 0 0
"Ireland" "Baby Food"               2018 0    89.05828 0 2 0 0 0 1 0 0
"Ireland" "Baby Food"               2018 1   10.941716 0 2 0 0 0 1 0 0
"UK"      "Baby Food"               2019 1    9.680828 0 2 0 0 0 0 1 0
"UK"      "Baby Food"               2019 0    90.31917 0 2 0 0 0 0 1 0
"Ireland" "Baby Food"               2019 0    89.15837 0 2 0 0 0 0 1 0
"Ireland" "Baby Food"               2019 1    10.84163 0 2 0 0 0 0 1 0
"Ireland" "Baby Food"               2020 0    89.23181 0 2 0 0 0 0 0 1
"UK"      "Baby Food"               2020 1     9.54126 0 2 0 0 0 0 0 1
"UK"      "Baby Food"               2020 0    90.45874 0 2 0 0 0 0 0 1
"Ireland" "Baby Food"               2020 1   10.768182 0 2 0 0 0 0 0 1
"Ireland" "Baked Goods"             2014 0    96.40752 0 3 0 0 0 0 0 0
"UK"      "Baked Goods"             2014 2 .0010265667 0 3 0 0 0 0 0 0
"UK"      "Baked Goods"             2014 0    93.20556 0 3 0 0 0 0 0 0
"Ireland" "Baked Goods"             2014 2 .0011187661 0 3 0 0 0 0 0 0
"UK"      "Baked Goods"             2014 1    6.793414 0 3 0 0 0 0 0 0
"Ireland" "Baked Goods"             2014 1     3.59136 0 3 0 0 0 0 0 0
"UK"      "Baked Goods"             2015 1    6.942422 0 3 1 0 0 0 0 0
"Ireland" "Baked Goods"             2015 2 .0011032855 0 3 1 0 0 0 0 0
"UK"      "Baked Goods"             2015 0    93.05656 0 3 1 0 0 0 0 0
"Ireland" "Baked Goods"             2015 1   3.5733924 0 3 1 0 0 0 0 0
"UK"      "Baked Goods"             2015 2   .00101654 0 3 1 0 0 0 0 0
"Ireland" "Baked Goods"             2015 0    96.42551 0 3 1 0 0 0 0 0
"UK"      "Baked Goods"             2016 1     7.13343 0 3 0 1 0 0 0 0
"Ireland" "Baked Goods"             2016 1    3.575801 0 3 0 1 0 0 0 0
"UK"      "Baked Goods"             2016 0    92.86558 0 3 0 1 0 0 0 0
"Ireland" "Baked Goods"             2016 0    96.42312 0 3 0 1 0 0 0 0
"UK"      "Baked Goods"             2016 2 .0009982894 0 3 0 1 0 0 0 0
"Ireland" "Baked Goods"             2016 2 .0010741096 0 3 0 1 0 0 0 0
"Ireland" "Baked Goods"             2017 1    3.590291 0 3 0 0 1 0 0 0
"UK"      "Baked Goods"             2017 1    7.161265 0 3 0 0 1 0 0 0
"Ireland" "Baked Goods"             2017 2 .0010554073 0 3 0 0 1 0 0 0
"Ireland" "Baked Goods"             2017 0    96.40865 0 3 0 0 1 0 0 0
"UK"      "Baked Goods"             2017 0    92.83774 0 3 0 0 1 0 0 0
"UK"      "Baked Goods"             2017 2 .0009987949 0 3 0 0 1 0 0 0
"UK"      "Baked Goods"             2018 2  .001007963 0 3 0 0 0 1 0 0
"UK"      "Baked Goods"             2018 0    92.77442 0 3 0 0 0 1 0 0
"Ireland" "Baked Goods"             2018 2 .0010359032 0 3 0 0 0 1 0 0
"UK"      "Baked Goods"             2018 1    7.224566 0 3 0 0 0 1 0 0
"Ireland" "Baked Goods"             2018 0     96.3924 0 3 0 0 0 1 0 0
"Ireland" "Baked Goods"             2018 1   3.6065545 0 3 0 0 0 1 0 0
"UK"      "Baked Goods"             2019 0    92.65563 0 3 0 0 0 0 1 0
"Ireland" "Baked Goods"             2019 0    96.44029 0 3 0 0 0 0 1 0
"Ireland" "Baked Goods"             2019 1    3.558706 0 3 0 0 0 0 1 0
"UK"      "Baked Goods"             2019 2 .0010218396 0 3 0 0 0 0 1 0
"Ireland" "Baked Goods"             2019 2 .0010000111 0 3 0 0 0 0 1 0
"UK"      "Baked Goods"             2019 1    7.343345 0 3 0 0 0 0 1 0
"UK"      "Baked Goods"             2020 1    7.259027 0 3 0 0 0 0 0 1
"UK"      "Baked Goods"             2020 2 .0010701094 0 3 0 0 0 0 0 1
"UK"      "Baked Goods"             2020 0    92.73991 0 3 0 0 0 0 0 1
"Ireland" "Baked Goods"             2020 2 .0010402437 0 3 0 0 0 0 0 1
"Ireland" "Baked Goods"             2020 0    96.34309 0 3 0 0 0 0 0 1
"Ireland" "Baked Goods"             2020 1    3.655866 0 3 0 0 0 0 0 1
"Ireland" "Bottled Water"           2014 1   .24856895 0 4 0 0 0 0 0 0
"UK"      "Bottled Water"           2014 2 .0021144135 0 4 0 0 0 0 0 0
"Ireland" "Bottled Water"           2014 0    99.74924 0 4 0 0 0 0 0 0
"UK"      "Bottled Water"           2014 0    99.75207 0 4 0 0 0 0 0 0
"Ireland" "Bottled Water"           2014 2 .0021854744 0 4 0 0 0 0 0 0
"UK"      "Bottled Water"           2014 1    .2458168 0 4 0 0 0 0 0 0
"UK"      "Bottled Water"           2015 0    99.73034 0 4 1 0 0 0 0 0
"UK"      "Bottled Water"           2015 2 .0022652892 0 4 1 0 0 0 0 0
"UK"      "Bottled Water"           2015 1    .2673995 0 4 1 0 0 0 0 0
"Ireland" "Bottled Water"           2015 0    99.75259 0 4 1 0 0 0 0 0
"Ireland" "Bottled Water"           2015 2    .0021185 0 4 1 0 0 0 0 0
"Ireland" "Bottled Water"           2015 1       .2453 0 4 1 0 0 0 0 0
"Ireland" "Bottled Water"           2016 2 .0020976719 0 4 0 1 0 0 0 0
"UK"      "Bottled Water"           2016 0    99.71962 0 4 0 1 0 0 0 0
"UK"      "Bottled Water"           2016 2 .0023132558 0 4 0 1 0 0 0 0
"UK"      "Bottled Water"           2016 1   .27807018 0 4 0 1 0 0 0 0
"Ireland" "Bottled Water"           2016 1    .2415012 0 4 0 1 0 0 0 0
"Ireland" "Bottled Water"           2016 0     99.7564 0 4 0 1 0 0 0 0
"UK"      "Bottled Water"           2017 1   .29721242 0 4 0 0 1 0 0 0
"UK"      "Bottled Water"           2017 0    99.70034 0 4 0 0 1 0 0 0
"Ireland" "Bottled Water"           2017 0    99.76518 0 4 0 0 1 0 0 0
"Ireland" "Bottled Water"           2017 2 .0020299621 0 4 0 0 1 0 0 0
"UK"      "Bottled Water"           2017 2  .002451064 0 4 0 0 1 0 0 0
end
label values type Type
label def Type 0 "Control", modify
label def Type 1 "Sugars", modify
label def Type 2 "Artificial sugars", modify
label values id_test5 id_test5
label def id_test5 1 "Asian Speciality Drinks", modify
label def id_test5 2 "Baby Food", modify
label def id_test5 3 "Baked Goods", modify
label def id_test5 4 "Bottled Water", modify
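
In the example, xtset fails because each (id_test5, year) pair appears several times: country and type also vary within a product. One minimal fix, a sketch assuming the unit to be followed over time is the country-product-type combination (panel_id is an illustrative name):

Code:
* See which product-year pairs repeat
duplicates report id_test5 year

* Build the panel identifier from every dimension that varies within a
* product-year, then xtset on it so each panel has one observation per year
egen long panel_id = group(country prod type), label
xtset panel_id year

* Lead/lag operators then work as usual, e.g.:
gen lag1_percent  = L.percent
gen lead1_percent = F.percent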

Collapsing multiple variables using the count option

Hi all,

I have a dataset where every row represents a study. Each row contains columns identifying the region and country where the study was conducted, and several columns with data on 5-year age groups (from 0-4 to 99+). For each study, the age-group variables are coded 0 (the study sample does not contain that age group), 1 (the study sample contains that age group), or . (missing). The number of studies per country varies (e.g., there are 8 studies done in China, 22 in the US, etc.). I am interested in determining the count of valid data (i.e., coded 1) for each age group by country and region. This is an example of my dataset:

Code:
clear
input str71 sid_final str28 regionsimplified str22 country byte(age_20_24 age_25_29 age_30_34)
"HBSC 2009-2010" "Asia" "Armenia"    0 0 0
"World Health Survey" "Asia" "Bangladesh" 1 1 1
"Haq et al., 2005" "Asia" "Bangladesh" 1 1 1
"Cambodia Elderly survey 2004" "Asia" "Cambodia" 0 0 0
"Koyanagi et al., 2018" "Asia" "China" 0 0 0
"Liao et al., 2009" "Asia" "China" 1 1 1
"World Health Survey" "Asia" "China" 1 1 1
end
In the example given, I would expect to end up with
Bangladesh: 20-24: 2 studies; 25-29: 2 studies; 30-34: 2 studies
China: 20-24: 2 studies; 25-29: 2 studies; 30-34: 2 studies

I have tried to use the following command:
Code:
preserve
collapse (first) regionsimplified (count) age_20_24 age_25_29 age_30_34, by(country)
but this gives me the total number of studies per country, with the same number across every age group, rather than the number of studies that have data for a particular age group in each country. Any suggestions?
Thanks!
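
A likely explanation, and a sketch of a fix: collapse's (count) statistic counts nonmissing values, and 0 is nonmissing, so every coded study is counted regardless of its value. Because the age-group variables are 0/1 indicators, (sum) instead adds the 1s, which is the number of studies coded 1 per age group (missing values are ignored by the sum). The sketch also assumes each country belongs to a single region, so region can simply go in by():

Code:
preserve
* Summing the 0/1 indicators counts the studies coded 1 per age group
collapse (sum) age_20_24 age_25_29 age_30_34, by(regionsimplified country)
list, sepby(regionsimplified)
restore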

Retaining the entire ID if there are certain variables in the panel data

Good morning, world.

English is not my first language, so Google Translate is helping me; please bear with me.

I am using Stata 15.

I am working with panel data that has 191 variables and has been collected for about 22 years.

The ID variable runs 101, 102, 103, and so on.

wave is the time variable.

Please see the attached file for the shape of the data.



Here is my question.

If an ID has var1 equal to 2 or 3 in any wave, I would like to keep all observations for that ID.

If an ID never has var1 equal to 2 or 3, I want to drop all observations for that ID.

So I wrote:

Code:
gen nid=ID if var1 == 2 | var1 == 3
gen nnid =1 if nid < .
but I do not know what logic to use from here.

Using a loop (e.g., forvalues) together with keep if does not seem to work properly either:

Code:
gen nid=ID if var1 == 2 | var1 == 3
gen nnid =1 if nid < .
keep if nnid == 1
OR

Code:
gen nid=ID if var1 == 2 | var1 == 3
gen nnid =1 if nid < .
forvalues wave = 1/22{
    keep if nnid == 1
    }
Neither attempt works, and I understand why this code does not get me to the answer, but I do not know how to solve the problem.

Please advise me on what logic to proceed with.

I hope this post follows the Advice on posting to Statalist.

Thank you.
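
A minimal sketch of the usual logic, assuming ID identifies the panel units and var1 is numeric: flag, within each ID, whether var1 ever equals 2 or 3, then keep only the flagged IDs.

Code:
* has23 is 1 for every observation of an ID whose var1 is ever 2 or 3, else 0
bysort ID: egen has23 = max(inlist(var1, 2, 3))
keep if has23 == 1
drop has23

Because the egen max() is computed within each ID, every wave of a qualifying ID is kept and every wave of a non-qualifying ID is dropped, with no loop needed.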

Replacing missing values with draws from a subset of nonmissing values

I am replicating an imputation procedure for a household survey, and I'm struggling to find an elegant coding solution for bracketed responses. Some respondents do not provide an exact dollar amount for their income, but they do provide a bracketed response.

In the stylized example below, the wages_bkt and retirement_bkt are the bracketed versions of wages and retirement, respectively. A bracket code of 1 corresponds to a $0 to $1,000 bracket range and 2 to over $1,000. I've already calculated bracket codes for the nonmissing responses.

For the observations with missing values, I need to take a random draw from the valid values that fall within the bracket range specified for that observation and income type; e.g., for the household with $1,500 in wages and missing retirement income, I need to randomly draw a retirement income value from the valid retirement income reports that fall in bracket 2.

In my scratch code below, I:
  1. Rank the nonmissing values within each bracket range (I eventually plan to generate a random value and compute the within-bracket rank on that number)
  2. Generate variables with the total number of valid responses from which we can draw for wages and retirement
  3. Draw a random number between 1 and the (bracket-level) number of valid reports, for observations with missing reports
I get the results at the bottom of the code. For the observation missing both wages and retirement, I want to pull the 2nd ranked wage value from the 2nd wage bracket, or $1,500. Ideally, I would do this without any file I/O (i.e. I'm hoping to avoid building a reference of valid values I merge back onto the observations missing data). I need to take 5 separate draws for each variable across ~20 variables (and store the data in a wide format). Further, I need to use a rolling 2-year sample across many quarters. I'm concerned that a solution relying on a lot of file I/O might prove quite inefficient.

Does anyone have any ideas how I might pull the valid values from the observations identified by the *_pull_from and *_bkt variables into the observations with invalid values in the example below? Or does anyone have an altogether different solution? I'm replicating an existing imputation procedure, so I'm constrained on the methodology. Thank you!

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(wages retirement wages_bkt retirement_bkt)
 500 1100 1 2
 900  300 1 1
3000  600 2 1
2600  800 2 1
4600 1100 2 2
 700  400 1 1
 400 1100 1 2
1200 1700 2 2
1500    . 2 2
   .    . 2 1
   . 1200 1 2
end

* Rank the nonmissing values within each bracket
bysort wages_bkt: egen w_rank = rank(wages) if !missing(wages), unique
bysort retirement_bkt: egen r_rank = rank(retirement) if !missing(retirement), unique

* Number of valid donor values available in each bracket
bysort wages_bkt: egen w_num_valid = max(w_rank)
bysort retirement_bkt: egen r_num_valid = max(r_rank)

* For missing reports, draw a donor rank between 1 and the bracket's donor count
set seed 123
gen w_pull_from = floor(runiform()*w_num_valid) + 1 if missing(wages)
gen r_pull_from = floor(runiform()*r_num_valid) + 1 if missing(retirement)

*Results in:
clear
input int(wages retirement wages_bkt retirement_bkt w_rank r_rank w_num_valid r_num_valid w_pull_from r_pull_from)
3000  600 2 1 4 3 5 4 . .
2600  800 2 1 3 4 5 4 . .
   .    . 2 1 . . 5 4 2 4
 900  300 1 1 4 1 4 4 . .
 700  400 1 1 3 2 4 4 . .
1200 1700 2 2 1 5 5 5 . .
1500    . 2 2 2 . 5 5 . 4
 400 1100 1 2 1 1 4 5 . .
 500 1100 1 2 2 3 4 5 . .
   . 1200 1 2 . 4 4 5 3 .
4600 1100 2 2 5 2 5 5 . .
end
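
One way to complete the pull without any file I/O, sketched under the assumption that the variables built above exist (wages_imp and retirement_imp are hypothetical names for the imputed copies): loop over the observations with missing reports and copy in the donor whose within-bracket rank equals the drawn number.

Code:
gen double wages_imp = wages
gen double retirement_imp = retirement
quietly forvalues i = 1/`=_N' {
    * The donor is the observation in the same bracket whose rank matches
    * the draw; summarize, meanonly over that single donor puts its value
    * in r(mean)
    if missing(wages[`i']) {
        summarize wages if wages_bkt == wages_bkt[`i'] & w_rank == w_pull_from[`i'], meanonly
        replace wages_imp = r(mean) in `i'
    }
    if missing(retirement[`i']) {
        summarize retirement if retirement_bkt == retirement_bkt[`i'] & r_rank == r_pull_from[`i'], meanonly
        replace retirement_imp = r(mean) in `i'
    }
}

The loop touches only the observations with missing reports, so it avoids merges entirely; for very large samples, holding the donors in a separate frame (Stata 16 and later) would likely be faster.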