Dear Statalisters,

I would be grateful if someone could help me with the issue below. I have tried to give as much information as possible and to proceed step by step, but please let me know if there is anything else I should provide (this is my first post!).

First, I am trying to generate a graph with attendance (in percent) at various training sessions on the y-axis and the training sessions themselves (all trainings, T1, T2, T3, T4) on the x-axis. I do this using the code below:

twoway connected avg_percentattendedt T if T==0 || rcap hi_percentattendedt lo_percentattendedt T if T==0 /// * all trainings
|| connected avg_percentattendedt T if T==1 || rcap hi_percentattendedt lo_percentattendedt T if T==1 /// * T1
|| connected avg_percentattendedt T if T==2 || rcap hi_percentattendedt lo_percentattendedt T if T==2 /// *T2
|| connected avg_percentattendedt T if T==3 || rcap hi_percentattendedt lo_percentattendedt T if T==3 /// *T3
|| connected avg_percentattendedt T if T==4 || rcap hi_percentattendedt lo_percentattendedt T if T==4 /// *T4
, legend(order( 1 "All T mean" 2 "All T hi/low" 3 "T1 mean" 4 "T1 hi/low" 5 "T2 mean" 6 "T2 hi/low" 7 "T3 mean" 8 "T3 hi/low" 9 "T4 mean" 10 "T4 hi/low") pos(6) rows(5)) xlab(0 "All" 1 "T1" 2 "T2" 3 "T3" 4 "T4") ///
ytitle("%", height(10)) ylabel(55(5)80) xtitle("Treatment")



However, the attendees at the training sessions can be of three different types (say mg_level 1, mg_level 2, mg_level 3). I would like to reproduce the graph above, except that for each point on the x-axis (i.e. each training) I would like to show the mean and the hi/low interval separately for each of the three groups.

The data are initially in wide format, and the percentage attendance variables do not distinguish between groups. I create the variables by managerial level with the code below. In the same code I also collapse the data and reshape to long format, so that I end up with a dataset of three observations (one per managerial level) and the variables "T avg_percentattendedt0 hi_percentattendedt0 lo_percentattendedt0 avg_percentattendedt1 hi_percentattendedt1 lo_percentattendedt1 avg_percentattendedt2 hi_percentattendedt2 lo_percentattendedt2 avg_percentattendedt3 hi_percentattendedt3 lo_percentattendedt3 avg_percentattendedt4 hi_percentattendedt4 lo_percentattendedt4". T equals 1, 2 and 3 for observations 1, 2 and 3 respectively, and distinguishes between the groups.

global Var percentattendedt0 percentattendedt1 percentattendedt2 percentattendedt3 percentattendedt4

foreach y of varlist $Var {
    forval i = 1/3 {

        * levels 1 and 2 are restricted to key attendants; level 3 is not
        local cond "mg_level == `i'"
        if `i' < 3 local cond "keyattendant == 1 & mg_level == `i'"

        su `y' if `cond'
        scalar mean_`y'`i' = r(mean)
        scalar n_`y'`i' = r(N)
        scalar sd_`y'`i' = r(sd)

        egen avg_`y'`i' = mean(`y') if `cond'

        * 95% confidence limits based on the t distribution
        gen hi_`y'`i' = avg_`y'`i' + invttail(n_`y'`i'-1,0.025)*(sd_`y'`i' / sqrt(n_`y'`i'))
        gen lo_`y'`i' = avg_`y'`i' - invttail(n_`y'`i'-1,0.025)*(sd_`y'`i' / sqrt(n_`y'`i'))
    }
}


collapse (mean) avg_percentattendedt01 hi_percentattendedt01 lo_percentattendedt01 ///
avg_percentattendedt02 hi_percentattendedt02 lo_percentattendedt02 ///
avg_percentattendedt03 hi_percentattendedt03 lo_percentattendedt03 ///
avg_percentattendedt11 hi_percentattendedt11 lo_percentattendedt11 ///
avg_percentattendedt12 hi_percentattendedt12 lo_percentattendedt12 ///
avg_percentattendedt13 hi_percentattendedt13 lo_percentattendedt13 ///
avg_percentattendedt21 hi_percentattendedt21 lo_percentattendedt21 ///
avg_percentattendedt22 hi_percentattendedt22 lo_percentattendedt22 ///
avg_percentattendedt23 hi_percentattendedt23 lo_percentattendedt23 ///
avg_percentattendedt31 hi_percentattendedt31 lo_percentattendedt31 ///
avg_percentattendedt32 hi_percentattendedt32 lo_percentattendedt32 ///
avg_percentattendedt33 hi_percentattendedt33 lo_percentattendedt33 ///
avg_percentattendedt41 hi_percentattendedt41 lo_percentattendedt41 ///
avg_percentattendedt42 hi_percentattendedt42 lo_percentattendedt42 ///
avg_percentattendedt43 hi_percentattendedt43 lo_percentattendedt43

gen A = 1

reshape long avg_percentattendedt0 avg_percentattendedt1 avg_percentattendedt2 avg_percentattendedt3 avg_percentattendedt4 ///
hi_percentattendedt0 hi_percentattendedt1 hi_percentattendedt2 hi_percentattendedt3 hi_percentattendedt4 ///
lo_percentattendedt0 lo_percentattendedt1 lo_percentattendedt2 lo_percentattendedt3 lo_percentattendedt4, i(A) j(T)
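
For comparison, a more compact route to the same kind of summary might look like the sketch below. It is only a sketch on made-up toy data: the id variable, the names avg_att, sd_att, n_att, hi_att, lo_att and training, and all of the values are my own assumptions, not part of my real dataset. It reshapes to long first, keeps the same observations as the loop above (key attendants for levels 1 and 2, everyone for level 3), and then uses collapse to get one row per training and managerial level.

```stata
* Toy data standing in for the real file (all values are made up)
clear
set seed 12345
set obs 300
gen id = _n
gen mg_level = 1 + mod(_n, 3)
gen keyattendant = runiform() < 0.8
forvalues t = 0/4 {
    gen percentattendedt`t' = 60 + 5*mg_level + rnormal(0, 10)
}

* One row per person and training; training is 0 (all trainings), 1, ..., 4
reshape long percentattendedt, i(id) j(training)

* Same sample restriction as in the loop: key attendants only for levels 1 and 2
keep if keyattendant == 1 | mg_level == 3

* Mean, SD and N for every training x managerial level cell
collapse (mean) avg_att = percentattendedt (sd) sd_att = percentattendedt ///
         (count) n_att = percentattendedt, by(training mg_level)

* 95% t-based confidence limits, same formula as in the loop
gen hi_att = avg_att + invttail(n_att - 1, 0.025) * sd_att / sqrt(n_att)
gen lo_att = avg_att - invttail(n_att - 1, 0.025) * sd_att / sqrt(n_att)

list, sepby(training)
```

The result has 15 rows (5 trainings times 3 levels), so both the training and the managerial level are available as variables for plotting.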



My best attempt at the graph I need has got me as far as the code below. Unless I have misunderstood, twoway does not accept an over() option, which I think is the main reason I am stuck.

twoway connected avg_percentattendedt0 T || rcap hi_percentattendedt0 lo_percentattendedt0 T ///
|| connected avg_percentattendedt1 T || rcap hi_percentattendedt1 lo_percentattendedt1 T ///
|| connected avg_percentattendedt2 T || rcap hi_percentattendedt2 lo_percentattendedt2 T ///
|| connected avg_percentattendedt3 T || rcap hi_percentattendedt3 lo_percentattendedt3 T ///
|| connected avg_percentattendedt4 T || rcap hi_percentattendedt4 lo_percentattendedt4 T ///
, legend(order( 1 "All T mean" 2 "All T hi/low" 3 "T1 mean" 4 "T1 hi/low" 5 "T2 mean" 6 "T2 hi/low" 7 "T3 mean" 8 "T3 hi/low" 9 "T4 mean" 10 "T4 hi/low") pos(6) rows(5)) xlab(0 "All" 1 "T1" 2 "T2" 3 "T3" 4 "T4") ///
ytitle("%", height(10)) ylabel(55(5)80) xtitle("Treatment")
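
One workaround I can imagine, offered as a sketch under assumptions rather than something I have verified on my real data: keep the summaries in long form with one row per training and managerial level, then overlay one rcap/scatter pair per level using if qualifiers, nudging each level slightly along the x-axis so the three intervals sit side by side. The variable names training, avg, hi, lo and xpos, the offset of 0.15, and all values below are hypothetical.

```stata
* Hypothetical summary data: one row per training (0 = all, 1-4) and managerial level
clear
set obs 15
gen training = floor((_n - 1) / 3)
gen mg_level = 1 + mod(_n - 1, 3)
gen avg = 60 + 3*mg_level + training
gen hi  = avg + 4
gen lo  = avg - 4

* Shift each level slightly along the x-axis so the three cap bars do not overlap
gen xpos = training + (mg_level - 2) * 0.15

* One rcap + scatter pair per managerial level, overlaid in a single twoway call
twoway (rcap hi lo xpos if mg_level == 1) (scatter avg xpos if mg_level == 1) ///
       (rcap hi lo xpos if mg_level == 2) (scatter avg xpos if mg_level == 2) ///
       (rcap hi lo xpos if mg_level == 3) (scatter avg xpos if mg_level == 3), ///
       legend(order(2 "Level 1" 4 "Level 2" 6 "Level 3") pos(6) rows(1)) ///
       xlabel(0 "All" 1 "T1" 2 "T2" 3 "T3" 4 "T4") ///
       ytitle("%") xtitle("Training")
```

Here scatter could be swapped for connected if lines joining the trainings within each level are wanted.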