Hello everyone,

I have a question that I hope you guys can help me. Below is an example of a panel data set
id years of schooling job1_occ_code job2_occ_code year
1 16 0010 2121 2012
1 16 0010 . 2013
1 16 2200 0010 2014
2 17 2200 1313 2012
2 17 2200 . 2013
2 17 1313 2121 2014
3 22 2121 2200 2012
3 22 2121 0010 2013
The above table is a survey that allows respondents to report up to 2 jobs (i.e., job1 and job2). What I am looking for is to calculate the average years of schooling for each ID accordingly to the each occupation codes (i.e., job1_occ_code and job2_occ_code). If I used "egen meanschooling=mean(schooling), by (job1_occ_code)", the outcome would be based on the frequency of the codes for job 1 only. For example, for job1_occ_code=="2200", it appears 3 times leading to the average = (17+16+16)/3=16.7 years. Similarly, if I used "egen meanschooling=mean(schooling), by (job2_occ_code)", the average years of schooling for job code 2200 is 22.

However, what I am looking for is to calculate the average years of education, for each code regardless of whether it is under job 1 or job 2. For example, for job code 2200, the average years of schooling that I am looking for is (16+17+17+22)/4=18. With this sample dataset, it could be done easily by "egen meanschooling=mean(schooling) if job1_occ_code=="2200" | job2_occ_code=="2200"", and Stata would return the mean years of schooling by 18. Nevertheless, my actual dataset allows respondents to report up to 7 jobs, and there are hundreds of occupational codes. It means that it is not as simple as in this case. Does anyone know how to calculate the average years of education across each occupational code? Any helps would be much appreciated.

Thank you in advance, and stay safe.