Hi All,

I compiled an unbalanced panel dataset, which includes six years from 2010 to 2015.
After getting the balanced panel, I want to do a panel regression at the firm level. Firm_id is a firm’s identification number, branch_id is each branch’s identification number. ROA is return on asset, which was measured at firm level. Year and branch_id can uniquely identify each observation. Year and firm_id cannot. In 2010: 1104 observations; 2011: 754 observations; 2012: 582 observations; 2013: 444 observations; 2014: 302 observations; 2015: 208 observations.
I want to convert this unbalanced panel data to balanced panel data. That is, I want to keep the firms who consistently appear from year 2010 to 2015 and drop those firms who do not consistently appear throughout the six years. One issue is year and firm_id cannot uniquely identify each observation. Only year and branch_id can do so, which makes this task even more complex. I am not sure how to do it using Stata. Please advise.

Below is the sample data.
Code:
 * Example generated by -dataex-. To install: ssc install dataex clear input int year str9 firm_id int branch_id byte roa   
2010 "E00000019" 111003308 0.053
2010 "E00000038" 131052542 0.464
2010 "E00000038" 139019291 0.464
2010 "E00000039" 131000521 0.464
2010 "E00008577" 135001204 0.055
2010 "E00008578" 101000449 0.055
2011 "E00000019" 139003503 0.075
2011 "E00000019" 122007047 0.075
2011 "E00000038" 131052542 -0.014
2011 "E00000038" 131000521 -0.014
2011 "E00000038" 139019291 -0.014
2011 "E00008577" 124006320 0.035
2012 "E00000019" 119001952 0.088
2012 "E00000019" 139003503 0.088
2012 "E00000038" 140008420 -0.002
2012 "E00008577" 124006320 0.038
2013 "E00000019" 139003503 0.036
2013 "E00008577" 124006320 0.067
2014 "E00000019" 139003503 0.049
2014 "E00000019" 111003308 0.049
2014 "E00008577" 124006320 0.049
2015 "E00000019" 111003308 -0.109
2015 "E00000019" 139003503 -0.109

  end
Many thanks,
David