Hello,

I hope you are all doing fine!

I have a panel dataset with over a thousand firm IDs and 397 variables. For each ID I have data on time periods ranging from 2003-2010, although each firm ID underlies different time periods (2003-2006, 2004-2008, etc.).

Because I would like to run a linear regression using as the dependent variable the returns (_roe) of two subgroups (given by the variable profit_status: non-profit or profit), I need to create a for-profit and a non-profit return portfolio first. Therefore, I need to find the weighted average return of each of the two portfolios per year and then obtain the mean value for each subgroup.

Finally, I would like to implement the same process by region.

Here is an extract of my dataset:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long mfiid int year str10 profit_status str31 region double loanp_gr float _roe
100000 2009 "Non-profit" "South Asia"                        6.989898   .14123334
100000 2008 "Non-profit" "South Asia"                        5.815762   .17713334
100000 2007 "Non-profit" "South Asia"                        4.447335    .2913083
100001 2007 "Non-profit" "Eastern Europe and Central Asia"  32.509157   .21060835
100001 2009 "Non-profit" "Eastern Europe and Central Asia"  43.950894  .015933335
100001 2004 "Non-profit" "Eastern Europe and Central Asia"   3.823308   .04955833
100001 2010 "Non-profit" "Eastern Europe and Central Asia"  34.686921  .017058332
100001 2006 "Non-profit" "Eastern Europe and Central Asia"    9.54903   .02328333
100001 2005 "Non-profit" "Eastern Europe and Central Asia"   5.243011 -.008699998
100004 2008 "Profit"     "Africa"                          318.429249   .20683333
100008 2007 "Non-profit" "South Asia"                        2.514016   -.4197917
100008 2008 "Non-profit" "South Asia"                        3.528543  -2.9300666
100008 2009 "Non-profit" "South Asia"                        3.126599    9.721633
100008 2006 "Non-profit" "South Asia"                         .925905   -.4828166
100012 2010 "Non-profit" "South Asia"                       17.238102   .16185834
100012 2007 "Non-profit" "South Asia"                        6.380352   1.6969082
100012 2009 "Non-profit" "South Asia"                       12.460604    .9820334
100012 2005 "Non-profit" "South Asia"                        1.188292       .3659
100012 2008 "Non-profit" "South Asia"                        7.094487   1.2119334
100012 2006 "Non-profit" "South Asia"                        2.859306   -.4431167
100013 2010 "Non-profit" "South Asia"                        1.343236  -.13484167
100013 2009 "Non-profit" "South Asia"                        1.247345  -.10906667
100016 2005 "Profit"     "South Asia"                        4.186943      -.0101
100016 2009 "Profit"     "South Asia"                       17.369746   .07153333
100016 2010 "Profit"     "South Asia"                       19.892395   .08925833
100016 2006 "Profit"     "South Asia"                        6.661947 .0035833344
100016 2008 "Profit"     "South Asia"                       12.535291   .05173333
100016 2007 "Profit"     "South Asia"                       12.494852  .016408335
100017 2006 "Profit"     "South Asia"                        91.68322   .09988333
100017 2003 "Profit"     "South Asia"                       18.902664      .17895
end
format %ty year
I have tried to do it by creating a variable for the weighted average returns and then extract the mean values of the summary statistics to an Excel file, but I wonder whether there is a better option just using Stata (as I am really new to the software).

I would appreciate your help.

Kind regards