Collapse command, OLS regression, and weighting per group

Hello everyone,

I am analyzing a large dataset with Stata, using OLS regression to understand how the median age of the workers in an industry and the median number of workers per firm in an industry affect the percentage of employees in the workforce.

I started with a longitudinal dataset of over 20 million rows. After collapsing by year and Industry_ID, I obtained a dataset that looks like the one below. Industry_ID is the unique ID of each industry; age_median is the median age of the workers in the industry, per year; nemp_median is the median number of workers in the firms of each industry, per year; and Percentage_employees_per_industry is the ratio of employees to total workers in an industry, per year. This is only part of the full dataset, as I have 107 different industry IDs.

Industry_ID	year	age_median	nemp_median	Percentage_employees_per_industry
1	2007	43	7	6.1584667
1	2008	43	7	6.3488696
1	2009	43	7	6.6453313
1	2010	43	8	6.0971645
1	2011	43	8	6.8209944
1	2012	43	8	7.2148137
1	2013	43	9	7.6716896
1	2014	43	8	7.7114815
1	2015	42	9	7.9195938
1	2016	42	10	7.8262298
1	2017	43	10	7.8576314
1	2018	42	10	7.3446328
1	2019	41	12	6.9016757
2	2007	39	7	10.932619
2	2008	39	7	11.627907
2	2009	40	7	11.778952
2	2010	40	7	9.8929845
2	2011	40	7	10.824859
2	2012	41	7	10.758377
2	2013	41	8	10.984848
2	2014	41	8	11.038062
2	2015	42	8	10.876434
2	2016	43	8	10.933797
2	2017	43	8	11.466373
2	2018	43	8	11.333044
2	2019	43	8	11.588974
3	2007	45	13	7.722245
3	2008	45	14	8.2989884
3	2009	46	12	9.1343025
3	2010	47	14	6.7039106
3	2011	47	14	5.7249712
3	2012	47	14	6.2974417
3	2013	47	14	6.9543705
3	2014	47	11	7.165838
3	2015	47	14	7.0180229
3	2016	48	13	6.8453171
3	2017	47.5	13	7.038961
3	2018	47	14	5.8881016
3	2019	46	15	5.5490517
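
For reference, a collapse of roughly this kind could also store the size of each industry-year cell, which any later weighting would need. This is only a sketch under assumptions: it supposes one row per worker per year, and the names age, nemp, is_employee, pct_employees and n_workers are hypothetical (pct_employees is a short stand-in for the percentage variable shown above).

```stata
* Hypothetical worker-level variables: age, nemp (size of the worker's firm)
* and is_employee (1 if the worker is an employee, 0 otherwise)
generate pct_employees = 100 * is_employee

* One row per industry and year; (count) stores the number of non-missing
* worker rows in each cell as n_workers
collapse (median) age_median=age nemp_median=nemp ///
         (mean)   pct_employees                   ///
         (count)  n_workers=is_employee,          ///
         by(Industry_ID year)
```

The point of the extra (count) statistic is that, after collapsing, the cell sizes are otherwise lost.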

The line of code I am using to run the OLS is: reg Percentage_employees_per_industry age_median nemp_median i.year

This runs without problems, and all coefficients are statistically significant.

My question is about how collapse produces the dataset shown above, and whether I need to weight the data before analyzing it. The industries (identified by Industry_ID) are very heterogeneous in the number of workers and the number of employees they have:

Industry 1 has 2001 workers in 2007, 2095 in 2008 and 2118 in 2009.
Industry 2 has 170 workers in 2007, 171 in 2008 and 166 in 2009.
Industry 3 has 11021 workers in 2007, 12111 in 2008 and 14206 in 2009.

Given this, I suspect that running the OLS as it is may not be entirely correct. What I am trying to ascertain is whether the age of the workers in an industry and the number of workers per firm affect the percentage of employers in the workforce. Given the heterogeneity of the industries, should I not apply some form of weighting to this regression? As it stands, each value of Percentage_employees_per_industry has the same influence on the OLS regression as every other. In fact, some industries account for nearly 10% of all workers (and employers) in the workforce, while others account for less than 0.01%. The collapse command does not take this into account: it produces one row for Industry 1 in 2007 with one value of Percentage_employees_per_industry, and does exactly the same for Industry 3, even though Industry 3 has more than five times as many workers as Industry 1 and about 65 times as many as Industry 2.
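
To make the concern concrete, here is a minimal sketch using only the 2007 rows and worker counts quoted in this post. The variable names pct_employees and n_workers are hypothetical stand-ins for the percentage variable and the number of workers per industry.

```stata
clear
input Industry_ID year pct_employees n_workers
1 2007 6.1584667  2001
2 2007 10.932619   170
3 2007 7.722245  11021
end

* Unweighted mean: each industry counts once, whatever its size
summarize pct_employees

* Worker-weighted mean: each industry counts in proportion to its workers
summarize pct_employees [aweight = n_workers]
```

With these three rows the unweighted mean is about 8.27, while the worker-weighted mean is about 7.53, because the small Industry 2 no longer counts as much as the much larger Industry 3.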

After this long explanation, what should I do? Is there a method of assigning weights in this procedure? Is weighting even necessary, or am I misinterpreting the problem?
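
One possibility, sketched here under assumptions and not as a definitive answer, is Stata's analytic weights. This supposes the collapsed data also hold a variable with the number of workers in each industry-year row, here given the hypothetical name n_workers, and it uses pct_employees as a short stand-in for the outcome variable.

```stata
* Unweighted: every industry-year row has the same influence on the fit
regress pct_employees age_median nemp_median i.year

* Analytic weights: each row is treated as a mean over n_workers
* observations, so larger industries have proportionally more influence
regress pct_employees age_median nemp_median i.year [aweight = n_workers]
```

Which version is appropriate may depend on the question being asked: the unweighted regression treats the industry as the unit of interest, while the weighted one is closer to describing the average worker.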

Thank you very much, any help will be much appreciated!
