Hello! First post here, will do my best to follow all FAQ rules.

I am working on a project with a dataset that has 162,000 observations and 52 variables. Each observation contains one firm's results for a given year. Overall, I am seeking to determine the effect of immigration in a given Norwegian municipality on individual firm performance.

The variables of interest are:

imm_share: share of the workforce in a given municipality in a given year that is classified as immigrants (stored as a proportion, e.g. 0.027)
ROA: return on assets, i.e. firm profit divided by firm assets in a given year
aar: year, entered as dummies
industry: industry in which the firm operates, entered as dummies
log_ansatte: log of the number of employees at a firm in a given year
log_firmage: log of firm age in a given year

The number of employees and firm age are meant to be proxies for firm size.
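
As a minimal sketch of how the firm-level variables above are defined (profit, assets, ansatte, and firmage are hypothetical names for the raw variables, used only for illustration and not the actual names in our data):

[CODE]
* Sketch only: profit, assets, ansatte and firmage are hypothetical
* raw variable names, not taken from the actual dataset.

* Return on assets: firm profit divided by firm assets
generate ROA = profit / assets

* Log of number of employees (missing when ansatte is 0)
generate log_ansatte = ln(ansatte)

* Log of firm age (missing when firmage is 0)
generate log_firmage = ln(firmage)
[/CODE]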

Example of the dataset:
. dataex ROA imm_share aar log_firmage log_salg

----------------------- copy starting from the next line -----------------------
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(ROA imm_share) int aar float(log_firmage log_salg)
.0858681 .02696629 2001 3.0910425 9.262743
.04753989 .05016723 2001 1.94591 9.31722
.16474044 .036985237 2001 2.1972246 9.242129
.04280008 .04942902 2001 3.178054 9.332735
.06279306 .029482344 2001 4.204693 11.091865
.036365848 .031799663 2001 2.833213 11.284744
end
[/CODE]

Our estimation and results:

reg ROA imm_share i.aar i.industry log_firmage log_ans if e(sample), vce(cluster cid)


MY QUESTION:
When we run this regress command with several variations of control variables, we always get a p-value between 0.000 and 0.012. We wouldn't expect this level of significance. Does anyone have steps we could take to check or correct this, or a possible explanation? What would this result signify?

We are stumped as to how best to explain this part of the results.

Thank you so much for any insight you can provide.