Hello everyone,

I'm writing my thesis and I'm struggling with processing my data. My research question is: "What is the effect of environmental controversies on the profitability of Chinese and European firms?" I also want to test whether this effect is moderated by corporate environmental performance, press freedom in the firm's country of origin, and ownership structure (concentration and state ownership). My dependent variables are ROA, ROE and Tobin's Q. My independent variables are environmental controversies (EC), corporate environmental performance (CEP), press freedom (PF), ownership concentration (Independence), and state ownership (GUO). My control variables are firm size, leverage and industry.

I collected my data from Eikon and Orbis. I opted for a balanced dataset (so there are no missing values), which consists of 314 firms (64 Chinese, 250 European).
My variables are:
- id (1 to 314)
- Year (2013-2018)
- Country (Europe or China)
- Industry (10 categories)
- Independence (A+ to D)
- GUO (e.g. Public authority)
- EC (dummy --> 0: no controversy in that year; 1: controversy in that year)
- CEP (score out of 100)
- PF (score out of 100)
- ROA
- ROE
- Tobin's Q
- Firm size
- Leverage

I made dummy variables for Country (DummyChina and DummyEurope), Independence, i.e. the BvD Independence Indicator (DummyLowConcentration, DummyMediumLowConcentration, DummyMediumHighConcentration and DummyHighConcentration), GUO, i.e. the GUO type (DummyStateOwnership), and Industry (DummyIndustry1 to DummyIndustry10).
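
To show what this dummy coding amounts to, here is a minimal sketch in Python (used only for illustration, not the tool the data were prepared in). The example rows and the helper name make_dummies are made up; only the resulting column names follow the naming above.

```python
def make_dummies(rows, column, prefix):
    """Add one 0/1 column per distinct value of `column` to every row dict."""
    categories = sorted({row[column] for row in rows})
    for row in rows:
        for cat in categories:
            # Column name is prefix + category value, e.g. "Dummy" + "China".
            row[f"{prefix}{cat}"] = 1 if row[column] == cat else 0
    return categories


# Made-up example rows, not real data.
firms = [
    {"id": 1, "Country": "China", "Industry": 1},
    {"id": 2, "Country": "Europe", "Industry": 2},
    {"id": 3, "Country": "Europe", "Industry": 1},
]

make_dummies(firms, "Country", "Dummy")           # DummyChina, DummyEurope
make_dummies(firms, "Industry", "DummyIndustry")  # DummyIndustry1, DummyIndustry2
```

Note that in a regression with an intercept only k-1 of the k dummies for a variable can be included; the omitted one serves as the reference category.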

Also, the variables EC, CEP and PF are lagged by one year, as I want to measure the effect of the occurrence of an environmental controversy on profitability in the following year.
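
As an illustration of what the lagging does, here is a minimal sketch in Python (again only for illustration). The helper name add_lag and the example rows are made up; the key point is that the lag is taken within each firm, so one firm's value never spills over into another firm's first year.

```python
def add_lag(rows, column, id_key="id", time_key="Year"):
    """Add `<column>_lag`: the same firm's value of `column` one year earlier."""
    # Index the observed values by (firm, year) so the lookup stays within a firm.
    by_firm_year = {(r[id_key], r[time_key]): r[column] for r in rows}
    for r in rows:
        # None when the firm has no observation for the previous year.
        r[f"{column}_lag"] = by_firm_year.get((r[id_key], r[time_key] - 1))
    return rows


# Made-up example rows, not real data.
panel = [
    {"id": 1, "Year": 2013, "EC": 0},
    {"id": 1, "Year": 2014, "EC": 1},
    {"id": 1, "Year": 2015, "EC": 0},
    {"id": 2, "Year": 2013, "EC": 1},
    {"id": 2, "Year": 2014, "EC": 0},
]

add_lag(panel, "EC")
```

The first year of each firm has no lagged value, so unless earlier data are available, a model with lagged regressors uses one year fewer per firm than the raw panel.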

When I first started my regressions, I used SPSS. However, I read that Stata is a much better alternative for panel data. I was able to import my data into Stata and ran some tests to check whether I need a pooled OLS model, a fixed effects model or a random effects model (REM). The results indicated that I should use the REM. I was able to estimate my first model, with ROA as the dependent variable and EC, firm size, leverage, DummyChina and DummyIndustry2 to DummyIndustry10 as regressors.

My questions:
- If I want to compare Chinese and European firms, is this the right standard model? Or do I have to start with just ROA, EC and the control variables and then make interaction terms for Country and Industry?
- If I later make interaction terms for Country, Industry, CEP, PF, GUO and Independence, can I add all of these in just one regression? Or do I have to add them separately and run multiple regressions?
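
To make the interaction question concrete, this is a sketch of the kind of specification I have in mind for a single moderator (here Country). The coefficient symbols, the control vector X and the error components are my own notation and an assumption about how the model would be written, not something taken from my Stata output:

```latex
\[
\mathrm{ROA}_{it} = \beta_0
  + \beta_1\,\mathrm{EC}_{i,t-1}
  + \beta_2\,\mathrm{DummyChina}_i
  + \beta_3\,\bigl(\mathrm{EC}_{i,t-1} \times \mathrm{DummyChina}_i\bigr)
  + \gamma^{\top} X_{it} + u_i + \varepsilon_{it}
\]
```

Here \beta_3 would capture how the effect of a controversy differs between Chinese and European firms, X_{it} holds firm size, leverage and the industry dummies, u_i is the firm-level random effect and \varepsilon_{it} is the remaining error. The other moderators (CEP, PF, GUO, Independence) would enter in the same way, each with its own main effect and its own product term with lagged EC.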

Quite frankly, I'm a bit lost. I have never used panel data or Stata, and I have no idea what the right order is to answer my research question and check for moderation. My main struggle is the interaction terms.

If anyone has suggestions or could tell me the steps I have to follow, please let me know. Thank you in advance!!