I have an (unbalanced) panel data set of German companies with investments in multiple countries over ten years, and I want to analyze the effect of certain country-level variables on the amount German firms invest. The listing below shows the structure of the data.
Please note that it is just example data. investor_id is the ID of the German investor, invest_amount is the amount the German company holds in that country, GDP is the gross domestic product of the country in which the investment is located, GDP_capita is the GDP per capita, and EU_dummy indicates whether the country is part of the EU (=1) or not (=0). My actual data set contains 8 more country-level variables.
Code:
. list, sepby(year) noobs abbrev(16)

  +---------------------------------------------------------------------------------+
  | year   country   investor_id   invest_amount        GDP   GDP_capita   EU_dummy |
  |---------------------------------------------------------------------------------|
  | 2017     China            45            1300   1.00e+07         4320          0 |
  | 2017    France            45             100     400000         5675          1 |
  | 2017    France            86             670     400000         5675          1 |
  |---------------------------------------------------------------------------------|
  | 2018     China            45            1500   1.10e+07         4520          0 |
  | 2018    France            45             105     390000         5575          1 |
  | 2018    France            86             660     390000         5575          1 |
  +---------------------------------------------------------------------------------+
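To make the structure easy to reproduce, the example listing can be typed in as sketched here. The storage types are assumptions, and so is the construction of log_invest_amount as the natural log of invest_amount, since the post uses that variable without showing how it was built.

```stata
* Sketch: recreate the six example rows (example data only)
clear
input int year str6 country int investor_id double invest_amount double GDP double GDP_capita byte EU_dummy
2017 "China"  45 1300 1.00e+07 4320 0
2017 "France" 45  100   400000 5675 1
2017 "France" 86  670   400000 5675 1
2018 "China"  45 1500 1.10e+07 4520 0
2018 "France" 45  105   390000 5575 1
2018 "France" 86  660   390000 5575 1
end

* One row per investor, country and year: this is the "third dimension"
isid investor_id country year

* Assumption: the outcome is the natural log; requires invest_amount > 0
generate double log_invest_amount = ln(invest_amount)
```

isid exits with an error if investor, country and year do not jointly identify the rows, so it doubles as a quick sanity check on the real data.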
Code:
xtset investor_id year
Code:
xtset investor_id
xtreg log_invest_amount GDP GDP_capita EU_dummy i.year, fe
Here are the alternatives to the above fixed effects for investor_id and year that I could also use and that I encountered in the literature:
1) Create a new country-investor-level variable
Code:
egen ic_id = group(investor_id country)
xtset ic_id year
xtreg log_invest_amount GDP GDP_capita EU_dummy, fe
2) Create a new investor-year-level variable
Code:
egen it_id = group(investor_id year)
xtset it_id
xtreg log_invest_amount GDP GDP_capita EU_dummy, fe
3) Create a new country-year-level variable
Code:
egen ct_id = group(country year)
xtset ct_id
xtreg log_invest_amount GDP GDP_capita EU_dummy, fe
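Because a fixed-effects estimator uses only the variation within each group, one way to compare the three groupings is to check how much a regressor varies inside them. The following is a minimal sketch for GDP, assuming the example variables are in memory; g_ic, g_it, g_ct and the sd_* variables are hypothetical names chosen so as not to clash with the ids created above.

```stata
* Hypothetical group ids mirroring alternatives 1) to 3)
egen g_ic = group(investor_id country)
egen g_it = group(investor_id year)
egen g_ct = group(country year)

* Within-group standard deviation of GDP for each grouping
* (missing for groups that contain a single observation)
foreach g of varlist g_ic g_it g_ct {
    bysort `g': egen double sd_`g' = sd(GDP)
    summarize sd_`g'
}
```

Since GDP is measured at the country-year level, its within-group standard deviation is zero in every country-year group with more than one observation. Under alternative 3 the country-level regressors are therefore collinear with the fixed effects and would be omitted.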
Furthermore, I believe that the standard errors have to be cluster-robust. Here I run into exactly the same problem as with the fixed effects: I don't know which group to cluster on (the investors, the countries, or the investors in a given country?). The literature again seems very ambiguous about what to use, and the choice usually comes without an explanation.
So my second question is: which cluster variable would you use? country, investor_id, ic_id, it_id or ct_id from above?
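To see how much the choice matters in practice, the same specification can be estimated under several cluster levels and the standard errors compared side by side. This is only a sketch under assumptions, not a recommendation: it uses the investor-country fixed effects of alternative 1, and pair_id, country_id and the stored estimate names are hypothetical. It is meant for the full data set, since the six example rows contain far too few clusters for cluster-robust standard errors to mean anything.

```stata
* Hypothetical ids: investor-country pairs and a numeric country code
egen pair_id    = group(investor_id country)
egen country_id = group(country)

xtset pair_id year
local spec log_invest_amount GDP GDP_capita EU_dummy

* Same fixed effects, three candidate cluster levels.
* xtreg, fe needs the panels (pair_id) nested within the clusters,
* which holds for pair_id, investor_id and country_id.
xtreg `spec', fe vce(cluster pair_id)
estimates store cl_pair

xtreg `spec', fe vce(cluster investor_id)
estimates store cl_inv

xtreg `spec', fe vce(cluster country_id)
estimates store cl_cty

* Coefficients are identical across columns; only the standard errors differ
estimates table cl_pair cl_inv cl_cty, b(%9.4f) se(%9.4f)
```

it_id and ct_id are left out here because investor-country panels are not nested within investor-year or country-year groups, so xtreg, fe would refuse them as cluster variables.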
I am looking forward to your helpful and interesting input.
For the sake of completeness: I posted a somewhat related question on Statalist in 2017, when I was working on a related project for the first time. It concerned a different approach/model and a different question, and my Stata and statistics knowledge was at a different level back then: https://www.statalist.org/forums/for...led-panel-data
Best regards,
Anton