I am interested in estimating a relation between two variables across multiple subsamples but I am hung up by the fact that I also want to account for interactive effects within each subsample.
Here is an example of what I am talking about (the actual variables are different in my study):
Let's say that I want to examine the effect that fertilizer has on the height of a plant. I could estimate the simple regression:
(1) Height = a + b1*fertilizer + e
Where height is the height of the plant and fertilizer is the amount of fertilizer. b1 would then be the effect of fertilizer on height.
However, I actually have many different types of plants (say 30) and I expect this relation to vary across my plants. What I can do is estimate this regression separately for each type of plant and come up with an estimate of b1 for each plant subsample.
HOWEVER, I also know that whether the plant is positioned in the sun will change the effect of fertilizer (Sun is a 0/1 variable). That is, Sun moderates the effect of fertilizer on height. I'm not interested in the effect of sunlight itself, and I would like to remove this variation from the data so that I can focus on comparing the effect of fertilizer between plant types. If I estimate
(2) Height = a + b1*fertilizer + b2*Sun + b3*fertilizer*Sun + e
separately for each of my 30 plant types, then comparing b1 across subsamples no longer tells me how the effect of fertilizer varies across plants. Instead, it only allows me to test whether the effect of fertilizer varies across plant types for plants which are NOT planted in the sun (Sun = 0). Similarly, b3 is the difference in the effect of fertilizer between sun and no sun, so the effect for plants which are planted in the sun is b1 + b3, and comparing that sum across my subsamples compares the effect of fertilizer ONLY for plants which are planted in the sun.
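To make the interpretation of the coefficients in (2) concrete, here is a minimal sketch for a single plant type. The data are simulated and every number in it (sample size, true slopes, noise level) is a made-up assumption for illustration, not something from my study. It fits model (2) by ordinary least squares and checks the coefficients against separate simple regressions for shaded and sunny plants.

```python
import numpy as np

def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data for ONE hypothetical plant type (all values are assumptions).
rng = np.random.default_rng(0)
n = 400
fertilizer = rng.uniform(0, 10, n)
sun = rng.integers(0, 2, n).astype(float)  # 0/1 indicator
height = (2.0 + 1.0 * fertilizer + 3.0 * sun
          + 0.5 * fertilizer * sun + rng.normal(0, 1, n))

# Model (2): Height = a + b1*fertilizer + b2*Sun + b3*fertilizer*Sun + e
a, b1, b2, b3 = ols([fertilizer, sun, fertilizer * sun], height)

# Model (1) fitted separately on shaded and on sunny plants.
shade = sun == 0
slope_shade = ols([fertilizer[shade]], height[shade])[1]
slope_sun = ols([fertilizer[~shade]], height[~shade])[1]

print(f"b1      = {b1:.4f}  vs slope in shade = {slope_shade:.4f}")
print(f"b1 + b3 = {b1 + b3:.4f}  vs slope in sun   = {slope_sun:.4f}")
```

Because Sun is binary and fully interacted with fertilizer, b1 reproduces the shade-only slope and b1 + b3 reproduces the sun-only slope, which is why neither coefficient on its own is an "average" effect of fertilizer.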
I feel I am in a bit of a pickle because I would like to compare the effect of fertilizer across plant types on average, removing the variation caused by sunlight. It would be inappropriate to simply estimate model (1) and compare b1, because if some of my plant subsamples have a higher proportion of plants planted in the sun, then differences in b1 would be driven by the omitted variable Sun and not just by differences in the effect of fertilizer across subsamples. In the actual setting that I am looking at, I have several of these interactive variables and many of them are continuous so I can't just do simple statistics where I compare fertilizer with and without sun separately.
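One way to pin down what "the effect of fertilizer on average" could mean is to evaluate the fertilizer slope from model (2) at the same reference value of Sun for every plant type, that is b1 + b3*sun_ref. The sketch below is only an illustration of that idea under stated assumptions: the helper name `fertilizer_effect_at`, the reference value 0.5, and the two simulated plant types "A" and "B" are all hypothetical. With continuous moderators the same expression would be evaluated at a common value such as the pooled sample mean.

```python
import numpy as np

def fertilizer_effect_at(height, fertilizer, sun, sun_ref):
    """Fit model (2) and return the fertilizer slope evaluated at Sun = sun_ref."""
    X = np.column_stack([np.ones(len(height)), fertilizer, sun, fertilizer * sun])
    coef, *_ = np.linalg.lstsq(X, height, rcond=None)
    b1, b3 = coef[1], coef[3]
    return b1 + b3 * sun_ref

rng = np.random.default_rng(1)
sun_ref = 0.5  # assumed common reference value, identical for every plant type

# Two hypothetical plant types with the SAME true coefficients but very
# different shares of plants in the sun (20% versus 80%).
sun_share_by_type = {"A": 0.2, "B": 0.8}
effects = {}
for plant_type, sun_share in sun_share_by_type.items():
    n = 500
    fertilizer = rng.uniform(0, 10, n)
    sun = (rng.uniform(size=n) < sun_share).astype(float)
    height = (2.0 + 1.0 * fertilizer + 3.0 * sun
              + 0.5 * fertilizer * sun + rng.normal(0, 1, n))
    effects[plant_type] = fertilizer_effect_at(height, fertilizer, sun, sun_ref)

# True slope at Sun = 0.5 is 1.0 + 0.5 * 0.5 = 1.25 for both types.
for plant_type, effect in effects.items():
    print(plant_type, round(effect, 3))
```

Since both simulated types share the same true coefficients, their estimates at the common reference value should agree up to sampling noise even though their sun shares differ. Whether a single reference value is the right target for a real study is a judgment call that this sketch does not settle.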
I'm not sure if it's possible to actually do what I want to. I have tried searching for this (very specific) scenario online and have not found any solutions. I think the following may work but I don't know if it is appropriate:
First estimate: (3) Height = a + b1*Sun + e
take e from this regression and call it e_Height
then estimate: (4) Fertilizer = a + b1*Sun + e
take e from this regression and call it e_fertilizer
Now estimate: (5) e_Height = a + b1*e_fertilizer + e, separately for each plant subsample.
Would comparing b1 from this last equation (5) be appropriate? Would it allow me to compare the average effectiveness of fertilizer across different types of plants without being confounded by Sun? I am concerned that this may not solve the issue because Sun has an interactive effect and is therefore not just a simple additional control variable. Additionally, I don't know if I should estimate (3) and (4) within each plant subsample or for the overall sample (I guess it depends on whether I think the effect of Sun varies across subsamples as well).
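For what it is worth, here is a small simulated check of what steps (3) to (5) deliver when all three regressions are run within the same plant subsample. The data and every parameter value are assumptions made up for illustration. By the Frisch-Waugh-Lovell result, the slope from (5) should equal the fertilizer coefficient from the additive regression Height = a + b*fertilizer + c*Sun + e, that is, a model with Sun as a control but without the fertilizer*Sun interaction.

```python
import numpy as np

def design(columns, n):
    """Design matrix with an intercept followed by the given columns."""
    return np.column_stack([np.ones(n)] + list(columns))

def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    coef, *_ = np.linalg.lstsq(design(columns, len(y)), y, rcond=None)
    return coef

def residuals(columns, y):
    """Residuals from regressing y on an intercept plus the given columns."""
    return y - design(columns, len(y)) @ ols(columns, y)

# Simulated data for ONE hypothetical plant subsample (all values are assumptions).
rng = np.random.default_rng(2)
n = 400
sun = (rng.uniform(size=n) < 0.3).astype(float)
fertilizer = rng.uniform(0, 10, n) + 2.0 * sun  # fertilizer correlated with Sun
height = (2.0 + 1.0 * fertilizer + 3.0 * sun
          + 0.5 * fertilizer * sun + rng.normal(0, 1, n))

e_height = residuals([sun], height)          # step (3)
e_fertilizer = residuals([sun], fertilizer)  # step (4)
b1_step5 = ols([e_fertilizer], e_height)[1]  # step (5)

# Additive model with Sun as a plain control, no interaction term.
b_additive = ols([fertilizer, sun], height)[1]

print(f"b1 from (5)        = {b1_step5:.6f}")
print(f"additive-model b   = {b_additive:.6f}")
```

So within a subsample, (5) is the same as adding Sun as an ordinary control. With a binary Sun that coefficient is a weighted average of the shade and sun slopes, with weights that depend on how many plants are in each group and on the spread of fertilizer within each group, so it can still differ across plant types purely because their sun shares differ. The equality does not hold exactly if (3) and (4) are estimated on the overall sample and only (5) is run by subsample.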
Any thoughts and suggestions would be very helpful!