Doubt on data structure/fixed effects

Dear Statalist users,

I could really use some advice on how to handle the data I have for a piece of research I am starting to work on. I am interested in trying to determine whether the Black Lives Matter protests (in particular violent ones) of last year had an impact on the November US presidential election.

Here is a small sample of the dataset I have:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int EVENT_DATE str20 state str27 county str37 LOCATION long(rainfall county_fips) float(viol_prot diff_dem_2016_2020)
    . "Alabama" "Autauga" ""             . 1001 0  3.061512
22205 "Alabama" "Baldwin" "Orange Beach" 0 1003 0 2.8437195
22101 "Alabama" "Baldwin" "Orange Beach" . 1003 1 2.8437195
22165 "Alabama" "Baldwin" "Orange Beach" . 1003 0 2.8437195
22149 "Alabama" "Baldwin" "Foley"        0 1003 0 2.8437195
22080 "Alabama" "Baldwin" "Fairhope"     . 1003 0 2.8437195
    . "Alabama" "Barbour" ""             . 1005 0 -.8720779
22079 "Alabama" "Bibb"    "Centreville"  . 1007 0 -.7237587
    . "Alabama" "Blount"  ""             . 1009 0 1.0994759
    . "Alabama" "Bullock" ""             . 1011 0 -.3884811
end
format %td EVENT_DATE

Although it is a bit hard to see from the sample, I have data on protests that occurred in the US from May 2020 up to election day. The variables are the date of the protest (here in Stata date format), the state, county (with its county FIPS code) and location of the protest, rainfall on the day of the protest (needed as an instrumental variable), a dummy indicating whether the protest was violent, and the change in the Democratic party's vote share relative to the previous election. (For simplicity, I have omitted a large number of variables.)

As you can see, there are repeated observations for some of the counties (i.e. more than one protest occurred in that county between May and November). But of course the dependent variable, which is based on the election results, is the same for every observation within a county. So my question is whether this could be a problem for estimation: the dependent variable is constant across multiple observations, because protests happened more than once but the election took place only once, in November. Is the data structure OK?
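To make the structure concrete, here is a minimal sketch of how it could be checked. It assumes the -dataex- sample above is in memory, and it treats rows with a missing EVENT_DATE as counties with no recorded protest, which is my reading of the sample rather than something documented. The variable name n_protests is made up for illustration.

Code:

```stata
* Sketch only: assumes the -dataex- sample above is in memory.
* Number of dated protest rows per county (0 if EVENT_DATE is always missing)
bysort county_fips: egen n_protests = count(EVENT_DATE)

* Verify that the outcome takes a single value within each county:
* after sorting on the outcome, the first and last values must coincide
bysort county_fips (diff_dem_2016_2020): ///
    assert diff_dem_2016_2020[1] == diff_dem_2016_2020[_N]

* Distribution of protest counts across rows
tabulate n_protests
```

If the assert passes, the outcome really is a county-level variable repeated on every protest row.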

Moreover, when running simple OLS regressions (I intend to use more sophisticated methods down the road), I notice that when I include county fixed effects and time fixed effects (based on the date of the protest),

Code:

reg diff_dem_2016_2020 viol_prot i.EVENT_DATE i.county_fips

the coefficient on viol_prot is basically zero and the R-squared is 1, meaning that the fixed effects fully explain the dependent variable. This also happens when I include county fixed effects alone, but not with time fixed effects alone. So, how could I include fixed effects in a way that works with my data? (Using xtset with the protest date as the time variable is not an option because of the "repeated time values within panel" error.)
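One possible way around this, sketched under the assumption that the county is the natural unit of analysis: since diff_dem_2016_2020 does not vary within county, i.county_fips reproduces it exactly, which would account for the R-squared of 1. Collapsing to one row per county and using coarser fixed effects (for example state) avoids that. The names any_viol, n_prot and state_fips are made up for illustration, and deriving the state code by dropping the last three digits of county_fips assumes standard five-digit FIPS codes. This is not meant as the definitive specification.

Code:

```stata
* Sketch only: assumes the -dataex- sample above is in memory.
* One row per county: any violent protest, number of dated protests,
* and the (county-constant) outcome
collapse (max) any_viol = viol_prot (count) n_prot = EVENT_DATE ///
    (first) diff_dem_2016_2020, by(county_fips)

* State code from the county FIPS code (assumption: 5-digit FIPS, e.g. 1003 -> 1)
gen int state_fips = floor(county_fips/1000)

* County-level regression with state fixed effects instead of county ones
regress diff_dem_2016_2020 any_viol n_prot i.state_fips, vce(robust)
```

With the full data, the date dummies no longer apply after collapsing, so any timing information would have to enter as county-level summaries (for example the date of the first protest).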

Any help would be greatly appreciated.
