I am using Stata 14.2 on Windows. I have a question on how to use reghdfe for my hedonic analysis with fixed effects.
My data consist of around 200,000 house sale transactions in 4 different cities between 2009 and 2018. Each house is identified by the variable 'pc6nmbr_n', the numeric version of 'pc6nmbr', which is a concatenation of the postcode and house number, basically indicating the street where the house is located. 'pc6nmbr' was originally a string variable, which I converted to numeric group codes with the command:
Code:
egen long pc6nmbr_n = group(pc6nmbr)
(I think I did this correctly but if anyone already notices a mistake here I'd love to know about it of course).
Many houses are only sold once within this period, but it does occur that a house is sold more than once and so a certain value of 'pc6nmbr_n' might occur multiple times in the dataset.
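To get a feel for how common repeat sales are, something like the following could be used. This is only a sketch: 'n_sales' and 'one_per_house' are names I made up for illustration, not variables in my dataset.

```stata
* Number of sales recorded for each house (each value of pc6nmbr_n)
bysort pc6nmbr_n: gen long n_sales = _N

* Tag exactly one observation per house so each house is counted once
egen byte one_per_house = tag(pc6nmbr_n)

* Distribution of houses by number of sales; n_sales == 1 are the singletons
tabulate n_sales if one_per_house
```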
'lnprice' is my dependent variable and there are 19 independent variables (house characteristics like number of rooms, construction year dummies, whether there is a garden, etc.). The variable I am especially interested in is 'MIX2', a dummy that takes the value 1 when the residential property sold is situated in a building that is used not only for residential purposes but also for other uses, for example retail. I want to determine the effect of 'MIX2' on the house price.
Regarding the fixed effects, I want to control for time trends and location effects, so I believe I have to add 'year' (of sale) and 'pc6nmbr_n' as fixed effects.
With areg only one fixed effect can be absorbed, so I would have to create a variable like location_year = year * pc6nmbr_n, but I don't really understand what this new variable would indicate, since pc6nmbr_n is just an arbitrary group code. Or is it not something you can really interpret, and only used to let Stata know to control for these effects?
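To make the comparison concrete, here is a minimal sketch of the two specifications as I understand them, using only the variables from my data example (the remaining regressors are left out for brevity). As far as I can tell, areg can absorb the house identifier while the year of sale enters as indicator variables via i.year, and reghdfe can absorb both directly. This is an illustration, not the exact command behind my results.

```stata
* areg: absorb the house identifier, add year-of-sale indicators explicitly
areg lnprice lnsize MIX2 i.year, absorb(pc6nmbr_n)

* reghdfe (from SSC: ssc install reghdfe): absorb house and year effects together
reghdfe lnprice lnsize MIX2, absorb(pc6nmbr_n year)
```

If I understand it correctly, a house-by-year effect would instead require a grouping such as egen group(pc6nmbr_n year), which is a different model from two separate fixed effects.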
Because of the above I decided to try out reghdfe. It drops singleton observations, but I don't want to omit observations whose 'pc6nmbr_n' occurs only once. More importantly, it omits 'MIX2', which is the whole reason for the analysis. My guess is that this is because it is time-invariant, but in what way can I still include it in the regression? And finally, about the R²: do I understand correctly that it is the within R² that I should focus on?
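To check my guess about 'MIX2', I think something like the sketch below would show whether it ever changes between sales of the same house. The names 'mix_varies' and 'house_tag' are made up for illustration. The last line uses the keepsingletons option, which the reghdfe help file lists as a way to keep singleton groups. I have not verified how it behaves in the version I have installed.

```stata
* Flag houses where MIX2 takes more than one value across their sales
* (sorting by MIX2 within house puts the smallest value first, the largest last;
*  a missing MIX2 sorts last and would also trigger the flag)
bysort pc6nmbr_n (MIX2): gen byte mix_varies = MIX2[1] != MIX2[_N]

* Count each house once
egen byte house_tag = tag(pc6nmbr_n)
tabulate mix_varies if house_tag

* Same two-way fixed effects model, but asking reghdfe to keep singletons
reghdfe lnprice lnsize MIX2, absorb(pc6nmbr_n year) keepsingletons
```

If 'mix_varies' is 0 for every house, 'MIX2' is constant within each value of 'pc6nmbr_n' and so cannot be separated from the house fixed effect.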
Below I added an example of my dataset, excluding the extensive amount of other independent variables, and the results I get when using reghdfe.
I'm hoping someone can give me some tips or a push in the right direction. I am new to Stata, so I can imagine I am overlooking some relatively easy things.
Kind regards,
Lonneke
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(lnprice lnsize MIX2) long pc6nmbr_n int year
12.167851 4.0430512 0 120585 2014
12.886642 4.744932 0 127367 2013
12.97154 4.828314 1 117672 2017
12.162643 4.060443 0 132290 2016
12.611538 4.990433 0 135702 2013
12.36734 4.7095304 0 116853 2010
11.74006 4.3307333 0 133956 2015
12.561647 4.5217886 0 125380 2015
12.449018 4.6151204 0 125883 2018
12.319402 4.744932 0 126996 2013
12.961366 4.927254 0 128339 2018
11.782952 3.6888795 0 122241 2011
11.95118 4.304065 0 133781 2009
12.506177 4.828314 0 129475 2016
12.63623 4.7004805 0 121168 2010
12.269048 4.2341065 0 117600 2015
13.5049 5.135798 0 137288 2018
11.652687 4.3820267 0 124573 2015
12.15478 4.158883 0 121395 2015
12.12811 4.477337 1 136561 2014
12.043553 4.317488 0 130981 2013
12.05525 4.0073333 0 125341 2010
12.303653 4.248495 0 126274 2015
12.487485 4.624973 0 122260 2013
12.75852 4.867535 0 135616 2010
12.220962 4.4426513 0 130362 2009
11.9544 4.0943446 0 120110 2009
12.94801 4.941642 0 136616 2010
12.6082 4.4998097 0 113603 2013
11.73607 4.543295 0 127401 2015
12.936034 4.744932 0 119048 2011
12.841326 5.105946 0 129439 2016
11.73607 3.912023 0 135457 2012
12.75708 4.787492 0 127093 2009
11.8706 4.543295 0 134478 2016
12.26198 4.65396 0 124290 2017
11.901584 4.248495 0 131795 2014
12.821259 5.075174 0 121787 2011
12.706848 4.787492 0 137248 2010
12.89922 4.5325994 0 117529 2016
12.84819 4.744932 0 122322 2015
12.356646 4.4308167 0 122305 2009
12.524527 4.787492 0 129779 2014
12.542545 4.4998097 0 118949 2012
12.07254 4.276666 0 121105 2017
13.226724 4.983607 0 137661 2015
12.668233 4.787492 0 130543 2010
12.542545 4.787492 0 117130 2010
12.141534 4.4426513 0 127707 2009
12.05525 4.1743875 0 127198 2013
12.254863 4.6051702 0 131829 2016
12.25009 4.189655 0 139126 2011
12.058152 4.1743875 0 125682 2015
11.77529 3.496508 0 120487 2016
12.798018 4.6051702 0 117419 2017
11.97666 4.26268 0 124839 2016
12.717698 4.4886365 0 120927 2016
12.686954 5.075174 0 128242 2010
11.849398 4.4188404 0 115967 2014
12.149503 4.26268 0 131550 2015
11.759786 4.0943446 0 125666 2015
11.652687 3.3322046 0 136795 2015
12.59473 4.976734 0 114347 2015
12.19096 4.6051702 0 131337 2016
12.409014 4.7004805 0 130810 2017
12.89922 5.398163 1 119644 2014
12.06681 4.317488 0 132404 2015
11.867097 3.8066626 0 125846 2013
11.91839 4.624973 0 132963 2012
11.759786 4.3820267 0 123839 2012
11.95118 4.189655 0 122118 2014
12.317166 4.6051702 0 132386 2010
12.043553 4.4998097 0 122743 2009
12.36734 4.634729 0 132915 2014
12.502466 4.5217886 0 132502 2018
12.266697 4.248495 0 117590 2010
12.827992 4.744932 0 127290 2014
12.259613 3.7612 0 120519 2018
12.623137 4.691348 0 127932 2018
12.016726 4.6051702 0 134348 2010
11.95118 3.465736 0 132030 2012
11.95761 4.5217886 0 134537 2011
13.442997 5.225747 0 127838 2018
13.190022 4.890349 0 122341 2018
12.137794 4.2904596 0 124469 2017
12.193494 4.6634393 0 133067 2014
12.058152 4.0073333 0 131780 2016
11.674193 4.317488 0 123975 2014
11.695247 4.4998097 0 134776 2014
12.487485 4.6728287 0 140081 2017
12.64594 4.94876 0 129299 2015
12.468437 4.919981 0 129213 2014
12.577636 4.804021 0 126778 2014
12.206073 4.624973 0 123956 2017
12.058152 4.204693 0 124524 2017
12.985686 4.770685 0 120176 2017
12.76581 4.770685 0 124363 2017
12.036174 4.1108737 0 139956 2014
12.452932 4.6728287 0 121736 2017
12.05525 4.248495 0 126871 2015
end