Hi all,

I am using Stata 14.2 on Windows. I have a question on how to use reghdfe for my hedonic analysis with fixed effects.

My data consist of around 200,000 observations on house sale transactions in 4 different cities between 2009 and 2018. Each house can be identified by the variable 'pc6nmbr_n', which is a concatenation of the postcode and house number, basically indicating the street where the house is located. It was originally a string variable ('pc6nmbr'), which I converted to numeric values with the command:

Code:
 egen long pc6nmbr_n = group(pc6nmbr) 
because there were too many values to use encode.
(I think I did this correctly, but if anyone already notices a mistake here I'd love to know about it, of course.)

Many houses are only sold once within this period, but it does occur that a house is sold more than once and so a certain value of 'pc6nmbr_n' might occur multiple times in the dataset.

'lnprice' is my dependent variable and there are 19 independent variables (house characteristics like number of rooms, construction year dummies, whether there is a garden, etc.). The variable I am especially interested in is 'MIX2', a dummy that takes the value 1 when the residential property sold is situated in a building that is used not only for residential purposes but also for other purposes, for example retail. I want to determine the effect of 'MIX2' on the house price.

Regarding the fixed effects, I want to control for time trends and location effects, so I believe I have to add 'year' (of sale) and 'pc6nmbr_n' as fixed effects.

With areg only 1 fixed effect can be added so I would have to create a variable like location_year = year * pc6nmbr_n , but I don't really understand what this new variable would indicate since pc6nmbr_n is just a random value. Or is it not something you can really interpret, only used to let Stata know to control for these effects?
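
To make the areg option concrete: as far as I understand, the combined identifier would be built with groups rather than by multiplication, and the alternative is to absorb the location and add year dummies. This is only a sketch using the variables from the example data below; 'location_year' is a name I made up for illustration.

Code:
* one group per (location, year) combination; the numeric value itself has no meaning
egen long location_year = group(pc6nmbr_n year)
areg lnprice lnsize MIX2, absorb(location_year)

* alternative: absorb location only and enter year as ordinary dummies
areg lnprice lnsize MIX2 i.year, absorb(pc6nmbr_n)

Note that the first version absorbs every location-year combination, which is not the same as having separate location and year effects.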

Because of the above I decided to try out reghdfe. It drops singleton observations, but I don't want to omit observations whose 'pc6nmbr_n' occurs only once. More importantly, it omits 'MIX2', which is the whole reason for the analysis. My guess is that this is because it is time-invariant, but in what way can I still include it in the regression? And finally, about the R²: do I understand correctly that it is the within R² that I should focus on?
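
For reference, the call I am describing has roughly the following form (a sketch with only the variables from the example data below, not my full list of covariates), together with a quick check of my guess that 'MIX2' never changes within a house. 'mix_varies' is just an illustrative name, and as far as I know keepsingletons is the reghdfe option that keeps singleton groups.

Code:
* two-way fixed effects: location and year of sale
reghdfe lnprice lnsize MIX2, absorb(pc6nmbr_n year)

* same model, but keeping singleton observations
reghdfe lnprice lnsize MIX2, absorb(pc6nmbr_n year) keepsingletons

* flag houses where MIX2 takes more than one value across sales
bysort pc6nmbr_n (MIX2): gen byte mix_varies = MIX2[1] != MIX2[_N]
tab mix_varies

If mix_varies is 0 everywhere, 'MIX2' is constant within each 'pc6nmbr_n' and therefore collinear with the location fixed effects.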

Below I added an example of my dataset, excluding the extensive amount of other independent variables, and the results I get when using reghdfe.

I'm hoping someone could give me some tips or a push in the right direction, I am new to Stata so I can imagine I am overlooking some relatively easy things.

Kind regards,
Lonneke




Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(lnprice lnsize MIX2) long pc6nmbr_n int year
12.167851 4.0430512 0 120585 2014
12.886642  4.744932 0 127367 2013
 12.97154  4.828314 1 117672 2017
12.162643  4.060443 0 132290 2016
12.611538  4.990433 0 135702 2013
 12.36734 4.7095304 0 116853 2010
 11.74006 4.3307333 0 133956 2015
12.561647 4.5217886 0 125380 2015
12.449018 4.6151204 0 125883 2018
12.319402  4.744932 0 126996 2013
12.961366  4.927254 0 128339 2018
11.782952 3.6888795 0 122241 2011
 11.95118  4.304065 0 133781 2009
12.506177  4.828314 0 129475 2016
 12.63623 4.7004805 0 121168 2010
12.269048 4.2341065 0 117600 2015
  13.5049  5.135798 0 137288 2018
11.652687 4.3820267 0 124573 2015
 12.15478  4.158883 0 121395 2015
 12.12811  4.477337 1 136561 2014
12.043553  4.317488 0 130981 2013
 12.05525 4.0073333 0 125341 2010
12.303653  4.248495 0 126274 2015
12.487485  4.624973 0 122260 2013
 12.75852  4.867535 0 135616 2010
12.220962 4.4426513 0 130362 2009
  11.9544 4.0943446 0 120110 2009
 12.94801  4.941642 0 136616 2010
  12.6082 4.4998097 0 113603 2013
 11.73607  4.543295 0 127401 2015
12.936034  4.744932 0 119048 2011
12.841326  5.105946 0 129439 2016
 11.73607  3.912023 0 135457 2012
 12.75708  4.787492 0 127093 2009
  11.8706  4.543295 0 134478 2016
 12.26198   4.65396 0 124290 2017
11.901584  4.248495 0 131795 2014
12.821259  5.075174 0 121787 2011
12.706848  4.787492 0 137248 2010
 12.89922 4.5325994 0 117529 2016
 12.84819  4.744932 0 122322 2015
12.356646 4.4308167 0 122305 2009
12.524527  4.787492 0 129779 2014
12.542545 4.4998097 0 118949 2012
 12.07254  4.276666 0 121105 2017
13.226724  4.983607 0 137661 2015
12.668233  4.787492 0 130543 2010
12.542545  4.787492 0 117130 2010
12.141534 4.4426513 0 127707 2009
 12.05525 4.1743875 0 127198 2013
12.254863 4.6051702 0 131829 2016
 12.25009  4.189655 0 139126 2011
12.058152 4.1743875 0 125682 2015
 11.77529  3.496508 0 120487 2016
12.798018 4.6051702 0 117419 2017
 11.97666   4.26268 0 124839 2016
12.717698 4.4886365 0 120927 2016
12.686954  5.075174 0 128242 2010
11.849398 4.4188404 0 115967 2014
12.149503   4.26268 0 131550 2015
11.759786 4.0943446 0 125666 2015
11.652687 3.3322046 0 136795 2015
 12.59473  4.976734 0 114347 2015
 12.19096 4.6051702 0 131337 2016
12.409014 4.7004805 0 130810 2017
 12.89922  5.398163 1 119644 2014
 12.06681  4.317488 0 132404 2015
11.867097 3.8066626 0 125846 2013
 11.91839  4.624973 0 132963 2012
11.759786 4.3820267 0 123839 2012
 11.95118  4.189655 0 122118 2014
12.317166 4.6051702 0 132386 2010
12.043553 4.4998097 0 122743 2009
 12.36734  4.634729 0 132915 2014
12.502466 4.5217886 0 132502 2018
12.266697  4.248495 0 117590 2010
12.827992  4.744932 0 127290 2014
12.259613    3.7612 0 120519 2018
12.623137  4.691348 0 127932 2018
12.016726 4.6051702 0 134348 2010
 11.95118  3.465736 0 132030 2012
 11.95761 4.5217886 0 134537 2011
13.442997  5.225747 0 127838 2018
13.190022  4.890349 0 122341 2018
12.137794 4.2904596 0 124469 2017
12.193494 4.6634393 0 133067 2014
12.058152 4.0073333 0 131780 2016
11.674193  4.317488 0 123975 2014
11.695247 4.4998097 0 134776 2014
12.487485 4.6728287 0 140081 2017
 12.64594   4.94876 0 129299 2015
12.468437  4.919981 0 129213 2014
12.577636  4.804021 0 126778 2014
12.206073  4.624973 0 123956 2017
12.058152  4.204693 0 124524 2017
12.985686  4.770685 0 120176 2017
 12.76581  4.770685 0 124363 2017
12.036174 4.1108737 0 139956 2014
12.452932 4.6728287 0 121736 2017
 12.05525  4.248495 0 126871 2015
end