Dear all,

I would like to estimate a weighted least squares model, but I do not know which type of weight to use. I am following the methodology of a published paper. The authors state: "We use weighted least squares, weighting each firm by the inverse of the number of firms in that country-quarter. This is to avoid giving excessive weight to countries in our sample that have a greater fraction of the number of firms such as the U.S.".


I use the following code to count the distinct firms in each country-quarter, build the weight, and then run the regression:

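* Flag one observation per firm within each country-quarter
egen tag = tag(firmid domicile_id qdate)
* Sum the flags to get the number of distinct firms per country-quarter
egen distinct = total(tag), by(domicile_id qdate)
* The weight is the inverse of that firm count
gen weight = 1/distinct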


reghdfe y x [pweight=weight], vce(cluster firmid)
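
To make the weight construction above concrete, here is a minimal sketch on toy data. The four firms, the ID values, and the variable name wsum are made up for illustration and are not from my actual sample. It only checks that the firm-level weights add up to 1 within each country-quarter, so that every country-quarter carries the same total weight; it does not settle which weight type is appropriate.

```stata
* Toy data (hypothetical): 3 firms in country 1 and 1 firm in country 2, one quarter
* Note: "clear" drops the data in memory, so run this in a fresh session
clear
input firmid domicile_id qdate
1 1 200
2 1 200
3 1 200
4 2 200
end

* Same construction as above, stored in double precision for the check
egen tag = tag(firmid domicile_id qdate)
egen distinct = total(tag), by(domicile_id qdate)
gen double weight = 1/distinct

* Sum the weights over distinct firms within each country-quarter
egen double wsum = total(weight * tag), by(domicile_id qdate)

* Each country-quarter should carry a total weight of 1
assert abs(wsum - 1) < 1e-8

list firmid domicile_id distinct weight wsum, sepby(domicile_id)
```

In this toy example the three firms in country 1 each get a weight of 1/3 and the single firm in country 2 gets a weight of 1.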

Should I use pweight, aweight, or fweight in this context? And is it correct to pass my variable "weight" in the square brackets, or should I pass the variable "distinct" instead?


Thank you very much in advance and best wishes,
Simon