Hello statalisters,

My case is that of an unbalanced panel data composed by 25,714 companies ("cusip_id") and 528 months of data for each. I want to run the Fama and MacBeth procedure using the function xtfmb in which 528 cross-sectional regressions are performed and the coefficients then averaged across the time series. My command looks as follows:

Code:
xtfmb e_ret b s h r c w
Where "e_ ret" are excess returns and "b, s, h, r, c, w" are factor loadings that I want to regress the former upon.

Firstly, I suspect for the characteristics of the dataset that I should be correcting for fixed time effects as there are events in particular years that affect all companies across the board. The puzzle is that the function "xtfmb" does not allow for this and so I need to find a way around. Does anything come to mind?


Secondly, I also want to correct my SE for autocorrelation and heteroskedasticity using the Newey-West Standard Errors, which the function "xtfmb" allow for by adding lags to the command:

Code:
xtfmb e_ret b s h r c w, lag(#)
However, I do not know exactly what should be the appropiate number of lags. I have experienced that the more lags I include the more significant some coefficient become to the point of becoming highly significant so this is an important determinant for my results. Could you also help me on this?


Thirdly, in using the function "xtfm" I have to declare beforehand that my panel data is such. Usually I would do so with the command "xtset cusip_id date"; however, in the help xtfmb the authors mention, "You must tsset your data before using xtfmb., thus should I be using "tsset cusip_id date" instead? And, that is really matter whether I use one or the other with a panel data?


And lastly, since the panel data is fairly large, this function takes about 10 minutes to yield results. Apart from improving my computing power and dropping those variables that are not essential, do you have any other advice to speed up the process?



To provide you with some backdrop here is a sample of my pane data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double date long cusip_id float(e_ret b s h r c w)
413 1 -6.432449 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
414 1      -.28 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
415 1  30.06478 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
416 1 1.2966667 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
417 1 1.2593442 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
418 1  6.081613 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
419 1 22.287273 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
420 1 4.5182714 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
421 1 1.9529412 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
422 1 13.333103 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
423 1  1.580202 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
424 1 4.4104953 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
425 1 10.850755 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
426 1  6.329661 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
427 1  7.863334 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
428 1 10.559011 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
429 1 16.361683 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
430 1 14.834237 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
431 1 28.921766 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
432 1   8.66091 1.5933827 4.1983247 -4.4861035 5.903217 .9038961 .9595598
end
format %tm date
label values cusip_id cusip_id
label def cusip_id 1 "00077R10", modify

Fama and MacBeth procedure in Stata


Thank you for your attention.