Generating data to balance a panel dataset

Hi,

I have a panel from Compustat like the below in Table 1 (table 1 is a subset of the data to show 3 different example issues), where gvkey is the firm-specific identifier, fyear is the reporting year, emp is employment, and dlrsn is the reason the firm dropped out of the dataset.

Table 1

gvkey	fyear	emp	dlrsn
001	1996	2	02
001	1997	3	02
001	1998	2	02
001	1999	1	02
002	1996	4	06
002	1997	5	06
002	1998	4	06
002	1999	3	06
002	2000	3	06
002	2001	3	06
002	2002	3	06
002	2003	3	06
003	1996	7	.
003	1997	8	.
.	.	.	.
.	.	.	.
.	.	.	.
003	2016	14	.

I need employment data for each firm all the way up to and including 2016 (as shown in Table 2). However, many firms drop out of the dataset (e.g., because of bankruptcy). For such firms, I want to generate employment numbers for all years from the last date they reported, going up to and including 2016 using the following methodoloy:

If dlrsn is 02 or 03, set employment number to zero from the first year after the last reporting year going up to and including 2016. For example, in Table 2, firm 001 reports up to 1999, I would like to generate data that has fyears 2000-2016 and employment set at 0 because dlrsn is 02.
If dlrsn is 01,04,05,06,07,09,10,20, use the last reported employment number for all years after the last reported year. For example, in Table 2, firm 002 reports up 2003, I would like to generate data that has fyear 2004-2016 and employment is set equal to the last available employment number (i.e.,4) because dlrsn is 06
If the firm does not drop out of the dataset, nothing should change.

Table 2

gvkey	fyear	emp	dlrsn
001	1996	2	02
001	1997	3	02
001	1998	2	02
001	1999	1	02
001*	2000*	0*	02*
001*	2001*	0*	02*
001*	2002*	0*	02*
.	.	.	.
.	.	.	.
.	.	.	.
001	2016	0*	02
002	1996	4	06
002	1997	5	06
002	1998	4	06
002	1999	3	06
002	2000	3	06
002	2001	2	06
002	2002	3	06
002	2003	4	06
002*	2004*	4*	06*
002*	2005*	4*	06*
.	.	.	.
.	.	.	.
.	.	.	.
002*	2016*	4*	06*
003	1996	7	.
003	1997	8	.
.	.	.	.
.	.	.	.
.	.	.	.
003	2016	14	.

Essentially, I want to get from Table 1 to Table 2 and would very much appreciate any advice (note there are thousands of firms where different dates and the above is just a example). I tried the following code but got a bit stuck:

gen dldteyear=year(dldte)
bysort gvkey: egen lastdate=max(fyear)
expand yeardiff if fyear==lastdate

The code was able to duplicate the last reported date in Table 1 the correct number of times but then I was a bit stuck with how to do the next step, I was thinking replace years because they would have to be made consecutive and then replace employment but this started to get a bit messy. I am sure there is probably a better approach then the one I am taking which seems rather mechanical.

Thanks in advance for all support.

Best,
Ali

BJ Data Tech Solution

Home / Data Cleaning / Data management / Data Processing / Generating data to balance a panel dataset
Generating data to balance a panel dataset

0 Response to Generating data to balance a panel dataset

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Generating data to balance a panel dataset Generating data to balance a panel dataset

0 Response to Generating data to balance a panel dataset

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Generating data to balance a panel dataset
Generating data to balance a panel dataset