Identify errors in time-invariant variable in panel data

Hi Statalist

I have an unbalanced panel dataset, of which the following is a representative sample:

ID	Year	Gender
1	2007	M
1	2008	M
1	2009	M
2	2007	F
2	2008	F
2	2009	F
2	2010	M
2	2011	M
3	2007	F
4	2007	F
4	2008	F
4	2009	F
5	2007	M
5	2008	F

For the sake of this question, let’s suppose one’s gender does NOT change over time. Therefore, instances like ID 2 and ID 5 are likely due to data entry errors.

I would like to flag such instances in the following way:

ID	Year	Gender	tag
1	2007	M	0
1	2008	M	0
1	2009	M	0
2	2007	F	1
2	2008	F	1
2	2009	F	1
2	2010	M	1
2	2011	M	1
3	2007	F	0
4	2007	F	0
4	2008	F	0
4	2009	F	0
5	2007	M	1
5	2008	F	1

The code I tried so far is:

Code:

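. * Tag one observation per distinct ID-Gender pair (Stata variable names are case-sensitive)
. egen gender_tag = tag(ID Gender)
. * Count the distinct Gender values recorded for each ID
. egen gender_ntags = total(gender_tag), by(ID)
. * Inspect the IDs that do not have exactly one recorded Gender
. browse ID Year Gender if gender_ntags != 1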

This gives me the list of problematic IDs, but examining each ID individually is still a very manual process (especially as my dataset is very large).

Could someone please suggest a solution that would give me the results in the second table above?

Thanks.
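
One possible approach, offered as a minimal sketch rather than a definitive answer: sort Gender within each ID and flag the ID whenever the first and last sorted values differ. The sketch assumes Gender is a string variable spelled exactly as in the tables above, and it rebuilds the sample data with input so that it runs on its own. The variable name tag2 is only an illustrative name for a cross-check that builds on the gender_ntags count from the question.

```stata
clear
input ID Year str1 Gender
1 2007 M
1 2008 M
1 2009 M
2 2007 F
2 2008 F
2 2009 F
2 2010 M
2 2011 M
3 2007 F
4 2007 F
4 2008 F
4 2009 F
5 2007 M
5 2008 F
end

* Sort by Gender within ID; if the first and last values differ,
* Gender is not constant for that ID, so every row of the ID is flagged
bysort ID (Gender): generate byte tag = Gender[1] != Gender[_N]

* Cross-check using the counting approach from the question:
* more than one distinct Gender value per ID means an inconsistency
egen gender_tag = tag(ID Gender)
egen gender_ntags = total(gender_tag), by(ID)
generate byte tag2 = gender_ntags > 1   // tag2 is a hypothetical name

* Restore the panel order and display the result
sort ID Year
list ID Year Gender tag, sepby(ID) noobs
```

Note that with the sort-based flag, an empty string in Gender counts as a distinct value, so an ID with one missing entry and otherwise consistent entries would also be flagged. The egen tag() function ignores missing values by default, so the two flags can disagree in that case. Which behaviour is preferable depends on whether missing entries should be treated as errors.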

BJ Data Tech Solution
