Hi,

I have a tricky problem where I have a quarterly panel of observations for parent entities that have been merged with subsidiary entities so that for each observation of parent_ID and date there are potentially multiple matched subs_IDs. I now want to find out whether in a given quarter a new subsidiary was added to the parent entity.

The dataset looks like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(parent_ID date subs_ID new_subs)
1 200 11 0
1 200 12 0
1 200 13 0
1 200 14 0
1 201 11 0
1 201 12 0
1 201 13 0
1 201 14 0
2 200 21 0
2 200 22 0
2 200 23 0
2 201 21 0
2 201 22 0
2 201 23 0
2 201 24 1
end
format %tq date
What I need is code that would generate the variable new_subs, which I have manually created in the example. For parent_ID = 1, new_subs is always zero, because all the subsidiaries present in the second quarter (11, 12, 13, 14) have already been present in the first quarter. However, new_subs switches to 1 for one of the subs_ID for parent_ID = 2 in the second quarter: subs_ID 24 wasn't present in the first quarter.

Any idea how to start? Conceptually it would be something like generating a vector of subs_IDs by parent_ID and date, and then check whether a given subs_ID was an element of the lagged vector, but I am a little lost how to implement this here.

Thanks a lot in advance!