Hi.

I am using large-N panel data with four annual waves, from which I want to drop some respondents. I only want to keep respondents who show a particular combination of values on one variable across the four rounds.

The variable is involvement in leisure-time organisations, dummy coded as yes (1) and no (0).
I only want to keep the respondents who:

1. Were non-members in wave 1.
AND
2. Were active members in waves 2, 3 and 4.

This would give me a dataset in which I can analyze the effects of long-term involvement in an organisation, and where all respondents got involved at roughly the same time.

My problem is that I can't find a way to select only the respondents who meet the above criteria (value 0 in wave 1, value 1 in waves 2, 3 and 4).
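To make the criteria concrete, here is a minimal sketch of the selection I am after, written in plain Python purely for illustration. The data and the names (`rows`, `TARGET`, `matching_ids`) are made up for the example, and it assumes the data is in long format with one row per respondent per wave.

```python
# Hypothetical long-format data: one (id, wave, member) row per respondent per wave.
rows = [
    (1, 1, 0), (1, 2, 1), (1, 3, 1), (1, 4, 1),  # matches the pattern 0, 1, 1, 1
    (2, 1, 1), (2, 2, 1), (2, 3, 1), (2, 4, 1),  # already a member in wave 1
    (3, 1, 0), (3, 2, 1), (3, 3, 0), (3, 4, 1),  # not a member in wave 3
    (4, 1, 0), (4, 2, 1),                        # waves 3 and 4 are missing
]

# Required membership value in each wave.
TARGET = {1: 0, 2: 1, 3: 1, 4: 1}


def matching_ids(rows, target=TARGET):
    """Return the ids whose wave -> member mapping equals the target exactly."""
    by_id = {}
    for rid, wave, member in rows:
        by_id.setdefault(rid, {})[wave] = member
    # Dict equality also rejects respondents with missing waves.
    return {rid for rid, waves in by_id.items() if waves == target}


keep_ids = matching_ids(rows)
kept_rows = [row for row in rows if row[0] in keep_ids]
```

In this made-up example only respondent 1 would be kept, with all four of their rows.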

Does anyone have an idea of how to do that? It feels simple, but I can't seem to figure it out.

My method so far:

The strategy I have tested is to drop, within each round, all respondents who do not meet that round's criterion, and then combine the rounds using 'append'. My idea was to sort the ID variable by how often each distinct value occurs, so that IDs occurring four times appear at the top of the list and IDs occurring only once appear at the bottom.
That way I could manually drop all IDs that appear three times or fewer, which would solve my problem. But I can't find such a sorting function. If anyone knows how to sort my data in this way, it would be of great help.

I am sure, however, that there are better ways to solve my problem, and I am open to all types of solutions.

Thanks!


Just to be clear:

When using 'append', the IDs/respondents line up like this:

ID | Round
1 | 1
1 | 2
1 | 3
2 | 1
3 | 1
3 | 2
3 | 3
3 | 4
4 | 1
4 | 2
...And so forth
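
Given a layout like the one above, the counting step I have in mind could be sketched as follows. This is plain Python for illustration only; the names `appended`, `counts` and `complete_ids` are made up, and it assumes each round was already filtered to the required value before appending, so that an ID occurring four times met the criterion in every round.

```python
from collections import Counter

# The (id, round) pairs from the listing above, after per-round filtering and appending.
appended = [
    (1, 1), (1, 2), (1, 3),
    (2, 1),
    (3, 1), (3, 2), (3, 3), (3, 4),
    (4, 1), (4, 2),
]

# How many rounds each id survived.
counts = Counter(rid for rid, _ in appended)

# Keep only ids present in all four rounds; no manual sorting or dropping needed.
complete_ids = {rid for rid, n in counts.items() if n == 4}
complete_rows = [row for row in appended if row[0] in complete_ids]

# The sort I was originally looking for: most frequent ids first.
by_frequency = sorted(appended, key=lambda row: (-counts[row[0]], row[0], row[1]))
```

With these ten rows only ID 3 occurs four times, so only its rows remain in `complete_rows`.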