Hi everyone,

I have the following issue. In one folder, I have stored several xlsx files. Each file corresponds to an identifier, and each identifier is repeated four times. Below is a screenshot, just to give you a snapshot:

[Screenshot of the folder contents]

As you can see, each identifier is repeated 4 times, with the suffixes _1, _2, _3 and _4.

What I need to do with Stata is to list each unique identifier that starts with the letters "GB", so that I can then copy and paste these identifiers for another piece of work. The critical point is that, in order to be included in my sample, each identifier must have all four files: _1, _2, _3 and _4. If, for instance, one identifier is missing _2 or _1, it must not be included in my list.
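
To make the requirement concrete, here is a rough sketch of the logic I have in mind. It is untested against my real folder and rests on several assumptions: the folder path "C:/mydata" is only a placeholder, the files are named <identifier>_<n>.xlsx with a lowercase extension, and Stata 14 or later is available (for strrpos()).

```stata
* Sketch only: list "GB" identifiers that have all four files _1 ... _4.
* ASSUMPTION: "C:/mydata" is a placeholder path; replace it with the real folder.
local folder "C:/mydata"

* Collect the file names; respectcase keeps the original upper/lower case
local files : dir "`folder'" files "GB*.xlsx", respectcase
local n : word count `files'
if `n' == 0 {
    display as error "no GB*.xlsx files found in `folder'"
    exit
}

* Put one file name per observation
clear
set obs `n'
gen str244 filename = ""
local i = 0
foreach f of local files {
    local ++i
    replace filename = "`f'" in `i'
}

* Split e.g. "GB123_2.xlsx" into identifier "GB123" and suffix 2
gen stem   = subinstr(filename, ".xlsx", "", 1)
gen pos    = strrpos(stem, "_")
gen id     = substr(stem, 1, pos - 1)
gen suffix = real(substr(stem, pos + 1, .))

* Keep only suffixes 1-4, then only identifiers with all four of them
keep if inlist(suffix, 1, 2, 3, 4)
duplicates drop id suffix, force
bysort id: keep if _N == 4

* One row per complete identifier
bysort id: keep if _n == 1
list id, noobs
```

The resulting list could then be copied from the Results window, or written to a file with something like export delimited.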

I'm wondering whether this is feasible and can be done efficiently with Stata.
Would you be so kind as to help me?

Many thanks in advance