Hi,

I want to randomly select 40 observations for each value of INSURED (1 = yes, 0 = no) within each CITY. I also want to keep the whole dataset and only create a dummy variable (SELECT) that marks whether each observation was randomly selected or not.

I figure that the code for the random selection is:

sample 40, count by(INSURED CITY)

But I am having trouble keeping my complete dataset and only creating the SELECT variable.
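
As far as I understand, sample drops every observation it does not pick, which is why the full dataset is lost. One possible workaround, as a rough sketch only: give each observation a random number, sort by it within each CITY and INSURED group, and flag the first 40 rows per group. The seed value and the helper variable u are arbitrary choices made for illustration, and the sketch assumes SELECT does not already exist in the data.

```stata
* Sketch: flag 40 random observations per CITY x INSURED group, keeping all rows.
* Assumptions: seed 12345 and helper variable u are arbitrary, illustrative choices;
* SELECT must not already exist in the dataset.
set seed 12345
generate double u = runiform()

* Sort randomly within each group, then mark the first 40 rows of each group.
bysort CITY INSURED (u): generate byte SELECT = (_n <= 40)

* The helper variable is no longer needed.
drop u
```

With this approach, a group that has fewer than 40 observations would have all of its rows flagged with SELECT = 1.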

(A simplified sample of my dataset; SELECT is the variable I want to create)
Unique_ID  CITY  INSURED  SELECT
34         TOR   1        1
35         BOS   0        0
36         BOS   0        0
37         BOS   1        0
38         BOS   1        0
39         LAX   1        0
40         LAX   1        0
41         LAX   0        0
42         LAX   0        0
43         LAX   1        0
44         TOR   0        1
45         TOR   0        0
46         TOR   1        0
47         TOR   1        1