Dear Statalist members,

I am trying to conduct a selection bias test for my analysis, but I can't work out how to implement it in Stata.

Here is a subset of the dataset:

Code:
input byte project_type long start_date double project_id int manager_id
1 17599   52000471 194
1 17478   83000442 206
1 16869   62000028 214
1 16917   62000054 214
1 16974   45006794 216
2 17021   45007248 216
2 17275   45009016 216
2 17329   45009408 216
2 17333   45009422 216
3 17360   79000073 216
3 17373   45009664 216
3 17436   45009892 216
3 17457   45010174 216
3 17480   45010360 216
3 17508   45010381 216
3 17541   45010657 216
3 17553   45010451 216
4 17574   45010819 216
4 17584   45010902 216
4 17597   45010951 216
4 17603   45011012 216
4 17668   45011378 216
4 17728   45011644 216
4 17967   45012631 216
4 17858   48004687 237
4 17282   67000265 286
4 17968   80000702 291
end

And here is what I am trying to do:


Step 1. For each observation, count the managers who started a project of the same project_type within +/- 1 month of that observation's start_date. Alternatively, might it be easier to use the previous/next 10 project starts instead?
Step 2. Store the result of Step 1 in a variable, n_count.
Step 3. Create a new dataset using expand n_count.
Step 4. In the expanded dataset, replace the project_id values of the new observations (but not of the original observation) with the project_id values of the managers who started this type of project within the +/- 1 month window (the same managers I am counting in Step 1).
Step 5. Use the new dataset to calculate the "selection bias correction" value to include in the regression on the original dataset.
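
To make Steps 1 to 4 concrete, here is a rough sketch of one possible route, not a tested solution. It pairs every project with all projects of the same type using joinby and then filters the pairs, instead of counting first and expanding afterwards. It rests on several assumptions: "+/- 1 month" is taken to mean +/- 30 days, the observation's own manager is excluded from the count, project_id is unique, and the *_c variable names are invented for illustration.

```stata
* Sketch only. Assumptions: +/- 1 month = +/- 30 days, the observation's
* own manager is not counted, and project_id uniquely identifies a row.
* The *_c names and tag_c are made up for this example.
tempfile candidates
preserve
keep project_type start_date project_id manager_id
rename (start_date project_id manager_id) ///
       (start_date_c project_id_c manager_id_c)
save `candidates'
restore

* Steps 3-4: form all within-type pairs, then keep the original
* observation (the self-pair) plus projects that another manager
* started within 30 days
joinby project_type using `candidates'
keep if project_id_c == project_id | ///
    (abs(start_date_c - start_date) <= 30 & manager_id_c != manager_id)

* Steps 1-2: number of distinct other managers per original project
egen byte tag_c = tag(project_id manager_id_c) if manager_id_c != manager_id
egen n_count = total(tag_c), by(project_id)
```

After this, each original project appears once as itself (project_id_c == project_id) plus once per matching project, with project_id_c holding the matched project's id. In the sample above, only projects 45012631 and 80000702 should pick up a match (each other, started one day apart). Note that joinby builds every within-type pair before filtering, which may be heavy with 20,000 observations; a range-based join such as the community-contributed rangejoin from SSC might scale better, though I have not checked that here.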

I can't figure out how to complete Steps 1 and 4 other than manually, which would take forever given that my original dataset has 20,000 observations.


I hope someone can help with this or suggest an alternative approach.