I am trying to conduct a selection bias test for my analysis, but I can't come up with an algorithm to implement it in Stata.
Here is a subset of the dataset:
Code:
input byte project_type long start_date double project_id int manager_id
1 17599 52000471 194
1 17478 83000442 206
1 16869 62000028 214
1 16917 62000054 214
1 16974 45006794 216
2 17021 45007248 216
2 17275 45009016 216
2 17329 45009408 216
2 17333 45009422 216
3 17360 79000073 216
3 17373 45009664 216
3 17436 45009892 216
3 17457 45010174 216
3 17480 45010360 216
3 17508 45010381 216
3 17541 45010657 216
3 17553 45010451 216
4 17574 45010819 216
4 17584 45010902 216
4 17597 45010951 216
4 17603 45011012 216
4 17668 45011378 216
4 17728 45011644 216
4 17967 45012631 216
4 17858 48004687 237
4 17282 67000265 286
4 17968 80000702 291
end
Here is what I am trying to do:
Step 1. For each observation, count the managers who started a project of the same project_type within +/- 1 month of that observation's start_date. Alternatively, might it be easier to use the previous/next 10 project starts instead?
Step 2. Store the result of Step 1 in a variable n_count.
Step 3. Create a new dataset using expand n_count.
Step 4. In the expanded dataset, replace project_id in the new observations (but not in the original observation) with the project_id values of the managers who started this type of project within +/- 1 month (the same managers counted in Step 1).
Step 5. Use the new dataset to calculate the "selection bias correction" value to include in the regression on the original dataset.
I can't figure out how to complete Step 1 and Step 4 other than manually, which would take forever given that my original dataset has 20,000 observations.
I hope someone can help with this or suggest an alternative approach.
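One possible route for Steps 1 to 4, given only as a rough sketch, is to pair every project with all projects of the same project_type using joinby and then keep the pairs inside the window. This builds the expanded dataset directly, so no separate expand is needed. The sketch rests on several assumptions that are not stated above: "+/- 1 month" is taken as +/- 30 days, project_id is assumed to be unique, other projects by the same manager are not counted as peers, and n_count is the number of rows per original project (the original plus its peers), which exceeds the number of distinct managers if one manager started several projects in the window. The names peers, peer_start, peer_project_id, peer_manager_id, original, and focal_project_id are hypothetical.

```stata
* Sketch only: assumes the example data above are in memory,
* project_id is unique, and "+/- 1 month" means +/- 30 days.

* Save a copy of the data to serve as the pool of candidate peers
preserve
keep project_type start_date project_id manager_id
rename (start_date project_id manager_id) ///
       (peer_start peer_project_id peer_manager_id)
tempfile peers
save `peers'
restore

* Form all pairs of projects that share the same project_type
joinby project_type using `peers'

* Keep pairs whose start dates are at most 30 days apart
keep if abs(peer_start - start_date) <= 30

* Keep the original observation (paired with itself) and
* projects started by OTHER managers (assumption, see lead-in)
keep if peer_project_id == project_id | peer_manager_id != manager_id

* Steps 1-2: rows per original project, including the original itself
bysort project_id: gen long n_count = _N

* Steps 3-4: flag the original row, keep its id, and put the
* peer's project_id into project_id on the added rows
gen byte original = (peer_project_id == project_id)
gen double focal_project_id = project_id
replace project_id = peer_project_id if !original
```

To count the same manager's other projects as peers too, the second keep if line could be dropped. With 20,000 observations the joinby step creates every within-type pair before filtering, so it may need a lot of memory; running it one project_type at a time, or trying the community-contributed rangejoin command from SSC, might be lighter.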