I have a dataset of worker employment spells, and I would like a general way to identify an individual's co-workers when the individual starts a job. Consider the following example dataset:
Code:
input worker_id firm_id start_date end_date
1 1 1 4
1 2 5 10
2 1 2 8
2 2 9 10
3 1 6 7
4 3 2 7
end
When worker 1 starts work in firm 1, she has no co-workers, same when she starts in firm 2. When worker 2 starts in firm 1, worker 1 is her co-worker until time period 4, and worker 3 is her co-worker from time period 6 on. When worker 3 starts, worker 2 is her co-worker for the time worker 3 is in . When worker 4 starts, she has no co-workers.

The following code illustrates something like what I'm after
Code:
levelsof firm_id, local(firms)

tempfile worker_ds
save "`worker_ds'"
clear
gen worker_id = .
tempfile coworker_ds
save "`coworker_ds'"

foreach j of local firms {

    di ""
    di "results for firm `j'"
    di ""
    
    use "`worker_ds'", clear
    keep if firm_id == `j'
    count
    local obs = r(N)
    total_coworkers = .
    
    * identify set of coworkers for each worker in firm j
    forvalues i = 1/`obs' {
        local coworkers`i' ""
        local n_coworkers = 0
        forvalues ii = 1/`obs' {
            if start_date[`ii'] < end_date[`i'] & end_date[`ii'] > start_date[`i'] & `i' != `ii' {
                local nextworker = worker_id[`ii']
                local coworkers`i' `coworkers`i'' `nextworker'
                local n_coworkers = `n_coworkers' + 1
            }
        }
        replace total_coworkers = `n_coworkers' if _n == `i'
    }
    
    * create a worker-coworker level dataset
    expand total_coworkers_all, generate(expand_obs)
    gen coworker_id = .
    sort worker_id
    local n = 1
    
    * fill in values for coworker_id variable
    forvalues i = 1/`obs' {
        di "`i''s coworkers are `coworkers`i''"
        foreach id of local coworkers`i' {
            replace coworker_id = `id' if _n == `n'
            local n = `n' + 1
        }
    }
    
    append using "`coworker_ds'"
    save "`coworker_ds'", replace

}
I say "general way" because while this code would do what I want on test data, it would not in general work. I have 40 million spells, so looping, replacing, or sorting repeatedly will take far too long. The code could also fail in various ways: (i) I could easily hit the size limit for the macro -coworkers`i'-; (ii) the code assumes that each worker works a single spell in a firm; (iii) I'm sure people can come up with other reasonble cases in which this code would fail.

I realise this a broad, somewhat vague question, but it would already be great if anyone has suggestions for commands or features of stata's syntax I could exploit to radically speed this up and make is more robust, or could point me towards references that might have suggestions for how to go about solving this. Even improving individuals steps would be helpful.

There are many things I would like to subsequently do with these data. Two important ones are calculate the number of coworkers with certain charactersitics a worker has during a given spell, and see whether a worker has former coworkers in a firm in a subsequent spell with a different firm. I am using Stata 14.1