Apologies in advance for the somewhat convoluted title. I have already solved the problem, and my solution is below, but I am looking for a more efficient one if it exists.
Problem: I need a unique company ID-state pair for each year in multi-year data. Both company ID and year are already numeric. For some years a company's state is missing, and I want to replace those missing values with the state recorded for the same company in any other year.
My solution:
Code:
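* Number the companies 1, 2, ..., K so they can be looped over
egen firm = group(company_id)
* r(max) now holds K, the number of distinct companies
su firm, meanonly
forvalues var = 1/`r(max)' {
    * summarize ignores missing values, so r(max) is the firm's non-missing state
    qui summ state if firm == `var'
    qui replace state = `r(max)' if firm == `var'
}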
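Essentially, for each firm the loop replaces every value of state with that firm's maximum of state. Because summarize ignores missing values, the maximum is the firm's non-missing state, given that each firm should have only one. This worked, but it is not an efficient solution for around 10 million observations, since it runs one summarize and one replace over the full dataset for every firm. So any efficient solution is welcome. Maybe I am missing something that Stata already has.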
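For comparison, here is a minimal sketch of the same idea without a loop, using by-group processing. It assumes, as the loop above does, that state is numeric and that each company has at most one distinct non-missing state. The toy data values are made up purely for illustration.

```stata
* Toy data (hypothetical values): numeric state code with gaps
clear
input company_id year state
1 2001 5
1 2002 .
1 2003 5
2 2001 .
2 2002 12
3 2001 .
3 2002 .
end

* Within each company, sort on state so non-missing values come first
* (numeric missing values sort last), then copy the first value into the gaps.
bysort company_id (state): replace state = state[1] if missing(state)

* Restore the panel order and inspect the result
sort company_id year
list, sepby(company_id)
```

A company with no state recorded in any year (company 3 here) simply stays missing. This version makes a single sorted pass over the data rather than one pass per firm, which is why it may scale better, although I have not timed it on 10 million observations.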