Hello everyone!

I am new to this forum and looking forward to the discussions on Stata! I am currently looking for a solution to the following problem, and maybe someone knows exactly how to do it or can give me some hints.

I have a variable X (taking the values 0, 1, 2, 3, 4 or 5) with approx. 42,000 observations. Conditional on another variable Y (which takes the values 1, 2 or 3), X contains 5395 zeros for Y = 1, 1468 for Y = 2 and 53 for Y = 3. Now I want to replace these zeros with 1, 2, 3, 4 or 5, separately for each Y, in such a way that the distribution of these five values within that Y is preserved (each Y has its own distribution of 0s, 1s, 2s, 3s, 4s and 5s in X). The following example should help illustrate the task:

For Y = 1, among the non-zero values, X = 1 occurs with a share (proportion as a decimal) of 0.0731451, X = 2 with 0.6319624, X = 3 with 0.252785, X = 4 with 0.0271141 and X = 5 with 0.0149933.

These shares are stored in the variables Z`i', i = 1, ..., 5.

Thus, since 0 occurs 5395 times for Y = 1, 395 of these zeros (5395 × 0.0731451 ≈ 394.6, which needs to be rounded) need to be replaced with 1, 3409 (5395 × 0.6319624 ≈ 3409.4) with 2, and so on.
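Written out for Y = 1 with N = 5395 zeros, the target counts are n_k = round(N × Z_k); the five products sum to 5394.9995 because the shares sum to 0.9999999:

$$
\begin{aligned}
n_1 &= \operatorname{round}(5395 \times 0.0731451) = \operatorname{round}(394.62)  = 395\\
n_2 &= \operatorname{round}(5395 \times 0.6319624) = \operatorname{round}(3409.44) = 3409\\
n_3 &= \operatorname{round}(5395 \times 0.252785)  = \operatorname{round}(1363.78) = 1364\\
n_4 &= \operatorname{round}(5395 \times 0.0271141) = \operatorname{round}(146.28)  = 146\\
n_5 &= \operatorname{round}(5395 \times 0.0149933) = \operatorname{round}(80.89)   = 81\\
\textstyle\sum_k n_k &= 395 + 3409 + 1364 + 146 + 81 = 5395
\end{aligned}
$$

Here rounding each share separately happens to give exactly 5395, but in general it need not. Rounding the cumulative shares C_k = Z_1 + ... + Z_k instead, n_k = round(N × C_k) - round(N × C_{k-1}), always works, because the differences telescope to round(N × C_5) = N when the shares sum to 1 (up to rounding). For this example the cumulative cut-offs are 395, 3804, 5168, 5314 and 5395, which give the same five counts.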


My current idea for the solution was the following:

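gen X_Copy = X    // X_Copy receives the replacements, X keeps the original zeros

forval d = 1/3 {
    forval i = 1/5 {
        * each zero of Y == `d' that is still 0 becomes `i' with probability Z`i'
        replace X_Copy = cond(runiform() < (1 - Z`i'), 0, `i') if Y == `d' & X == 0 & X_Copy == 0
    }
}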

The idea was that X_Copy receives the replacements while X keeps track of which observations were originally zero, so that only those are replaced. The problem is that, in the end, the total number of replaced values does not add up to 5395 (for Y = 1). So I am currently at a loss as to how to do this in a way that is at least somewhat elegant. Maybe there is a completely different approach I am not thinking of right now that would let me set exactly how many zeros are replaced with each value (e.g. 395 for Y = 1 and X = 1). Alternative ideas are welcome!
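A likely reason for the mismatch: in the loop, a zero only reaches step i if it stayed 0 at steps 1 to i-1. It therefore ends up as i with probability Z_i(1 - Z_1)...(1 - Z_{i-1}) rather than Z_i, some zeros are never replaced at all, and the counts are random in any case. One way to hit the counts exactly is sketched below, assuming Z1-Z5 hold the shares for the observation's own Y (constant within Y). It puts the zeros of each Y in random order and cuts that order at round(n0 × cumulative share). The toy data at the top, including the way Z1-Z5 are built from the non-zero X values, is hypothetical and only there to make the sketch runnable. The helper variables u, rank0, n0, nz and cum are also just illustrative names, and runiformint() needs Stata 14 or later.

```stata
* --- hypothetical toy data (drop this part and use your own data) ---
clear
set seed 12345
set obs 3000
gen byte Y = runiformint(1, 3)
gen byte X = runiformint(0, 5)

* assumed construction of Z1-Z5: share of X == i among the non-zero X within Y
bysort Y: egen long nz = total(X != 0)
forval i = 1/5 {
    by Y: egen double Z`i' = total(X == `i')
    replace Z`i' = Z`i' / nz
}

* --- exact-count replacement of the zeros ---
gen X_Copy = X
gen double u = runiform()                    // random sort key
bysort Y (u): gen long rank0 = sum(X == 0)   // zeros numbered 1..n0 in random order within Y
by Y: egen long n0 = total(X == 0)           // number of zeros per Y
gen double cum = 0                           // cumulative share Z1 + ... + Z`i'
forval i = 1/5 {
    replace cum = cum + Z`i'
    * zeros ranked above the previous cut-off and up to round(n0*cum) become `i'
    replace X_Copy = `i' if X == 0 & X_Copy == 0 & rank0 <= round(n0 * cum)
}
assert X_Copy != 0 if X == 0                 // every original zero has been replaced
```

Because the cut-offs are rounded cumulative shares, the five counts within each Y always add up to the number of zeros in that Y. Which particular zeros receive which value is random, so the seed makes the result reproducible.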

I am looking forward to your answers and I hope you can help me with this!

All the best
Sebastian