After much blood, sweat and tears I managed to find a working solution to my problem. However, the code runs very slowly. I think my solution can be improved, but I can't work out how to do so myself.
A quick description of my goal:
I have sorted a bunch of data by 'modelname' and 'nvals'. I have a total of 70k observations but the main structure of the data is as follows:
modelname | nvals | modelrank (want) |
beetle | 1 | 1 |
beetle | 0 | 1 |
beetle | 0 | 1 |
beetle | 0 | 1 |
megane | 1 | 2 |
megane | 0 | 2 |
megane | 0 | 2 |
Z4 | 1 | 3 |
Z4 | 0 | 3 |
This is the case for all 70k observations and 154 different values of 'modelname'.
Now I want to create a new variable called 'modelrank' that is 1 for every instance of 'beetle', 2 for every instance of 'megane', 3 for every instance of 'Z4', and so on.
It is important that this rank follows the order the data is currently in, because the data has also been sorted by the number of times each unique 'modelname' occurs in the dataset.
The solution I have found is as follows, but it takes about 4 minutes to classify the whole dataset.
gen modelrank = 5000
This code starts the 'modelcounter' at zero and increments it by 1 every time it encounters a 1 in 'nvals', which coincides with the first instance of a new modelname.
gen modelcounter = 0
forvalues i = 1/70807 {
replace modelcounter = modelcounter + 1 if nvals[`i'] == 1
replace modelrank = modelcounter[`i'] if _n == `i'
}
In the last line of the loop body I replace 'modelrank' with whatever modelcounter is currently set to. The notation is a workaround for the fact that I can't use square brackets on the left-hand side of the equals sign (at first I had this line set to "replace modelrank[`i'] = modelcounter[`i']", but this gives me the error 'weights not allowed').
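For reference, Stata's usual way to assign to a single observation is the `in` qualifier rather than a subscript on the left-hand side; square brackets after a variable name in a command are read as a weight specification, which would explain the 'weights not allowed' message. A minimal sketch, using a small made-up dataset (the values and the observation number 3 are only for illustration):

```stata
clear
set obs 5
gen modelcounter = _n * 10   // made-up values: 10, 20, 30, 40, 50
gen modelrank = 5000         // placeholder value, as in the code above

* assign to observation `i' only; "in" restricts the command to that row
local i = 3
replace modelrank = modelcounter in `i'

list
```

Restricting with `in` also avoids evaluating `if _n == ...` against every observation on each pass, although it does not remove the loop itself.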
Could someone point me to a better solution?
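One possible approach, sketched under the assumption that 'nvals' is exactly 1 on the first row of each 'modelname' block and 0 on every other row (as in the table above): in `generate`, Stata's `sum()` function returns a running sum, so the whole loop can be replaced by a single statement. The dataset below is made up to mirror the table, and 'modelrank2' is a hypothetical name for a variant that does not rely on 'nvals'.

```stata
clear
input str10 modelname nvals
"beetle" 1
"beetle" 0
"beetle" 0
"beetle" 0
"megane" 1
"megane" 0
"megane" 0
"Z4"     1
"Z4"     0
end

* running sum of the first-occurrence flag, in the current sort order
gen modelrank = sum(nvals)

* variant without nvals: count the rows where modelname changes
* (on the first row, modelname[_n-1] is the empty string, so the count starts at 1)
gen modelrank2 = sum(modelname != modelname[_n-1])

list, sepby(modelname)
```

Both statements work on the data in its current order, so the existing sort is preserved. The second variant assumes 'modelname' is a non-empty string variable and that all rows of a given model are adjacent.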