Hello Statalist,

1. Problem description

I have a data set that is produced by a data base in the format as in the table below, although this example is simplified. The data data set is outputted in a dataex script below the table. The data set is a result of merging two data sets, data1 and data2. In the example below the variable names show which variable comes from which data set. The two data sets are merged on country.

After I have this data I want to multiply the value in data2_varK with the value of the variable listed in data1_varY. So for country Aland I want to multiply data1_varY with data2_varK (22 * .25), but for Eland I wan to multiply data1_varZ with data2_varK (5 * .5). While most countries will only have the one value I want to use in data1_varX, data1_varY and data1_varZ, there are cases, like Cland in the example, that has multiple values.

We want to do this multiple times and sometimes it is not multiplication. It could also be generate a dummy if value in data1_varX, data1_varY or data1_varZ is higher than a cut-off value that also coming from data2.

country data1_varX data1_varY data1_varZ data2_varJ data2_varK
Aland 22 data1_varY .25
Bland 34 data1_varY .75
Cland 15 42 data1_varX .6
Dland 24 data1_varX .85
Eland 34 data1_varZ .75
Fland 5 data1_varZ .5

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 country byte(data1_varX data1_varY data1_varZ) str10 data2_varJ double(data2_varK result)
"Aland"  . 22  . "data1_varY" .25  5.5
"Bland"  . 34  . "data1_varY" .75 25.5
"Cland" 15 42  . "data1_varX"  .6    9
"Dland" 24  .  . "data1_varX" .85 20.4
"Eland"  .  . 34 "data1_varZ" .75 25.5
"Fland"  .  .  5 "data1_varZ"  .5  2.5
end
2. What we've tried so far

Test 2a;

Code:
gen result2a =  data2_varK * `=data2_varJ[_n]'
We have tried something like this, but `=data2_varJ[_n]' takes the value of data2_varJ from the first row for all rows. So for all rows the code evaluates to this since data2_varJ in the first row is "data1_varY".

Code:
gen result2a =  data2_varK * data1_varY
Test 2b;

Code:
bys data2_varJ : gen result2b =  data2_varK * `=data2_varJ[_n]'
This has the same result as test 2a. `=data2_varJ[_n]' is evaluated into data1_varY for all rows, like this:

Code:
bys data2_varJ : gen result2b =  data2_varK * data1_varY
Test 2c;

Code:
bys data2_varJ : gen result2c =  data2_varK * `=data2_varJ[1]'
This gave the same result as in 2b.

3. What works but we are concerned will be too slow

Code:
        
gen results3a = .
forvalue obs = 1/`=_N' {
    replace results3a =  data2_varK * `=data2_varJ[`obs']' if _n == `obs'
}
We are so far only piloting this, so this might not be too slow, but we are concerned that it will be. We also would love to be able to solve this on one line as then we can plug it in to the data base so it happens on the fly. Nevertheless, it is good to have this loop-option as an option of last resort in case someone else have a better suggestion.

Thanks,
Kristoffer