Hello Statalist
I have a dataset with a string variable haart1 (combination antiretroviral therapy), shown below. I would like to extract the individual molecules and list them in a consistent order.
For example, the following two values should be coded the same way:
Code:
"Lopinavirlamivudinetenofovir disoproxilLopinavirtenofovir disoproxillamivudine"
"tenofovir disoproxilLopinavirlamivudinetenofovir disoproxilLopinavirlamivudine"
Both should become "lamivudine lopinavir tenofovir".
I have tried using strpos() to recode common known combinations manually, as below, but I cannot realistically write out every possible combination: I have about 24 molecule names that appear in groups of three or four, and with 24 names there are already 2,024 possible triads and 10,626 possible quartets.
Code:
replace haart1 = "FTD_TDF_EFV" if strpos(haart1, "tenofovir") & strpos(haart1, "emtricitabine") & strpos(haart1, "efavirenz")
Here is a sample of my data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str168 haart1
"Lopinavirlamivudinezidovudine"
"lamivudinezidovudineLopinavir"
"Lopinavirlamivudinezidovudine"
"lamivudineLopinavirzidovudine"
"lamivudineLopinavirzidovudine"
"zidovudineLopinavirlamivudine"
"Lopinavir"
"Lopinavir"
"lamivudinetenofovir disoproxilLopinavir"
"lamivudineLopinavirtenofovir disoproxil"
"lamivudineLopinavirtenofovir disoproxil"
"lamivudinetenofovir disoproxilLopinavir"
"Lopinavirtenofovir disoproxillamivudine"
"lamivudineLopinavirtenofovir disoproxil"
end
PS: The order of the words in the original string does not matter, and a repeated word should count only once. So AABBCC, CCCBBAAA and CCAABBB should all become ABC (where each letter represents a molecule name in the string).
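A minimal sketch of one possible approach, assuming every molecule name that can occur in haart1 is known in advance: loop over the names in alphabetical order and append each one that appears anywhere in the string. Because each name is added at most once and always in list order, the result ignores both the order and the repetition in the original value. The local macro `drugs` and the new variable `haart_std` are hypothetical names; the list below covers only the four molecules in the sample and would need to be extended to all ~24. Matching on lower(haart1) handles the capitalised "Lopinavir".

```stata
* Sketch only: assumes all molecule names are listed in local drugs below
clear
input str80 haart1
"Lopinavirlamivudinetenofovir disoproxilLopinavirtenofovir disoproxillamivudine"
"tenofovir disoproxilLopinavirlamivudinetenofovir disoproxilLopinavirlamivudine"
"zidovudineLopinavirlamivudine"
"Lopinavir"
end

* Hypothetical list: keep it in alphabetical order, because names are
* appended in list order, which fixes the order of the output
local drugs lamivudine lopinavir tenofovir zidovudine

generate haart_std = ""
foreach d of local drugs {
    * append the name once if it occurs anywhere (case-insensitive)
    replace haart_std = haart_std + " `d'" if strpos(lower(haart1), "`d'")
}
* drop the leading space left by the first append
replace haart_std = strtrim(haart_std)
list haart_std, clean noobs
```

One caveat: strpos() matches substrings, so if one name is contained in another (for instance "tenofovir" would match both "tenofovir disoproxil" and "tenofovir alafenamide"), those longer names would need their own entries and separate handling.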

Thanks in advance

Vitalis