I am relatively new to Stata and am still learns how to use macros and loops which I suspect may be a part of the required solution.
I need to transform a string variable in a large dataset (~40,000 records) into a form that I can analyse. The string variable (e.g. procedure) is currently a composite string variable where each response is separated by a line return [(char(10)]. Each response is a phrase (e.g. cholecystectomy). For each patient record the variable may be missing, have one response (cholecystectomy) or multiple responses (cholecystectomy (char(10)) appendectomy etc.). I would like to create new binary variables for each of the procedures present. There are approximately 100 different procedure values. I don't have access to a reference list of all possible values this variable can take. Rather I would like the new variable names to be generated from the values already within this variable.
This dataset is coded in this way for about 20 other string variables so the method I use naturally needs to be easily reproducible.
I can use the split command to parse by char(10) but this creates many new variables (e.g. procedure1, procedure2 etc...) but these are not binary (e.g. cholecystectomy, appendectomy).
Code:
split procedure, parse(`=char(10)')
Here's another approach
Code:
clear input byte id str17 phrase 1 "new video post" 2 "newer tweet" 3 "add 12 new photos" 4 "removed 1 photo" end format %-17s phrase split phrase local nwords = r(nvars) forvalues i = 1/`nwords' { levelsof phrase`i', clean foreach word in `r(levels)' { // in case a word is not a valid Stata name local vname = strtoname("`word'") // ignore error if var already exists cap gen byte `vname' = 0 // look for space delimited words qui replace `vname' = 1 if strpos(" " + phrase`i' + " "," `word' ") } }
Thank you for sharing your expertise.
Shamil
0 Response to How to create binary variables from a composite string variable
Post a Comment