I am working with string data and I would like to retrieve the most important word from a string variable that reflects a text-input field. I have already split the original variable into five different variables.
The dataset looks like this:
obs | word1 | word2 | word3 | word4 | word5 |
1 | 2.45X90 | | | VASSOURA | |
2 | N9020X | PIVO | SUPERIOR | | |
3 | (S | 1063T) | LANTERNA | | |
4 | 15W4020 | L | | OLEO | |
5 | E V A | GLITER | | | |
In particular, I would like the dataset to look like the following:
obs | word1 | word2 | word3 | word4 | word5 | final_word |
1 | 2.45X90 | | | VASSOURA | | VASSOURA |
2 | N9020X | PIVO | SUPERIOR | | | PIVO |
3 | (S | 1063T) | LANTERNA | | | LANTERNA |
4 | 15W4020 | L | | OLEO | | OLEO |
5 | E V A | GLITER | | | | GLITER |
The criteria are as follows: if 'word?' has
(i) no special characters ("." "(" "*" and others)
(ii) no numbers
(iii) no whitespace among letters (see obs == 5 for a case of whitespace among letters)
(iv) length > 1 (considering the length of 'word?')
then 'final_word' == 'word?'.
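I would like to check the variables in order: word1 first, then word2, then word3, and so forth, so that 'final_word' takes the first word that meets all four criteria (in obs == 2, for example, PIVO rather than SUPERIOR).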
Could you help me to find a solution for that?
Thank you very much!
Below I provide the code for importing the example dataset into Stata:
clear
input byte obs str20 word1 str20 word2 str20 word3 str20 word4 str20 word5
1 "2.45X90" "" "" "VASSOURA" ""
2 "N9020X" "PIVO" "SUPERIOR" "" ""
3 "(S" "1063T)" "LANTERNA" "" ""
4 "15W4020" "L" "" "OLEO" ""
5 "E V A" "GLITER" "" "" ""
end
Note: I tried to use 'dataex', but in this case I found it easier to provide the import code directly.
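One possible approach, offered as a sketch rather than a definitive answer: loop over word1 to word5 with forvalues and fill 'final_word' with the first word that matches a regular expression. The sketch assumes that "letters" means the unaccented characters A-Z and a-z, so a word containing an accented letter would be rejected by this pattern. The example data are repeated so the block can be run on its own.

```stata
clear
input byte obs str20 word1 str20 word2 str20 word3 str20 word4 str20 word5
1 "2.45X90" "" "" "VASSOURA" ""
2 "N9020X" "PIVO" "SUPERIOR" "" ""
3 "(S" "1063T)" "LANTERNA" "" ""
4 "15W4020" "L" "" "OLEO" ""
5 "E V A" "GLITER" "" "" ""
end

* final_word starts empty and is filled by the first qualifying word
generate str20 final_word = ""

* check word1 first, then word2, and so on up to word5
forvalues j = 1/5 {
    * The pattern accepts only strings made of two or more letters
    * (assumption: unaccented A-Z and a-z). That single test rejects
    * digits, special characters, embedded spaces, one-character
    * words and empty strings. The condition final_word == "" keeps
    * an earlier match from being overwritten by a later word.
    replace final_word = word`j' if final_word == "" & regexm(word`j', "^[A-Za-z][A-Za-z]+$")
}

list obs word1-word5 final_word, noobs
```

Because each pass only touches observations where 'final_word' is still empty, the order of the loop decides which word wins. If no word qualifies, 'final_word' stays empty for that observation.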