Recoding a string variable into numeric (instead of destring) and subsequent mismatch of values

I have some questions regarding an issue I have encountered when trying to change a string variable into a numeric (categorical) one. I am writing my question in a series of commands to show my steps.

My variable is called "confused" - with a 5-point Likert scale response for possible answers (1=Very unlikely, 2=Unlikely, 3=Neutral, 4=Likely, 5=Very likely). By chance, the category "Unlikely" appears not to have been selected by survey respondents for this particular variable, and there were 23 respondents who didn't complete the question.

Here is my variable:
. tab confused, missing

Consider a |
patient who |
you think |
might have |
delirium. How |
likely do the |
words li | Freq. Percent Cum.
--------------+-----------------------------------
| 23 10.13 10.13
Likely | 66 29.07 39.21
Neutral | 2 0.88 40.09
Very likely | 135 59.47 99.56
Very unlikely | 1 0.44 100.00
--------------+-----------------------------------
Total | 227 100.00

For analysis, I want to change the categorical Likert responses into numeric data with labels.

1) First, I tried destring:
. destring confused, replace

The error message was:
confused: contains nonnumeric characters; no replace

2) I wasn't sure why I got this message, so I tried encode:
encode confused, generate(confused2)

. d confused2

storage display value
variable name type format label variable label
------------------------------------------------------------------------------------------------------------------------------------------------------
confused2 long %13.0g confused2
Consider a patient who you think might have delirium. How likely do the words li

3) This looks good, but I would like to ensure that the correct categories have the correct numeric labels for future analysis. I prefer to have numbers as the primary data source so that if I need to do things quickly with future commands, I can write <if confused ==1> instead of <if confused =="Very unlikely">.

Initially I thought:
. label define likertlbl 1 "Very unlikely" 2 "Unlikely" 3 "Neutral" 4 "Likely" 5 "Very likely"

. label values confused2 likertlbl

. tab confused2, missing

Consider a |
patient who |
you think |
might have |
delirium. How |
likely do the |
words li | Freq. Percent Cum.
--------------+-----------------------------------
Very unlikely | 66 29.07 29.07
Unlikely | 2 0.88 29.96
Neutral | 135 59.47 89.43
Likely | 1 0.44 89.87
. | 23 10.13 100.00
--------------+-----------------------------------
Total | 227 100.00

But the output "confused2" here doesn't match the original "confused" categories - the response "likely" should have had n=66 from the original data, but here it has been made to match the "very unlikely" response.

I think this is because the responses were not in hierarchical order in the "confused" variable as they appear alphabetical in the list. Therefore when I used the <label define> command, it assumed that everything was already in hierarchical order in "confused2" when it assigned labels.

FIRST QUESTION: Is my interpretation of this mismatch correct?

4) So, to try again, I performed the following commands

. generate confused3 = 1 if confused =="Very unlikely"
(226 missing values generated)

. replace confused3 = 2 if confused =="Unlikely"
(0 real changes made)

. replace confused3 = 3 if confused =="Neutral"
(2 real changes made)

. replace confused3 = 4 if confused =="Likely"
(66 real changes made)

. replace confused3 = 5 if confused =="Very likely"
(135 real changes made)

. tab confused3, missing

confused3 | Freq. Percent Cum.
------------+-----------------------------------
1 | 1 0.44 0.44
3 | 2 0.88 1.32
4 | 66 29.07 30.40
5 | 135 59.47 89.87
. | 23 10.13 100.00
------------+-----------------------------------
Total | 227 100.00

Now I have the correct "match" - I know there were 66 responses that selected "likely" where 4=likely.
I can now attach my labels like this:

. label values confused3 likertlbl

. tab confused3, missing

confused3 | Freq. Percent Cum.
--------------+-----------------------------------
Very unlikely | 1 0.44 0.44
Neutral | 2 0.88 1.32
Likely | 66 29.07 30.40
Very likely | 135 59.47 89.87
. | 23 10.13 100.00
--------------+-----------------------------------
Total | 227 100.00

So in the end, I have achieved what I wanted. However this was very labor intensive!

SECOND QUESTION: Is there an easier way to achieve what I wanted than going through the code in section 4?

Previously I have made the changes for the numeric categories in Excel, so that when I import into Stata, they are already numeric. And then I simply assign the variable labels. I was hoping to do everything in Stata at once so that there was a clear log file for all commands with data cleaning and manipulation.

Apologies if these are simple questions and I have made it a long-winded scenario!

Any assistance appreciated.

BJ Data Tech Solution

Home / Data Cleaning / Data management / Data Processing / Recoding a string variable into numeric (instead of destring) and subsequent mismatch of values
Recoding a string variable into numeric (instead of destring) and subsequent mismatch of values

0 Response to Recoding a string variable into numeric (instead of destring) and subsequent mismatch of values

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Recoding a string variable into numeric (instead of destring) and subsequent mismatch of values Recoding a string variable into numeric (instead of destring) and subsequent mismatch of values

Related Posts with Recoding a string variable into numeric (instead of destring) and subsequent mismatch of values

0 Response to Recoding a string variable into numeric (instead of destring) and subsequent mismatch of values

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Recoding a string variable into numeric (instead of destring) and subsequent mismatch of values
Recoding a string variable into numeric (instead of destring) and subsequent mismatch of values