Merging many string variables into one

Hi,
I'm using Stata for my short paper.
I'm working on a merged long dataset with 656 observations.
I have 12 string variables called "microvoce1", "microvoce2", ..., "microvoce12".
Each variable reports the needs expressed by the interviewees (656). Specifically :

- microvoce1 reports the first need expressed by a person (tot. 656 people reported a first need)

- microvoce2 reports the second need expressed by a person (tot. 295 people reported a second need, which means that microvoce2 has 361 missing values)

- microvoce3 reports the third need, and so on (tot. 162 people reported a third need, so microvoce3 has 494 missing values)

Each need has a specific labelled code ("POV", "POV01", "POV02"..). The codes are the same for all 12 variables, but not all variables contain all codes.

For example, the first variable (microvoce1) contains "POV" "POV01" "POV02" with associated frequencies;
the second variable (microvoce2) contains "POV" "POV01" "POV99";
the third variable (microvoce3) contains "POV" "POV99" "CAS".

I want to obtain a new variable (called microvoceTOTAL) that contains all the labelled codes contained in the individual 12 variables and the corresponding frequency sums.

If microvoce1 contains
"POV" = 2
"POV01" = 1
"POV02" = 3

and microvoce2 contains
"POV" = 4
"POV01" = 2
"POV99" = 5

and microvoce3 contains
"POV" = 1
"POV99" = 12
"CAS" = 8

the new variable (microvoceTOTAL) should contain
"POV" = 7 (2+4+1)
"POV01" = 3 (1+2)
"POV02" = 3
"POV99" = 17 (5+12)
"CAS" = 8

I tried to use "stack", but it is not the solution: I need to keep all the other variables in the dataset, and "stack" makes me lose all of them.

I also tried to use
gen microvoceTOTAL = microvoce1+microvoce2 (for example)
but it just concatenates the strings, so the result in microvoceTOTAL is "POVPOV" = 1 rather than "POV" = 2.

Lastly, I tried to use
gen microvoceTOTAL=.
replace microvoceTOTAL=1 if microvoce1=="POV" | microvoce2=="POV" | microvoce5=="POV"
replace microvoceTOTAL=2 if microvoce1=="POV01" | microvoce2=="POV01"
and so on for all the codes, but the result is still a variable with 656 observations, while the sum of the frequencies of all the labelled codes should be 1185.
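
A note on why the total stays at 656: every attempt above keeps one row per interviewee, whereas a pooled frequency table needs one row per reported need (1185 rows in total). One possible way to get there without permanently losing the other variables is to reshape to long form inside preserve/restore. The following is only a sketch on made-up toy data with three of the twelve variables, assuming one row per interviewee; "id" and "order" are hypothetical names introduced here for illustration and are not part of the original dataset.

```stata
* Toy data standing in for the real dataset (values are invented for illustration)
clear
input str5 microvoce1 str5 microvoce2 str5 microvoce3
"POV"   "POV01" ""
"POV01" ""      ""
"POV"   "POV99" "CAS"
end

* Hypothetical identifier: assumes each row is exactly one interviewee
gen long id = _n

* Work on a temporary copy so the wide dataset is untouched afterwards
preserve

* One row per interviewee-need pair; "order" records which need it was (1, 2, 3, ...)
reshape long microvoce, i(id) j(order)

* Drop the empty slots, i.e. needs that were never reported
drop if microvoce == ""

* Pooled frequencies of every code across all microvoce variables
tabulate microvoce

* Bring back the original wide dataset with all its variables
restore
```

On the real data the "clear" and "input" lines would be left out, and an existing identifier could replace "id" if the dataset already has one. Because everything between preserve and restore happens on a temporary copy, the other variables are still in memory afterwards; if the long dataset itself is wanted, it could be saved under a new name before restore.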

How can I merge these 12 variables into a new one?

I hope I made myself clear. Forgive the lexicon, but I'm new at this.
I remain available for any clarification.

Thanks in advance,
Massimiliano

