Dear all,
This is a second request for help, following up on the question at: https://www.statalist.org/forums/for...haping-to-long.
I need to reshape a Best to Worst Scaling (BTWS) dataset exported from Sawtooth Software.
For those unfamiliar with it, the BTWS method is based on surveys in which the respondent is shown a set (CHOICE SET) of 4 alternatives (ITEMS, labelled A) and asked to indicate which is the best and which is the worst in his or her opinion. This task is repeated several times; in our case there are 8 choice sets. In addition to the BTWS exercise, the questionnaire collected socio-demographic information such as income, age, sex, and so on. Sawtooth Software can rank the items by taking the difference between the number of times each item was chosen as best and as worst, but we want to fit a logit model so that we can also include other variables as predictors.
The software exports a dataset with the following layout:
ID | B1 | W1 | B2 | W2 | B3 | W3 | B4 | W4 | B5 | W5 | B6 | W6 | B7 | W7 | B8 | W8 | AGE | INCOME
where ID identifies the interviewee, B1 is the "best" option chosen by the interviewee in the first CHOICE SET and W1 is the "worst" option chosen by the interviewee in the first CHOICE SET, B2 and W2 are the same for the second CHOICE SET, and so on up to B8 and W8. AGE and INCOME are the socio-economic variables for each interviewee.
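As an aside, the best-minus-worst ranking mentioned above can be computed directly from this wide layout. The following is only a minimal sketch in Python (not Stata), meant to pin down the arithmetic. It assumes that B1..W8 hold item numbers (1 to 24), which the export description does not state explicitly, and every value in `example` beyond the first two choice sets is made up for illustration.

```python
from collections import Counter

N_SETS = 8  # number of choice sets per respondent, as described above


def best_minus_worst(records):
    """Return {item: times chosen as best - times chosen as worst}."""
    scores = Counter()
    for rec in records:
        for s in range(1, N_SETS + 1):
            scores[rec[f"B{s}"]] += 1  # one point each time an item is picked as best
            scores[rec[f"W{s}"]] -= 1  # minus one each time it is picked as worst
    return dict(scores)


# Hypothetical wide record: sets 1 and 2 follow the example in this post,
# sets 3 to 8 are invented purely for illustration.
example = {
    "ID": 1,
    "B1": 1, "W1": 5, "B2": 4, "W2": 2,
    "B3": 7, "W3": 1, "B4": 9, "W4": 4,
    "B5": 1, "W5": 12, "B6": 15, "W6": 7,
    "B7": 20, "W7": 9, "B8": 24, "W8": 15,
    "AGE": 64, "INCOME": 10000,
}

scores = best_minus_worst([example])
```

Items that are never chosen simply do not appear in `scores`; sorting the dictionary by value gives the ranking.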
In order to develop a Logit model, the dataset must have a different format.
Starting from the row recorded for each interviewee, I need to obtain a dataset like this:
ID; Interviewee_ID; Choice Set; A1; A2; A3; A4; A5; ...; A24; CHOICE; AGE; INCOME
1; 1; 1; 0; 1; 0; 0; 0; ...; 0; 0; 64; 10000
2; 1; 1; 1; 0; 0; 0; 0; ...; 0; 1; 64; 10000
3; 1; 1; 0; 0; 0; 0; 1; ...; 0; 0; 64; 10000
4; 1; 1; 0; 0; 0; 0; 0; ...; 1; 0; 64; 10000
5; 1; 1; 0; -1; 0; 0; 0; ...; 0; 0; 64; 10000
6; 1; 1; -1; 0; 0; 0; 0; ...; 0; 0; 64; 10000
7; 1; 1; 0; 0; 0; 0; -1; ...; 0; 1; 64; 10000
8; 1; 1; 0; 0; 0; 0; 0; ...; -1; 0; 64; 10000
9; 1; 2; 0; 0; 0; 1; 0; ...; 0; 1; 64; 10000
10; 1; 2; 0; 1; 0; 0; 0; ...; 0; 0; 64; 10000
11; 1; 2; 0; 0; 0; 0; 1; ...; 0; 0; 64; 10000
12; 1; 2; 0; 0; 0; 1; 0; ...; 0; 0; 64; 10000
13; 1; 2; 0; 0; 0; -1; 0; ...; 0; 0; 64; 10000
14; 1; 2; 0; -1; 0; 0; 0; ...; 0; 1; 64; 10000
15; 1; 2; 0; 0; 0; 0; -1; ...; 0; 0; 64; 10000
16; 1; 2; 0; 0; 0; -1; 0; ...; 1; 0; 64; 10000
...
Here the first variable (ID) is the row identifier within the entire dataset, Interviewee_ID identifies the respondent, and Choice Set indicates which of the choice sets (1-8) the row belongs to. Variables A1 to A24 indicate the items: a variable takes the value 1 if the row describes that item as a candidate for BEST, -1 if the row describes it as a candidate for WORST, and 0 otherwise. For each choice set there are 4 BEST rows and 4 WORST rows, and the BEST rows must appear first, followed by the WORST rows. CHOICE indicates the choice made by the interviewee (1 on the row describing the chosen alternative, 0 otherwise), followed by AGE and INCOME. The example shows only the first 2 choice sets for the first interviewee. For each respondent we should have (4 + 4) * 8 = 64 rows.
In the example, in CHOICE SET 1, respondent 1 chose A1 as the Best alternative and A5 as the Worst, while in CHOICE SET 2 the same respondent chose A4 as Best and A2 as Worst.
Is it possible to carry out this transformation of the dataset in Stata?
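To make the target layout unambiguous, here is a minimal sketch of the transformation in Python (not Stata), intended only as a specification of the logic. It rests on assumptions that are not in the export as described: B and W values are item numbers, and the four items shown in each choice set are available separately as a `design` mapping (the columns listed above do not record them). The `design` used here, and the names `to_long` and `ChoiceSet`, are hypothetical.

```python
N_ITEMS = 24  # items A1..A24


def to_long(records, design, covariates=("AGE", "INCOME")):
    """Expand wide best/worst records into one row per (set, block, item).

    design maps choice-set number -> list of the item numbers shown in it.
    """
    rows = []
    row_id = 0
    for rec in records:
        for cset, shown in sorted(design.items()):
            # BEST block first (items coded +1), then WORST block (coded -1)
            for sign, prefix in ((1, "B"), (-1, "W")):
                chosen = rec[f"{prefix}{cset}"]
                for item in shown:
                    row_id += 1
                    row = {"ID": row_id, "Interviewee_ID": rec["ID"],
                           "ChoiceSet": cset}
                    for k in range(1, N_ITEMS + 1):
                        row[f"A{k}"] = sign if k == item else 0
                    # 1 only on the row describing the alternative picked
                    row["CHOICE"] = int(item == chosen)
                    for c in covariates:
                        row[c] = rec[c]
                    rows.append(row)
    return rows


# Hypothetical design for the first two choice sets only (assumed values).
design = {1: [2, 1, 5, 24], 2: [4, 2, 5, 24]}
respondent = {"ID": 1, "B1": 1, "W1": 5, "B2": 4, "W2": 2,
              "AGE": 64, "INCOME": 10000}

long_rows = to_long([respondent], design)
```

With this input, the rows flagged CHOICE = 1 are rows 2, 7, 9 and 14, as in the example table. With a full 8-set design, each respondent would produce (4 + 4) * 8 = 64 rows.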
Many thanks in advance
Federico