Wednesday, December 2, 2020

Creating new observations by group

Dear Statalist members,

I am working with data about nutritional intake where participants describe what they eat during two consecutive days. Each food is entered as a new observation in the database and is categorised by a food group and subgroup, which appear as the variables "group" and "subgroup1". There are 18 groups, coded from 1 to 18. A quantity ("cons_qty") is also associated with every food consumed.

I successfully used fillin as suggested by Clyde Schechter to create new observations for groups when these observations did not exist in the database (see:https://www.statalist.org/forums/for...-in-a-database). I would have another question related to this.

Each group is further divided into subgroups. The different groups don't have the same number of subgroups, for example group 4 has 5 subgroups whereas group 7 has 8 subgroups. I want to create new observations for subgroups that were not consumed, in the same idea that for the groups.

Here is an example of the dataset:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id byte(int_num group subgroup1) float cons_qty
1 21  1 .         0
1 21  2 0       261
1 21  2 1        32
1 21  2 2  132.1875
1 21  2 2      76.8
1 21  2 2       111
1 21  2 3      40.5
1 21  3 .         0
1 21  4 1       207
1 21  4 1       138
1 21  5 1    64.375
1 21  5 1    64.375
1 21  5 5      21.8
1 21  5 5       125
1 21  6 .         0
1 21  7 .         0
1 21  8 .         0
1 21  9 .         0
1 21 10 1      25.2
1 21 11 2        10
1 21 12 2        10
1 21 13 3     187.5
1 21 13 3     187.5
1 21 13 4       500
1 21 13 4       500
1 21 13 4       300
1 21 13 4       300
1 21 14 .         0
1 21 15 4        28
1 21 15 4        .6
1 21 16 .         0
1 21 17 2         1
1 21 17 2         1
1 21 18 .         0
1 22  1 .         0
1 22  2 1 15.208208
1 22  2 2 30.416416
1 22  2 2 30.416416
1 22  3 .         0
1 22  4 1      54.6
1 22  4 1     124.2
1 22  4 1      57.6
1 22  4 1       138
1 22  4 1 36.399998
1 22  5 1    64.375
1 22  5 1    64.375
1 22  5 1     57.68
1 22  5 4       100
1 22  6 3        28
1 22  6 3     27.95
1 22  6 3        40
1 22  6 3      55.9
1 22  7 4        90
1 22  8 .         0
1 22  9 .         0
1 22 10 1 26.614365
1 22 10 2       2.5
1 22 11 1      5.25
1 22 11 1         7
1 22 12 .         0
1 22 13 3     187.5
1 22 13 3     187.5
1 22 13 4       500
1 22 13 4       300
1 22 13 4       500
1 22 14 .         0
1 22 15 3  .1520821
1 22 15 4  19.01026
1 22 15 4  .3802052
1 22 15 4  3.802052
1 22 16 1     123.2
1 22 17 2         1
1 22 17 2         1
1 22 18 .         0
1 21  1 .         0
2 21  2 1  24.14001
2 21  2 2  48.28003
2 21  2 6  48.28003
2 21  3 .         0
2 21  4 .         0
2 21  5 1      37.5
2 21  6 3      47.5
2 21  6 3      47.5
2 21  7 4       100
2 21  7 4        90
2 21  8 .         0
2 21  9 .         0
2 21 10 0  42.24502
2 21 10 1       1.8
2 21 11 1         8
2 21 11 1         8
2 21 11 1         8
2 21 12 .         0
2 21 13 3       150
2 21 13 3     112.5
2 21 13 3       150
2 21 13 4       500
2 21 13 4      1000
2 21 13 4       500
2 21 13 4       500
end


I tried using:

Code:
fillin id int_num group subgroup1
replace cons_qty = 0 if _fillin
but the problem is that it creates 8 subgroups for each group (8 is the highest number of subgroups inside a group in my dataset), independently of the group concerned.

I saw that the option by() is not allowed for fillin and I also tried using if to distinguish between the different groups, it didn't work either. I checked mipolate but as I understood, it doesn't do what I am looking for.

My solution would be to use fillin on subsets of my data where I keep only one group and append all the subsets afterwards. Yet, it doesn't seem like the most efficient way to do it. Would you have another approach to suggest?

No comments:

Post a Comment