Dear Statalist members,

I am working with data about nutritional intake where participants describe what they eat during two consecutive days. Each food is entered as a new observation in the database and is categorised by a food group and subgroup, which appear as the variables "group" and "subgroup1". There are 18 groups, coded from 1 to 18. A quantity ("cons_qty") is also associated with every food consumed.

I successfully used fillin as suggested by Clyde Schechter to create new observations for groups when these observations did not exist in the database (see:https://www.statalist.org/forums/for...-in-a-database). I would have another question related to this.

Each group is further divided into subgroups. The different groups don't have the same number of subgroups, for example group 4 has 5 subgroups whereas group 7 has 8 subgroups. I want to create new observations for subgroups that were not consumed, in the same idea that for the groups.

Here is an example of the dataset:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id byte(int_num group subgroup1) float cons_qty
1 21  1 .         0
1 21  2 0       261
1 21  2 1        32
1 21  2 2  132.1875
1 21  2 2      76.8
1 21  2 2       111
1 21  2 3      40.5
1 21  3 .         0
1 21  4 1       207
1 21  4 1       138
1 21  5 1    64.375
1 21  5 1    64.375
1 21  5 5      21.8
1 21  5 5       125
1 21  6 .         0
1 21  7 .         0
1 21  8 .         0
1 21  9 .         0
1 21 10 1      25.2
1 21 11 2        10
1 21 12 2        10
1 21 13 3     187.5
1 21 13 3     187.5
1 21 13 4       500
1 21 13 4       500
1 21 13 4       300
1 21 13 4       300
1 21 14 .         0
1 21 15 4        28
1 21 15 4        .6
1 21 16 .         0
1 21 17 2         1
1 21 17 2         1
1 21 18 .         0
1 22  1 .         0
1 22  2 1 15.208208
1 22  2 2 30.416416
1 22  2 2 30.416416
1 22  3 .         0
1 22  4 1      54.6
1 22  4 1     124.2
1 22  4 1      57.6
1 22  4 1       138
1 22  4 1 36.399998
1 22  5 1    64.375
1 22  5 1    64.375
1 22  5 1     57.68
1 22  5 4       100
1 22  6 3        28
1 22  6 3     27.95
1 22  6 3        40
1 22  6 3      55.9
1 22  7 4        90
1 22  8 .         0
1 22  9 .         0
1 22 10 1 26.614365
1 22 10 2       2.5
1 22 11 1      5.25
1 22 11 1         7
1 22 12 .         0
1 22 13 3     187.5
1 22 13 3     187.5
1 22 13 4       500
1 22 13 4       300
1 22 13 4       500
1 22 14 .         0
1 22 15 3  .1520821
1 22 15 4  19.01026
1 22 15 4  .3802052
1 22 15 4  3.802052
1 22 16 1     123.2
1 22 17 2         1
1 22 17 2         1
1 22 18 .         0
1 21  1 .         0
2 21  2 1  24.14001
2 21  2 2  48.28003
2 21  2 6  48.28003
2 21  3 .         0
2 21  4 .         0
2 21  5 1      37.5
2 21  6 3      47.5
2 21  6 3      47.5
2 21  7 4       100
2 21  7 4        90
2 21  8 .         0
2 21  9 .         0
2 21 10 0  42.24502
2 21 10 1       1.8
2 21 11 1         8
2 21 11 1         8
2 21 11 1         8
2 21 12 .         0
2 21 13 3       150
2 21 13 3     112.5
2 21 13 3       150
2 21 13 4       500
2 21 13 4      1000
2 21 13 4       500
2 21 13 4       500
end


I tried using:

Code:
fillin id int_num group subgroup1
replace cons_qty = 0 if _fillin
but the problem is that it creates 8 subgroups for each group (8 is the highest number of subgroups inside a group in my dataset), independently of the group concerned.

I saw that the option by() is not allowed for fillin and I also tried using if to distinguish between the different groups, it didn't work either. I checked mipolate but as I understood, it doesn't do what I am looking for.

My solution would be to use fillin on subsets of my data where I keep only one group and append all the subsets afterwards. Yet, it doesn't seem like the most efficient way to do it. Would you have another approach to suggest?