Hi all,

I have data on certain variables for four rounds (2 to 5). The number of observations in each round is as follows:
Code:
tab round

      round |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |      1,950       25.34       25.34
          3 |      1,931       25.09       50.43
          4 |      1,915       24.88       75.31
          5 |      1,900       24.69      100.00
------------+-----------------------------------
      Total |      7,696      100.00
I have round-specific variables hs_child_2, hs_child_3, hs_child_4 and hs_child_5, and one panel variable, hs_child, built from these four round-specific variables as follows:
Code:
gen hs_child=hs_child_2==1|hs_child_3==1|hs_child_4==1|hs_child_5==1 if round==2|round==3|round==4|round==5
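To see how Stata evaluates this kind of expression, here is a small illustration on hypothetical toy data (two rounds instead of four, made-up values). It assumes, as the frequencies suggest, that each hs_child_r is non-missing only in round r:

```stata
* Hypothetical toy data: one round-3 child has no answer (.)
clear
input round hs_child_2 hs_child_3
2 1 .
2 0 .
3 . 1
3 . .
end

* Same logic as the original command, restricted to two rounds
gen hs_child = hs_child_2==1 | hs_child_3==1

* The unanswered round-3 observation (last row) gets 0, not missing
list, noobs
```

The last observation ends up as 0 because `. == 1` evaluates to 0 (false) in Stata rather than to missing, so the `|` combination has nothing left that could produce a missing result.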
The problem I am facing is a discrepancy between the number of observations in the round-specific variables and in the constructed panel variable, as shown below:
Code:
tab hs_child_2

 hs_child_2 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,378       70.67       70.67
          1 |        572       29.33      100.00
------------+-----------------------------------
      Total |      1,950      100.00

tab hs_child_3

 hs_child_3 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,565       81.09       81.09
          1 |        365       18.91      100.00
------------+-----------------------------------
      Total |      1,930      100.00


tab hs_child_4

 hs_child_4 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,309       68.36       68.36
          1 |        606       31.64      100.00
------------+-----------------------------------
      Total |      1,915      100.00


tab hs_child_5

 hs_child_5 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,137       60.06       60.06
          1 |        756       39.94      100.00
------------+-----------------------------------
      Total |      1,893      100.00


tab hs_child

   hs_child |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      5,397       70.13       70.13
          1 |      2,299       29.87      100.00
------------+-----------------------------------
      Total |      7,696      100.00
As the tables show, the number of children covered by this question is smaller than the number of observations in two rounds: round 3 (1,931 observations, 1,930 children) and round 5 (1,900 observations, 1,893 children). Going by the round-specific variables, category 0 contains 5,389 observations in total, but the panel variable has 5,397 in this category.
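A quick arithmetic check, using only the frequencies from the tab output above, compares the gap in category 0 with the number of observations that have no answer in rounds 3 and 5 (the macro names are just illustrative):

```stata
* Frequencies copied from the tab output above
* Category 0, summed over the four round-specific variables
local zero_rounds = 1378 + 1565 + 1309 + 1137
* Category 0 of the panel variable hs_child
local zero_panel = 5397
* Observations without an answer: round 3 plus round 5
local n_missing = (1931 - 1930) + (1900 - 1893)

display "Category 0, round-specific total: `zero_rounds'"
display "Category 0, panel minus rounds:   " `zero_panel' - `zero_rounds'
display "Observations with no answer:      `n_missing'"
```

The three lines show 5389, 8 and 8: the extra zeros in the panel variable match the unanswered cases one for one. Category 1 agrees in both versions (572 + 365 + 606 + 756 = 2,299).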

Is this happening because missing values are being coded as zero when I construct the panel variable? If so, is it justified to keep them as zero?
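One way to keep unanswered cases missing is to copy each round's own answer into the panel variable instead of combining four comparisons with `|`. Below is a minimal sketch on hypothetical toy data, assuming each hs_child_r is coded 0/1 and is non-missing only in round r. On the real data the loop would run over `2/5` instead of `2/3`:

```stata
* Hypothetical toy data: one round-3 child has no answer (.)
clear
input round hs_child_2 hs_child_3
2 1 .
2 0 .
3 . 1
3 . .
end

* Start from missing, then copy each round's own answer;
* unanswered cases stay missing instead of becoming 0
gen hs_child = .
forvalues r = 2/3 {
    replace hs_child = hs_child_`r' if round == `r'
}

* The missing option shows unanswered cases as a separate column
tab round hs_child, missing
```

Whether recoding non-response to 0 is justified depends on what the question measures; treating "no answer" as "no" is an assumption worth stating explicitly. Left as missing, these cases are excluded from `tabulate` (unless `missing` is specified) and from most estimation commands.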

Any help in this regard would be greatly appreciated.

Regards,
Titir