I wrote the following test code, which marks an id as balanced when its number of nonmissing observations equals the largest number of observations satisfying the if condition for any id. It breaks down when there is a date mismatch, as in the last case. That case is invalid because no id has valid data at both t=5 and t=8 (or, if the program is used without an if condition, at both the minimum and maximum dates of the panel). I could add a failure check that calculates the in-sample minimum and maximum values of date, but I wanted to ask whether there is a more natural way to do this. I assume this is a fairly common concern, but I don't have a good sense of the best way to approach it.
Code:
clear
input float(id date variable)
1 5 .88
1 6  .2
1 7 .89
2 5 .58
2 6 .37
2 7 .85
3 5 .39
3 6 .12
3 7   .
4 6  .7
4 7 .69
4 8 .93
end
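capture program drop balanced
program define balanced
    // Marks ids whose count of usable observations equals the largest count for any id.
    // Assumes the panel variables are named id and date and the data are sorted by them.
    syntax varlist [if], Generate(string)
    marksample touse
    tempvar obs
    // Running count of usable observations within each id
    by id (date): gen `obs' = sum(`touse')
    qui sum `obs', meanonly
    local maxobs = `r(max)'
    // Exclude ids whose total count falls short of the maximum
    qui by id (date): replace `touse' = 0 if `obs'[_N] != `maxobs'
    gen `generate' = `touse'
end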
tsset id date
balanced variable if inrange(date,5,7), g(bal57)
balanced variable if inrange(date,5,6), g(bal56)
balanced variable if inrange(date,6,7), g(bal67)
balanced variable if inrange(date,5,8), g(bal58)  // Produces an incorrect result; should probably be made to generate an error
Code:
     +------------------------------------------------------+
     | id   date   variable   bal57   bal56   bal67   bal58 |
     |------------------------------------------------------|
  1. |  1      5        .88       1       1       0       1 |
  2. |  1      6         .2       1       1       1       1 |
  3. |  1      7        .89       1       0       1       1 |
  4. |  2      5        .58       1       1       0       1 |
  5. |  2      6        .37       1       1       1       1 |
  6. |  2      7        .85       1       0       1       1 |
  7. |  3      5        .39       0       1       0       0 |
  8. |  3      6        .12       0       1       0       0 |
  9. |  3      7          .       0       0       0       0 |
 10. |  4      6         .7       0       0       1       1 |
 11. |  4      7        .69       0       0       1       1 |
 12. |  4      8        .93       0       0       0       1 |
     +------------------------------------------------------+
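For illustration, here is a minimal sketch of one possible failure check. Rather than comparing minimum and maximum dates, it counts the distinct dates that appear in the marked sample and requires each id to have a usable observation at every one of them. The program name balanced2 and the tempvar names are made up for this example. It assumes the example data above are in memory and that the panel variables are named id and date with at most one observation per id and date. It is not presented as the standard way to do this.

```stata
capture program drop balanced2
program define balanced2
    // Hypothetical variant of -balanced-; assumes panel variables id and date
    syntax varlist [if], Generate(name)
    marksample touse
    tempvar tag nobs
    // Tag one usable observation per distinct date in the marked sample
    egen byte `tag' = tag(date) if `touse'
    quietly count if `tag'
    local T = r(N)
    // Total number of usable observations for each id
    bysort id: egen long `nobs' = total(`touse')
    sort id date
    // Fail if no id is observed at every date in the sample
    quietly count if `touse' & `nobs' == `T'
    if r(N) == 0 {
        display as error "no id has usable data at all `T' dates in the sample"
        exit 459
    }
    generate byte `generate' = `touse' & `nobs' == `T'
end

balanced2 variable if inrange(date,5,7), g(chk57)
capture noisily balanced2 variable if inrange(date,5,8), g(chk58)
```

With the example data, the second call should stop with error 459 instead of creating chk58, because four dates appear in the sample and no id is observed at all four. One design caveat: a date that is missing for every id is not counted, so a gap shared by the whole panel would not trigger the error.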