I have unbalanced panel data and would like to handle the missing values with multiple imputation. If I am not mistaken, the first step is to assess whether the data are missing at random. To do that, I can use either a logit model or a t-test. I have tried both without success.
These are my data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str32 country str10 date double(total_deaths_per_million total_cases_per_million) float country_system byte(miss_total_cases_per_million miss_total_deaths_per_million) double(life_expectancy hospital_beds_per_thousand aged_70_older)
"Argentina" "2020-04-09" 1.593 39.716 1 0 0 76.67 5 7.441
"Argentina" "2020-04-10" 1.814 43.699 1 0 0 76.67 5 7.441
"Argentina" "2020-04-11" 1.836 43.699 1 0 0 76.67 5 7.441
"Argentina" "2020-04-12" 1.991 47.394 1 0 0 76.67 5 7.441
"Argentina" "2020-04-13" 2.146 48.854 1 0 0 76.67 5 7.441
"Argentina" "2020-04-14" 2.257 50.381 1 0 0 76.67 5 7.441
"Argentina" "2020-04-15" 2.456 54.054 1 0 0 76.67 5 7.441
"Argentina" "2020-04-16" 2.544 56.886 1 0 0 76.67 5 7.441
"Argentina" "2020-04-17" 2.721 59.054 1 0 0 76.67 5 7.441
"Argentina" "2020-04-18" 2.854 61.023 1 0 0 76.67 5 7.441
"Argentina" "2020-04-19" 2.921 62.816 1 0 0 76.67 5 7.441
"Argentina" "2020-04-20" 3.009 65.072 1 0 0 76.67 5 7.441
"Argentina" "2020-04-21" 3.253 67.064 1 0 0 76.67 5 7.441
"Argentina" "2020-04-22" 3.363 69.564 1 0 0 76.67 5 7.441
"Argentina" "2020-04-23" 3.651 76.003 1 0 0 76.67 5 7.441
"Argentina" "2020-04-24" 3.894 79.808 1 0 0 76.67 5 7.441
"Argentina" "2020-04-25" 4.093 83.636 1 0 0 76.67 5 7.441
"Argentina" "2020-04-26" 4.248 86.114 1 0 0 76.67 5 7.441
"Argentina" "2020-04-27" 4.359 88.57 1 0 0 76.67 5 7.441
"Argentina" "2020-04-28" 4.58 91.314 1 0 0 76.67 5 7.441
"Argentina" "2020-04-29" 4.735 94.81 1 0 0 76.67 5 7.441
"Argentina" "2020-04-30" 4.823 97.974 1 0 0 76.67 5 7.441
"Argentina" "2020-05-01" 4.978 100.275 1 0 0 76.67 5 7.441
"Argentina" "2020-05-02" 5.244 103.572 1 0 0 76.67 5 7.441
"Argentina" "2020-05-03" 5.443 105.828 1 0 0 76.67 5 7.441
"Argentina" "2020-05-04" 5.753 108.13 1 0 0 76.67 5 7.441
"Argentina" "2020-05-05" 5.841 111.072 1 0 0 76.67 5 7.441
"Argentina" "2020-05-06" 6.04 115.232 1 0 0 76.67 5 7.441
"Argentina" "2020-05-07" 6.24 118.839 1 0 0 76.67 5 7.441
"Argentina" "2020-05-08" 6.483 124.149 1 0 0 76.67 5 7.441
"Argentina" "2020-05-09" 6.638 127.8 1 0 0 76.67 5 7.441
"Argentina" "2020-05-10" 6.748 133.508 1 0 0 76.67 5 7.441
"Argentina" "2020-05-11" 6.948 138.907 1 0 0 76.67 5 7.441
"Argentina" "2020-05-12" 7.058 145.213 1 0 0 76.67 5 7.441
"Argentina" "2020-05-13" 7.279 152.204 1 0 0 76.67 5 7.441
"Argentina" "2020-05-14" 7.81 157.847 1 0 0 76.67 5 7.441
"Argentina" "2020-05-15" 7.877 165.48 1 0 0 76.67 5 7.441
"Argentina" "2020-05-16" 8.032 172.693 1 0 0 76.67 5 7.441
"Argentina" "2020-05-17" 8.253 178.512 1 0 0 76.67 5 7.441
"Argentina" "2020-05-18" 8.452 185.216 1 0 0 76.67 5 7.441
"Argentina" "2020-05-19" 8.696 194.908 1 0 0 76.67 5 7.441
"Argentina" "2020-05-20" 8.917 205.395 1 0 0 76.67 5 7.441
"Argentina" "2020-05-21" 9.204 219.733 1 0 0 76.67 5 7.441
"Argentina" "2020-05-22" 9.581 235.619 1 0 0 76.67 5 7.441
"Argentina" "2020-05-23" 9.846 251.196 1 0 0 76.67 5 7.441
"Argentina" "2020-05-24" 10.001 267.193 1 0 0 76.67 5 7.441
"Argentina" "2020-05-25" 10.333 279.407 1 0 0 76.67 5 7.441
"Argentina" "2020-05-26" 10.709 292.682 1 0 0 76.67 5 7.441
"Argentina" "2020-05-27" 11.063 308.281 1 0 0 76.67 5 7.441
"Argentina" "2020-05-28" 11.24 325.296 1 0 0 76.67 5 7.441
"Argentina" "2020-05-29" 11.505 341.16 1 0 0 76.67 5 7.441
end
Here, total_cases_per_million and total_deaths_per_million are the dependent variables whose missing values I would like to impute, and miss_total_cases_per_million and miss_total_deaths_per_million are dummy variables that flag the observations in which the corresponding variable is missing.
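For reference, indicators of this kind can be built with Stata's missing() function. The lines below are only a minimal sketch: they assume the two indicator variables are not already in memory (otherwise generate stops with an error) and that 1 is meant to flag a missing value.

```stata
* Sketch only: assumes the miss_* variables do not exist yet.
* missing() returns 1 when the argument is missing and 0 otherwise.
generate byte miss_total_cases_per_million  = missing(total_cases_per_million)
generate byte miss_total_deaths_per_million = missing(total_deaths_per_million)

* Count how many observations are flagged as missing for each variable.
tabulate miss_total_cases_per_million
tabulate miss_total_deaths_per_million
```

In the excerpt posted above, both indicators are 0 in every row, so these tables would show a single category for that subset.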
I tried the following code:
Code:
. xtlogit miss_total_cases_per_million country_system total_cases_per_million
>     stringency gdp_per_capita extreme_poverty aged_70_older
outcome does not vary; remember:
                                  0 = negative outcome,
        all other nonmissing values = positive outcome
r(2000);

end of do-file
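The r(2000) message says that the indicator takes a single value in the estimation sample. One possible explanation, offered as an assumption and not a confirmed diagnosis: Stata drops every observation with a missing regressor, and total_cases_per_million is itself among the regressors, so the observations where the indicator equals 1 may all be excluded. A rough check is sketched below; n_miss_x is a hypothetical helper variable that is not part of the original data.

```stata
* Sketch only: n_miss_x is a hypothetical helper, not in the original dataset.
* Count, for each observation, how many of the regressors are missing.
egen byte n_miss_x = rowmiss(country_system total_cases_per_million ///
    stringency gdp_per_capita extreme_poverty aged_70_older)

* Only observations with n_miss_x == 0 can enter the model.
* If this table shows a single value, the outcome does not vary there.
tabulate miss_total_cases_per_million if n_miss_x == 0
```

If the table shows only zeros, leaving total_cases_per_million out of the regressor list would be the first thing to try.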
Best regards
Alessio Lombini