Check if observations are missing at random in panel data

Hello everyone,

I have unbalanced panel data and I would like to deal with the missing values using multiple imputation. If I am not mistaken, the first step is to assess whether the data are missing at random. To do that, I can use either a logit model or a t-test. I have tried both without success.
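For the t-test route, a minimal sketch on simulated data might look like the block below. It compares the mean of an observed covariate between rows where the outcome is missing and rows where it is observed. Everything here is hypothetical: the panel of 20 countries by 30 days, the variable names (`id`, `day`, `gdp`, `u`, `cases`, `miss_cases`) and the assumed missingness mechanism. Because rows from the same country are not independent, a plain `ttest` likely overstates precision, so the block also shows a regression of the covariate on the indicator with standard errors clustered by country (assuming clustering by country is appropriate for your data).

```stata
* Hypothetical simulated panel: 20 countries x 30 days
clear
set seed 2020
set obs 20
generate id  = _n
generate gdp = rnormal(10, 1)        // hypothetical country-level covariate
generate u   = rnormal(0, 1)         // hypothetical unobserved country effect
expand 30                            // 30 rows per country
bysort id: generate day = _n
generate cases = 50 + 5*day + rnormal(0, 10)
* Assumed mechanism: rows in lower-gdp countries are more likely to be missing
replace cases = . if runiform() < invlogit(-2 - 1.5*(gdp - 10) + u)
generate byte miss_cases = missing(cases)

* Two-sample t-test of gdp between missing and non-missing rows
ttest gdp, by(miss_cases)

* Same mean comparison, with standard errors clustered by country
regress gdp miss_cases, vce(cluster id)
```

The coefficient on `miss_cases` in the regression equals the difference in means reported by `ttest`; only the standard error changes.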
These are my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str32 country str10 date double(total_deaths_per_million total_cases_per_million) float country_system byte(miss_total_cases_per_million miss_total_deaths_per_million) double(life_expectancy hospital_beds_per_thousand aged_70_older)
"Argentina" "2020-04-09"  1.593  39.716 1 0 0 76.67 5 7.441
"Argentina" "2020-04-10"  1.814  43.699 1 0 0 76.67 5 7.441
"Argentina" "2020-04-11"  1.836  43.699 1 0 0 76.67 5 7.441
"Argentina" "2020-04-12"  1.991  47.394 1 0 0 76.67 5 7.441
"Argentina" "2020-04-13"  2.146  48.854 1 0 0 76.67 5 7.441
"Argentina" "2020-04-14"  2.257  50.381 1 0 0 76.67 5 7.441
"Argentina" "2020-04-15"  2.456  54.054 1 0 0 76.67 5 7.441
"Argentina" "2020-04-16"  2.544  56.886 1 0 0 76.67 5 7.441
"Argentina" "2020-04-17"  2.721  59.054 1 0 0 76.67 5 7.441
"Argentina" "2020-04-18"  2.854  61.023 1 0 0 76.67 5 7.441
"Argentina" "2020-04-19"  2.921  62.816 1 0 0 76.67 5 7.441
"Argentina" "2020-04-20"  3.009  65.072 1 0 0 76.67 5 7.441
"Argentina" "2020-04-21"  3.253  67.064 1 0 0 76.67 5 7.441
"Argentina" "2020-04-22"  3.363  69.564 1 0 0 76.67 5 7.441
"Argentina" "2020-04-23"  3.651  76.003 1 0 0 76.67 5 7.441
"Argentina" "2020-04-24"  3.894  79.808 1 0 0 76.67 5 7.441
"Argentina" "2020-04-25"  4.093  83.636 1 0 0 76.67 5 7.441
"Argentina" "2020-04-26"  4.248  86.114 1 0 0 76.67 5 7.441
"Argentina" "2020-04-27"  4.359   88.57 1 0 0 76.67 5 7.441
"Argentina" "2020-04-28"   4.58  91.314 1 0 0 76.67 5 7.441
"Argentina" "2020-04-29"  4.735   94.81 1 0 0 76.67 5 7.441
"Argentina" "2020-04-30"  4.823  97.974 1 0 0 76.67 5 7.441
"Argentina" "2020-05-01"  4.978 100.275 1 0 0 76.67 5 7.441
"Argentina" "2020-05-02"  5.244 103.572 1 0 0 76.67 5 7.441
"Argentina" "2020-05-03"  5.443 105.828 1 0 0 76.67 5 7.441
"Argentina" "2020-05-04"  5.753  108.13 1 0 0 76.67 5 7.441
"Argentina" "2020-05-05"  5.841 111.072 1 0 0 76.67 5 7.441
"Argentina" "2020-05-06"   6.04 115.232 1 0 0 76.67 5 7.441
"Argentina" "2020-05-07"   6.24 118.839 1 0 0 76.67 5 7.441
"Argentina" "2020-05-08"  6.483 124.149 1 0 0 76.67 5 7.441
"Argentina" "2020-05-09"  6.638   127.8 1 0 0 76.67 5 7.441
"Argentina" "2020-05-10"  6.748 133.508 1 0 0 76.67 5 7.441
"Argentina" "2020-05-11"  6.948 138.907 1 0 0 76.67 5 7.441
"Argentina" "2020-05-12"  7.058 145.213 1 0 0 76.67 5 7.441
"Argentina" "2020-05-13"  7.279 152.204 1 0 0 76.67 5 7.441
"Argentina" "2020-05-14"   7.81 157.847 1 0 0 76.67 5 7.441
"Argentina" "2020-05-15"  7.877  165.48 1 0 0 76.67 5 7.441
"Argentina" "2020-05-16"  8.032 172.693 1 0 0 76.67 5 7.441
"Argentina" "2020-05-17"  8.253 178.512 1 0 0 76.67 5 7.441
"Argentina" "2020-05-18"  8.452 185.216 1 0 0 76.67 5 7.441
"Argentina" "2020-05-19"  8.696 194.908 1 0 0 76.67 5 7.441
"Argentina" "2020-05-20"  8.917 205.395 1 0 0 76.67 5 7.441
"Argentina" "2020-05-21"  9.204 219.733 1 0 0 76.67 5 7.441
"Argentina" "2020-05-22"  9.581 235.619 1 0 0 76.67 5 7.441
"Argentina" "2020-05-23"  9.846 251.196 1 0 0 76.67 5 7.441
"Argentina" "2020-05-24" 10.001 267.193 1 0 0 76.67 5 7.441
"Argentina" "2020-05-25" 10.333 279.407 1 0 0 76.67 5 7.441
"Argentina" "2020-05-26" 10.709 292.682 1 0 0 76.67 5 7.441
"Argentina" "2020-05-27" 11.063 308.281 1 0 0 76.67 5 7.441
"Argentina" "2020-05-28"  11.24 325.296 1 0 0 76.67 5 7.441
"Argentina" "2020-05-29" 11.505  341.16 1 0 0 76.67 5 7.441
end

Here total_cases_per_million and total_deaths_per_million are the dependent variables whose missing values I would like to impute, and miss_total_cases_per_million and miss_total_deaths_per_million are dummy variables that flag the observations in which those two variables are missing.
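Since `country` and `date` are stored as strings, they need numeric counterparts before `xtset` (and therefore `xtlogit`) can be used. A minimal sketch, using only the first two rows of the excerpt above; the new names `country_id` and `ddate` are hypothetical choices, and the indicators are built with `missing()`, which is one way to obtain the dummies described above.

```stata
clear
input str32 country str10 date double(total_deaths_per_million total_cases_per_million)
"Argentina" "2020-04-09" 1.593 39.716
"Argentina" "2020-04-10" 1.814 43.699
end

* xtset needs a numeric panel identifier and a numeric time variable
encode country, generate(country_id)     // string -> labeled numeric id
generate ddate = daily(date, "YMD")      // "2020-04-09" -> Stata daily date
format ddate %td
xtset country_id ddate

* Missingness indicators: 1 if the value is missing, 0 otherwise
generate byte miss_total_cases_per_million  = missing(total_cases_per_million)
generate byte miss_total_deaths_per_million = missing(total_deaths_per_million)

* Overview of how the two outcomes are missing together
misstable patterns total_cases_per_million total_deaths_per_million
```

On the full panel, `misstable patterns` shows which combinations of the two outcomes are missing and how often.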

I tried the following code:

Code:

. xtlogit miss_total_cases_per_million country_system total_cases_per_million
>  stringency gdp_per_capita extreme_poverty aged_70_older
outcome does not vary; remember:
                                  0 = negative outcome,
        all other nonmissing values = positive outcome
r(2000);

end of do-file
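Two features of this setup can trigger r(2000) "outcome does not vary". First, in the -dataex- excerpt above both miss_ indicators are 0 in every row, so the excerpt alone has nothing to model. Second, and more generally, total_cases_per_million is used as a regressor: whenever miss_total_cases_per_million is 1, total_cases_per_million is missing, so Stata drops that row and the estimation sample contains only zeros. The sketch below reproduces this on simulated data; the panel, the variable names (`id`, `day`, `gdp`, `u`, `cases`, `miss_cases`) and the missingness mechanism are all hypothetical.

```stata
* Hypothetical simulated panel: 20 countries x 30 days
clear
set seed 2020
set obs 20
generate id  = _n
generate gdp = rnormal(10, 1)
generate u   = rnormal(0, 1)
expand 30
bysort id: generate day = _n
generate cases = 50 + 5*day + rnormal(0, 10)
replace cases = . if runiform() < invlogit(-2 - 1.5*(gdp - 10) + u)
generate byte miss_cases = missing(cases)
xtset id day

* No row is both flagged as missing and observed, so once rows with
* cases == . are dropped, miss_cases is 0 everywhere
count if miss_cases == 1 & !missing(cases)   // displays 0

* Including the outcome itself as a regressor reproduces the error
capture noisily xtlogit miss_cases cases gdp, re
display "return code: " _rc                  // 2000, as in the post
```

Dropping total_cases_per_million (and total_deaths_per_million, when modeling the deaths indicator) from the right-hand side removes this particular cause.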

I have seen in previous posts that this can be checked with a logistic model; however, all the examples I found used cross-sectional data, which is not the case here. So I would like to ask whether anyone knows how to do this with panel data, and whether it is possible at all in this case. Thank you in advance to anyone who is willing to help.
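One possible panel version, sketched here under stated assumptions rather than as a definitive recipe, is to regress the missingness indicator on observed covariates only (never on the variable whose missingness is being modeled), either with a random-effects `xtlogit` or with a pooled `logit` and standard errors clustered by country. A fixed-effects `xtlogit, fe` would drop time-invariant covariates such as gdp_per_capita and any country whose indicator never changes. As a caveat, a significant covariate only shows that missingness is related to observed variables (evidence against missing completely at random); no regression on the observed data can confirm MAR, because the alternative depends on the unobserved values themselves. The data, names (`id`, `day`, `gdp`, `u`, `cases`, `miss_cases`) and mechanism below are hypothetical.

```stata
* Hypothetical simulated panel: 20 countries x 30 days
clear
set seed 2020
set obs 20
generate id  = _n
generate gdp = rnormal(10, 1)        // hypothetical country-level covariate
generate u   = rnormal(0, 1)         // hypothetical country effect on missingness
expand 30
bysort id: generate day = _n
generate cases = 50 + 5*day + rnormal(0, 10)
replace cases = . if runiform() < invlogit(-2 - 1.5*(gdp - 10) + u)
generate byte miss_cases = missing(cases)
xtset id day

* Random-effects panel logit of the indicator on observed covariates only
xtlogit miss_cases gdp day, re

* Alternative: pooled logit with standard errors clustered by country
logit miss_cases gdp day, vce(cluster id)
```

With the real data, gdp_per_capita, aged_70_older, country_system and similar variables would take the place of `gdp`, and a time variable such as the converted date would take the place of `day`.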
Best regards

Alessio Lombini

BJ Data Tech Solution
