Hi everyone,
I hope everything is well! I'm reaching out because I have read some guidance on imputation but still feel lost, and I was hoping some of you are familiar with the topic. I'm currently collecting the data for my thesis, "The Impact of National Cybersecurity Commitment on Financial Inclusion". Basically, I'm looking at how different cybersecurity measures taken by countries have affected the adoption of digital financial services (such as digital payments). My dependent variable comes from a dataset that has only been published for 2014, 2017 and 2021, so those are the years I will focus on. I got some control variables from the World Bank data, but unfortunately many values are missing for some countries (especially for 2021). The control variables are the following:
ATMs per 100,000 people
GDP growth (annual %)
Domestic credit to private sector (% of GDP)
Average transaction cost of sending remittances to a specific country (%)
School enrollment, primary (% gross)
Mobile cellular subscriptions (per 100 people)
Given my number of observations (still to be determined, but around 420), I was thinking of only filling in 2021 values for countries that have 2020 data. For School enrollment, primary (% gross), I don't think it changes much from year to year, so I was going to replace missing 2021 values with the 2020 values. For the other variables, however, I think imputation makes more sense, but I'm not sure which method to use. I would really appreciate it if you could guide me through this process.
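The carry-forward rule for school enrollment (fill a missing 2021 value with the same country's 2020 value, and leave it missing if 2020 is also missing) could be sketched as follows. This is only an illustration in Python under assumptions: the field names `country`, `year` and `school_enroll` and the sample values are hypothetical, not taken from the attached dataset or do file. It also records a flag for each filled value, so carried-forward observations can be identified later, for example in a robustness check that drops them.

```python
# Hypothetical country-year records; field names and values are made up.
records = [
    {"country": "AAA", "year": 2020, "school_enroll": 101.5},
    {"country": "AAA", "year": 2021, "school_enroll": None},
    {"country": "BBB", "year": 2020, "school_enroll": None},
    {"country": "BBB", "year": 2021, "school_enroll": None},
    {"country": "CCC", "year": 2020, "school_enroll": 95.0},
    {"country": "CCC", "year": 2021, "school_enroll": 96.2},
]


def carry_forward_2020(rows, var):
    """Fill missing 2021 values of `var` with the same country's 2020 value.

    Countries without a 2020 value stay missing. Every row gets a
    `<var>_imputed` flag marking values that were carried forward.
    """
    value_2020 = {r["country"]: r[var] for r in rows if r["year"] == 2020}
    out = []
    for r in rows:
        r = dict(r)  # copy so the input records are not modified
        r[var + "_imputed"] = False
        if r["year"] == 2021 and r[var] is None:
            prior = value_2020.get(r["country"])
            if prior is not None:
                r[var] = prior
                r[var + "_imputed"] = True
        out.append(r)
    return out


filled = carry_forward_2020(records, "school_enroll")
for r in filled:
    if r["year"] == 2021:
        print(r["country"], r["school_enroll"], r["school_enroll_imputed"])
```

Here country AAA gets its 2020 value carried forward, BBB stays missing because it has no 2020 value either, and CCC keeps its observed 2021 value.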
When I looked at some of the imputation options, they ask whether the values are missing completely at random or just missing at random. That's confusing in my case because we expect more missing data for low-income countries (income group is a variable I have), so I'm not sure whether that matters or whether it implies some other, more formal relationship.
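One quick way to see whether missingness is related to income group is to tabulate the share of missing values within each group. If the shares differ clearly between groups, the data are not missing completely at random (MCAR). If the missingness can be explained by observed variables such as income group, that is the situation usually called missing at random (MAR), and imputation models that include income group as a predictor are generally considered appropriate. The data alone cannot rule out that missingness also depends on the unobserved values themselves (missing not at random). The sketch below assumes hypothetical field names (`income_group`, `atms`) and made-up 2021 values.

```python
from collections import defaultdict

# Hypothetical 2021 rows; field names and values are made up.
rows_2021 = [
    {"country": "AAA", "income_group": "Low income", "atms": None},
    {"country": "BBB", "income_group": "Low income", "atms": None},
    {"country": "CCC", "income_group": "Low income", "atms": 3.1},
    {"country": "DDD", "income_group": "High income", "atms": 62.0},
    {"country": "EEE", "income_group": "High income", "atms": None},
    {"country": "FFF", "income_group": "High income", "atms": 80.4},
    {"country": "GGG", "income_group": "High income", "atms": 55.9},
]


def missing_share_by_group(rows, var, group_var):
    """Return the share of rows with `var` missing within each level of `group_var`."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for r in rows:
        g = r[group_var]
        total[g] += 1
        if r[var] is None:
            missing[g] += 1
    return {g: missing[g] / total[g] for g in total}


shares = missing_share_by_group(rows_2021, "atms", "income_group")
for group, share in sorted(shares.items()):
    print(f"{group}: {share:.0%} missing")
```

With these illustrative rows, the low-income group has a much higher missing share than the high-income group, which is the kind of pattern that rules out MCAR for that variable.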
I have attached my dataset (currently covering the years 2013 to 2021) and my do file (it's pretty short).
Thank you so much!