Dear community,

I hope it is alright that I am asking some quite basic questions. I have collected data on 169 countries from 1991 to 2019, where the unit of analysis is the country-year. The dataset combines data from several source datasets, and each of the 24 variables is an indicator of the concept "state capacity". Naturally, there is a lot of missingness, as the sources cover different years, countries, etc. I furthermore suspect that the data are MNAR (missing not at random): for instance, missing data on Afghanistan's military spending or quality of bureaucracy probably has to do with its (lacking) state capacity.

I was originally planning on doing a factor analysis, but if I have understood correctly, that will be (almost) impossible and hardly appropriate with the amount of missingness I have. I have also thought about doing some kind of latent variable analysis (e.g. Bayesian latent variable analysis), but as I have never done that before, I would very much appreciate any comments on which type of model or command would be possible and appropriate with missing data. Lastly, I wish to create a new "Capacity" variable based on the results of the factor/latent variable analysis. In other words, by incorporating indicators of state capacity drawn from multiple sources, I would like to produce an annual measure of "Capacity" for each country.

As I see it, I have two options: 1) cut down severely on countries and variables to avoid missingness, and then run a factor analysis (but even if I manage that, how do I create a new country-year variable from the results?), or 2) do something completely different with the data. Does anyone have any tips for what I could do?
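
To make option 1 concrete, here is a rough sketch of what I have in mind, written in Python with NumPy on simulated data (not my real dataset). Everything in it is an assumption for illustration: the function name `complete_case_scores` is made up, the first principal component stands in for a proper factor model, and dropping incomplete rows is exactly the step I worry about if the data are MNAR.

```python
import numpy as np

def complete_case_scores(X):
    """Hypothetical helper: one score per row from complete cases only.

    X is an (n_country_years, n_indicators) array with np.nan for missing.
    Rows with any missing indicator get a score of np.nan.
    """
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)          # listwise deletion mask
    Z = X[complete]
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)  # standardize indicators
    # First principal component via SVD (a stand-in for a one-factor model).
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    if loadings.sum() < 0:                       # fix the arbitrary sign
        loadings = -loadings
    scores = np.full(X.shape[0], np.nan)
    scores[complete] = Z @ loadings              # new "Capacity" variable
    return scores, loadings

# Simulated example: 200 country-years, 3 indicators, about 10% missing cells.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X = latent[:, None] * np.array([0.9, 0.8, 0.7]) + rng.normal(scale=0.5, size=(200, 3))
X[rng.random(X.shape) < 0.1] = np.nan

scores, loadings = complete_case_scores(X)
print("country-years with a score:", int(np.sum(~np.isnan(scores))), "of", len(scores))
```

The returned `scores` array has one entry per country-year, so it could be attached to the panel as a new column. The catch is that every country-year with even one missing indicator ends up without a score, which is the trade-off I am asking about.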

Thank you very much in advance, and apologies for the rather open-ended questions. Attached is my dataset.

Charlotte