Dear researchers,

I am interested in studying specific factors across countries for the period from 2000 to 2019. The population (i.e. the list of countries) was identified from one database, but because of data unavailability, the variables for these countries were collected from several different sources. Some of these sources cover only a subset of the countries, some provide only certain variables for certain years, and others provide certain variables for the whole period from 2000 to 2019. After importing the data set into Stata, it reports that the panel is balanced. I think I can explain this: every country has a row for every year, even though a given country-year may have missing values for some variables while having data for others.
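To illustrate what I mean by "balanced", here is a minimal Python sketch (not Stata code). The toy data, the variable names `gdp` and `fdi`, and both helper functions are hypothetical. It assumes that balance is judged only by whether each country has a row for each year, and it shows how the same data can become unbalanced once rows with missing values are dropped.

```python
# Hypothetical toy panel: (country, year) -> {variable: value or None for missing}.
rows = {
    ("A", 2000): {"gdp": 1.0, "fdi": None},   # fdi missing for A in 2000
    ("A", 2001): {"gdp": 1.1, "fdi": 0.3},
    ("A", 2002): {"gdp": 1.2, "fdi": 0.4},
    ("B", 2000): {"gdp": 2.0, "fdi": 0.2},
    ("B", 2001): {"gdp": 2.1, "fdi": 0.4},
    ("B", 2002): {"gdp": 2.2, "fdi": 0.5},
}


def is_balanced(keys):
    """Return True if every country appears in every year found in keys."""
    countries = {country for country, _ in keys}
    years = {year for _, year in keys}
    return all((c, y) in keys for c in countries for y in years)


def complete_cases(data, variables):
    """Return the (country, year) keys with no missing value in the given variables."""
    return {
        key
        for key, values in data.items()
        if all(values[var] is not None for var in variables)
    }


# Balance judged on the row structure alone: every country has every year.
structure_balanced = is_balanced(set(rows))

# Balance judged on the rows that would survive dropping missing values.
estimation_keys = complete_cases(rows, ["gdp", "fdi"])
estimation_balanced = is_balanced(estimation_keys)

print(structure_balanced, estimation_balanced, len(estimation_keys))
```

In this toy case the row structure is balanced, but the complete-case sample is not, because country A loses its 2000 observation.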

However, I think I should pool the data instead of treating it as a panel, because I collected it from different sources and the years for which each variable is available vary across databases. If this makes sense, is there any paper or reference that supports pooling data collected from different databases?

Many thanks in advance.