I am working with cross-sectional survey data from the Global Entrepreneurship Monitor (GEM) for 2004 to 2016. GEM investigates entrepreneurial activity across countries based on individual-level surveys. I am using a multilevel research design: I take the individual-level data from GEM and merge them with country-level factors such as corruption and national competitiveness, in order to investigate the impact of the country-level factors on certain individual-level factors. My question relates to the data-cleaning phase.
After selecting my variables from GEM and appending the separate yearly datasets from 2004 to 2016, I have more than 2 million observations. My dataset of individual-level factors has around 16 variables, and most observations are incomplete, i.e. they have a missing value on at least one variable. If I keep only the complete observations, I am left with approximately 9,000. However, as mentioned above, I would like to merge the individual-level data with country-level variables to estimate country-level effects. After deleting the incomplete observations, some countries drop out of certain years; for example, Germany is not represented in 2006 and 2007. In fact, only a few countries (approx. 5) have complete data in all the examined years (2004-2016). This raises the question of whether I can still investigate differences between countries even though not all countries are represented in every year. Do you have any suggestions for how to overcome this issue?
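To make the loss of coverage concrete, it can be tabulated along the following lines. This is only a minimal sketch: it assumes the appended GEM file is in memory and uses the variable names from the -dataex- listings in this post. The names `nmiss`, `complete`, `n_complete` and `n_total` are made up for illustration.

```stata
* Sketch: which variables drive the missingness, and which country-years
* lose all their observations under complete-case deletion?
* Assumes the appended GEM data are in memory.
local indiv gemhhinc gemeduc omexport omnowjob gender age estbbuso ///
    knowenyy suskilyy frfailyy teayyopp teayynec ///
    eb_cust eb_tech eb_yytec eb_jobgr

* Number of missing values per observation across the individual-level variables
egen byte nmiss = rowmiss(`indiv')
generate byte complete = (nmiss == 0)

* Missing-value counts per variable (only variables with missing values are listed)
misstable summarize `indiv'

* Complete and total observations per country-year; preserve/restore keeps
* the original data intact
preserve
collapse (sum) n_complete = complete (count) n_total = nmiss, by(country yrsurv)
list country yrsurv n_complete n_total if n_complete == 0, sepby(country)
restore
```

The final listing shows the country-years that disappear entirely, and the -misstable- output indicates whether a handful of variables account for most of the dropped observations.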
This is the -dataex- output before dropping the observations with missing values:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(country yrsurv gemhhinc gemeduc omexport omnowjob gender age estbbuso knowenyy suskilyy frfailyy teayyopp teayynec eb_cust eb_tech eb_yytec eb_jobgr)
1 2016    33 1212 . . 2 63 0 0 1 1 0 0 . . -2 .
1 2016 68100 1316 . . 1 52 0 0 1 1 0 0 . . -2 .
1 2016  3467 1316 . . 2 64 0 0 0 0 0 0 . . -2 .
1 2016  3467 1316 6 2 2 70 1 0 0 0 0 0 2 3  0 0
1 2016  3467 1316 . . 1 -2 0 0 0 1 0 0 . . -2 .
end
label values country country
label def country 1 "United States", modify
label values gemhhinc GEMHHINC
label def GEMHHINC 33 "Lowest 33%tile", modify
label def GEMHHINC 3467 "Middle 33%tile", modify
label def GEMHHINC 68100 "Upper 33%tile", modify
label values gemeduc GEMEDUC
label def GEMEDUC 1212 "SECONDARY DEGREE", modify
label def GEMEDUC 1316 "POST SECONDARY", modify
label values omexport omexport
label def omexport 6 "10% or less", modify
label values omnowjob omnowjob
label values gender gender
label def gender 1 "Male", modify
label def gender 2 "Female", modify
label values age age
label def age -2 "Refused", modify
label values estbbuso ESTBBUSO
label def ESTBBUSO 0 "No", modify
label def ESTBBUSO 1 "Yes", modify
label values knowenyy KNOWENyy
label def KNOWENyy 0 "No", modify
label values suskilyy SUSKILyy
label def SUSKILyy 0 "No", modify
label def SUSKILyy 1 "Yes", modify
label values frfailyy FRFAILyy
label def FRFAILyy 0 "No", modify
label def FRFAILyy 1 "Yes", modify
label values teayyopp TEAyyOPP
label def TEAyyOPP 0 "No", modify
label values teayynec TEAyyNEC
label def TEAyyNEC 0 "No", modify
label values eb_cust EB_CUST
label def EB_CUST 2 "Some", modify
label values eb_tech EB_TECH
label def EB_TECH 3 "No new technology (more than 5 years)", modify
label values eb_yytec EB_yyTEC
label def EB_yyTEC 0 "No/low technology sector", modify
This is the -dataex- output after dropping the observations with missing values:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(country yrsurv gemhhinc gemeduc omexport omnowjob gender age estbbuso knowenyy suskilyy frfailyy teayyopp teayynec eb_cust eb_tech eb_yytec eb_jobgr)
1 2016 68100 1212 6  1 2 36 1 0 1 0 1 0 3 3 0 0
1 2016 68100 1316 6  3 2 64 1 0 1 0 1 0 3 3 0 0
1 2016 68100 1316 6  0 2 59 1 1 1 0 1 0 2 3 0 0
1 2016  3467 1720 5  0 1 55 1 1 1 1 1 0 3 3 0 0
1 2016 68100 1316 6 24 1 66 1 1 1 0 1 0 3 3 0 6
end
label values country country
label def country 1 "United States", modify
label values gemhhinc GEMHHINC
label def GEMHHINC 3467 "Middle 33%tile", modify
label def GEMHHINC 68100 "Upper 33%tile", modify
label values gemeduc GEMEDUC
label def GEMEDUC 1212 "SECONDARY DEGREE", modify
label def GEMEDUC 1316 "POST SECONDARY", modify
label def GEMEDUC 1720 "GRAD EXP", modify
label values omexport omexport
label def omexport 5 "11 to 25%", modify
label def omexport 6 "10% or less", modify
label values omnowjob omnowjob
label values gender gender
label def gender 1 "Male", modify
label def gender 2 "Female", modify
label values age age
label values estbbuso ESTBBUSO
label def ESTBBUSO 1 "Yes", modify
label values knowenyy KNOWENyy
label def KNOWENyy 0 "No", modify
label def KNOWENyy 1 "Yes", modify
label values suskilyy SUSKILyy
label def SUSKILyy 1 "Yes", modify
label values frfailyy FRFAILyy
label def FRFAILyy 0 "No", modify
label def FRFAILyy 1 "Yes", modify
label values teayyopp TEAyyOPP
label def TEAyyOPP 1 "Yes", modify
label values teayynec TEAyyNEC
label def TEAyyNEC 0 "No", modify
label values eb_cust EB_CUST
label def EB_CUST 2 "Some", modify
label def EB_CUST 3 "None", modify
label values eb_tech EB_TECH
label def EB_TECH 3 "No new technology (more than 5 years)", modify
label values eb_yytec EB_yyTEC
label def EB_yyTEC 0 "No/low technology sector", modify
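Two patterns are visible in the ten rows listed above. First, omexport, omnowjob, eb_cust, eb_tech and eb_jobgr are missing exactly in the rows where estbbuso is 0, and every row that survives the deletion has estbbuso equal to 1. Second, age and eb_yytec contain a -2 code (labelled "Refused" for age) that Stata treats as an ordinary number and not as missing. Whether either pattern holds in the full file is an assumption that would need checking. A minimal sketch of such a check, assuming the appended data are in memory:

```stata
* Sketch: are the business-level items missing only for respondents
* without an established business (estbbuso == 0)?
foreach v of varlist omexport omnowjob eb_cust eb_tech eb_jobgr {
    display as text "Missing `v' by estbbuso:"
    capture noisily tabulate estbbuso if missing(`v'), missing
}

* How often does the -2 code occur in the two variables where it appears above?
count if age == -2
count if eb_yytec == -2

* If -2 does mean "no valid answer" in both variables (an assumption, since
* only age has a label for it here), it could be converted to an extended
* missing value so that it is never used as a number:
* mvdecode age eb_yytec, mv(-2 = .a)
```

If the first pattern holds generally, keeping only complete observations amounts to restricting the sample to established business owners, which would explain much of the drop from more than 2 million observations to roughly 9,000.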
Thank you so much for your help and time!