Hello,
I'm very new to Stata and am trying to complete some data cleaning. I have a dataset with 5 variables and around 200 million observations. The variables are all numeric, and I would like to check that three of them have been encoded correctly, as they were originally categorical (string) variables. For example, I would like to know if the numerical code captures distinct countries for the country variable (there may be typos in the original categories, for instance).
The original string variables are not available. Stata shows the country names when I browse the data, but it treats the variable as numeric in the Data Editor. Is there any way to check how the numeric codes map to the names that are displayed?
Thank you in advance for any help you might be able to give me!
Best wishes,
Clara