Hello,
I'm very new to Stata and am trying to complete some data cleaning. I have a dataset with 5 variables and around 200 million observations. The variables are all numeric, and I would like to check that three of them were encoded correctly, as they were originally categorical (string) variables. For example, I would like to know whether each numeric code of the country variable corresponds to a distinct country (the original categories may contain typos, for instance).
The original string variables are not available. Stata shows the country names when I browse the data, but treats the variable as numeric in the data editor. Is there any way to see which numeric code corresponds to which country name?
Thank you in advance for any help you might be able to give me!
Best wishes,
Clara