Hi Statalist,
I have a dataset with approximately 1,000,000 observations. It contains many duplicate ids, and I want to keep only one observation per id. Each id has several values of information, and I want to keep the value whose first three characters occur most often within that id. For example, in the data below, id 278 should be reduced to the single row F04D15 278, because the prefix F04 appears twice for that id. I hope this makes sense. Do you have a suggestion on how to solve this in Stata?
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 information int id
"F04D15" 278
"F04D13" 278
"H02P21" 278
"H01P21" 278
"C12Y304" 1248
"A61K38" 1248
"C12N9" 1248
"C12N9" 1248
"C12Y304" 1271
"Y10S514" 1271
"A61K 38/00" 1271
"C12Y304" 1271
end
[/CODE]
Thanks in advance
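
One possible approach, offered only as a sketch: count how often each three-character prefix occurs within an id, then keep the first row carrying the most frequent prefix. The variable names obs_order, prefix, n_prefix and neg_n are made up for illustration. Breaking ties by original row order is an assumption, since the question does not say what should happen when two prefixes are equally frequent, or which of several rows with the winning prefix should be kept.

```stata
* Remember the original row order so ties are broken deterministically (assumption)
gen long obs_order = _n

* First three characters of the information string
gen str3 prefix = substr(information, 1, 3)

* Number of rows sharing this prefix within each id
bysort id prefix: gen long n_prefix = _N

* Sort each id by descending prefix count, then by original order, and keep the first row
gen long neg_n = -n_prefix
bysort id (neg_n obs_order): keep if _n == 1

* Remove the helper variables
drop obs_order prefix n_prefix neg_n
```

With the example data, this keeps F04D15 for id 278, since F04 is the only prefix that occurs twice there and F04D15 is the first row with that prefix.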