Hi Statalist,
I have a dataset with approximately 1,000,000 observations. It contains many duplicate ids, and I want to keep only one observation per id. Each id has several values of information, and I want to keep the value whose first three characters occur most often for that id. For example, in the data below, the rows for id 278 should be reduced to the single row F04D15 278, because the prefix F04 appears twice for that id. I hope that makes sense. Do you have a suggestion on how to do this in Stata?
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 information int id
"F04D15" 278
"F04D13" 278
"H02P21" 278
"H01P21" 278
"C12Y304" 1248
"A61K38" 1248
"C12N9" 1248
"C12N9" 1248
"C12Y304" 1271
"Y10S514" 1271
"A61K 38/00" 1271
"C12Y304" 1271
end
[/CODE]
Thanks in advance
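One possible approach, as a minimal sketch only. It assumes that "most used" means the 3-character prefix of information that occurs most often within an id, and that ties (between prefixes, or between rows sharing the winning prefix) should go to the row that appears first in the data. Neither rule is stated in the post. The variable names order, prefix, n_prefix and neg_n are hypothetical helpers introduced for illustration.

[CODE]
* Sketch under the assumptions above; run after loading the example data
gen long order = _n                          // remember the original row order
gen str3 prefix = substr(information, 1, 3)  // first three characters
bysort id prefix: gen long n_prefix = _N     // count of each prefix within id
gen long neg_n = -n_prefix                   // negate so the largest count sorts first
bysort id (neg_n order): keep if _n == 1     // most frequent prefix, earliest row
drop order prefix n_prefix neg_n
[/CODE]

On the example data this would keep F04D15 for id 278 and C12Y304 for ids 1248 and 1271. If a different tie-breaking rule is wanted, the variables listed in parentheses after id are the place to change it.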