Hi Statalist,
I have a dataset with approximately 1,000,000 observations. It contains many duplicate ids, and I want to keep only one observation per id. Each id has several values of information, and I want to keep the value whose first three characters occur most often within that id. For example, in the data below, id 278 should be reduced to the single observation F04D15 278, because the prefix F04 appears twice for that id. I hope that makes sense. Do you have a suggestion on how to solve this in Stata?
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 information int id
"F04D15" 278
"F04D13" 278
"H02P21" 278
"H01P21" 278
"C12Y304" 1248
"A61K38" 1248
"C12N9" 1248
"C12N9" 1248
"C12Y304" 1271
"Y10S514" 1271
"A61K 38/00" 1271
"C12Y304" 1271
end
[/CODE]
Thanks in advance
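One possible approach, offered only as a minimal sketch and not as a definitive answer: count how often each three-character prefix occurs within an id, then keep the first observation carrying the most frequent prefix. The helper variables obs_order, prefix, n_prefix, and neg_n are hypothetical names introduced for illustration. The tie-breaking rules are assumptions, since the question does not state them: when two prefixes are equally frequent, the one appearing earliest in the data wins, and among observations sharing the winning prefix the earliest one is kept.

[CODE]
* Sketch only: obs_order, prefix, n_prefix and neg_n are hypothetical helpers
gen long obs_order = _n                        // remember the original row order
gen str3 prefix = substr(information, 1, 3)    // first three characters

* Number of observations sharing the same prefix within each id
bysort id prefix: gen long n_prefix = _N

* Sort each id so the most frequent prefix comes first;
* ties are broken by original row order (an assumption)
gen long neg_n = -n_prefix
bysort id (neg_n obs_order): keep if _n == 1

drop obs_order prefix n_prefix neg_n
list, noobs
[/CODE]

On the example data this keeps F04D15 for id 278, which matches the result described in the question. If ties should be resolved differently, only the variables inside the parentheses of the second bysort need to change.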