Hi Statalist,
I have a dataset with approximately 1,000,000 observations. Many ids appear more than once, and I want to keep only one observation per id. Each id has several values of information, and I want to keep the value whose first three characters occur most often within that id. For example, in the data below, id 278 should be reduced to the single row F04D15 278, because the prefix F04 appears twice for that id. I hope that makes sense. Do you have a suggestion on how to solve this in Stata?
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 information int id
"F04D15" 278
"F04D13" 278
"H02P21" 278
"H01P21" 278
"C12Y304" 1248
"A61K38" 1248
"C12N9" 1248
"C12N9" 1248
"C12Y304" 1271
"Y10S514" 1271
"A61K 38/00" 1271
"C12Y304" 1271
end
[/CODE]
Thanks in advance
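One possible approach, offered as a minimal sketch rather than a definitive answer: extract the first three characters of information, count how often each prefix occurs within an id, and keep the first row carrying the most frequent prefix. The variable names order, prefix, n_prefix, and neg_n are hypothetical helpers, not part of the original data. Two assumptions are built in: ties between equally frequent prefixes are broken by original row order, and within the winning prefix the first row in original order is kept (which matches the expected F04D15 for id 278).

```stata
* Assumes the dataex example data (information, id) are already in memory.
* order, prefix, n_prefix, and neg_n are hypothetical helper variables.

gen long order = _n                          // remember the original row order
gen str3 prefix = substr(information, 1, 3)  // first three characters

* Number of rows sharing the same prefix within each id
bysort id prefix: gen long n_prefix = _N

* Negate the count so that an ascending sort puts the most frequent prefix first
gen long neg_n = -n_prefix

* Within each id: most frequent prefix first, ties broken by original order
* (assumption: earliest row wins a tie); keep only the first row per id
bysort id (neg_n order): keep if _n == 1

drop order prefix n_prefix neg_n
list
```

If a different tie-break is wanted (for example, the alphabetically first prefix), the variables inside the parentheses of the last bysort can be changed accordingly.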