I would like to get rid of duplicate observations in my dataset. I have identified the duplicates using:
"sort zip
quietly by zip: gen dup = cond(_N==1,0,_n)
tab dup"
zip is the variable that I am using as the identifier.
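For completeness, I believe the built-in -duplicates- command flags the same groups; below is a minimal sketch, assuming zip is the only key (dup2 is just a name I chose to avoid clashing with my dup variable):

Code:
* Count and tag duplicate zip values with the built-in -duplicates- command.
* dup2 = 0 for a unique zip; >0 counts the other observations sharing that zip.
duplicates report zip
duplicates tag zip, generate(dup2)
tab dup2

And here is a -dataex- extract of my data: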
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str6 zip double(arson2010 assault2010 auto_theft2010 burglary2010 murder2010 rape2010 robbery2010 theft2010 arson2019 assault2019 auto_theft2019 burglary2019 murder2019 rape2019 robbery2019 theft2019)
"" 0 2 1 2 0 1 1 10 0 0 0 0 0 0 0 0
"" . 0 0 0 0 10 0 9 . 0 0 0 0 0 0 0
"" . 414 459 573 30 65 299 12523 . 3 0 1 1 1 10 23
"" . . . . . . . . 0 6 22 9 0 . 5 264
"0" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"05713" . 0 0 0 0 0 0 0 . 0 0 0 0 0 0 0
"05719" . 0 0 0 0 0 0 0 . 0 0 0 0 0 0 0
"08063" 0 0 0 0 0 . 0 0 0 0 0 0 0 . 0 1
"08065" 0 0 0 0 0 . 0 0 0 0 0 0 0 . 0 0
"08102" 0 0 0 0 0 . 0 0 0 0 0 0 0 . 0 2
"09080" . . . . . . . . 0 0 0 0 0 0 0 0
"1" . . . . . . . . 0 0 0 0 0 0 0 0
"1" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"1050" . . . . . . . . 0 1 0 1 0 0 0 0
"11" . . . . . . . . 0 0 0 0 0 0 0 0
"111" . . . . . . . . 0 0 0 0 0 0 0 0
"12" . . . . . . . . 0 0 0 0 0 0 0 0
"12005" . . . . . . . . 0 0 0 0 0 0 0 0
"1229" . . . . . . . . 0 0 0 1 0 0 0 0
"14233" 0 9 9 7 1 0 8 71 1 3 3 2 1 0 5 46
"15020" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"15090" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
"15101" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
"15104" 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
"15106" 0 1 1 6 0 0 0 2 0 0 1 0 0 0 0 1
"15110" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"15112" 0 0 0 0 0 0 0 7 0 0 0 2 0 0 0 3
"15120" 0 0 2 1 0 0 0 13 0 1 1 1 0 0 0 14
"15122" 0 1 0 1 0 0 0 11 0 0 1 2 0 0 0 0
"15126" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"15129" 0 1 0 0 0 0 0 2 0 1 1 0 0 0 0 1
"15132" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
"15136" 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1
"15137" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"15143" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"15146" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"15147" 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 2
"15201" 3 33 46 102 0 0 37 193 0 7 25 29 0 0 3 151
"15202" 0 0 0 0 0 0 0 0 0 0 0 2 0 0 1 1
"15203" 0 80 62 151 1 0 67 586 1 45 65 69 1 0 47 358
"15204" 2 34 41 102 2 0 30 135 1 17 14 20 2 0 16 97
"15205" 0 19 20 61 1 0 8 90 0 5 20 12 0 0 6 41
"15206" 2 110 167 269 10 0 150 778 2 53 90 108 2 0 46 441
"15207" 0 30 26 86 1 0 20 118 1 12 15 44 1 0 10 113
"15208" 4 70 62 161 5 0 65 220 5 54 62 71 5 0 37 183
"15210" 2 129 93 347 4 0 108 441 1 82 103 138 1 0 72 405
"15211" 1 30 39 154 0 0 25 356 0 16 38 56 1 0 13 159
"15212" 5 127 86 311 2 0 119 486 5 61 62 118 1 0 47 443
"15213" 0 27 64 214 0 0 64 449 1 17 27 76 0 0 25 263
"15214" 2 63 34 170 4 0 27 155 1 35 41 47 2 0 24 116
"15215" 0 0 1 0 0 0 0 16 0 1 2 0 0 0 3 25
"15216" 0 17 25 57 1 0 10 142 0 15 17 16 0 0 5 123
"15217" 1 8 40 144 0 0 33 349 1 8 28 85 0 0 10 250
"15218" 0 0 2 6 0 0 2 37 0 0 4 3 0 0 1 30
"15219" 2 91 68 158 6 0 96 453 1 64 63 78 6 0 43 280
"15220" 3 12 17 46 2 0 23 137 0 7 5 14 0 0 7 50
"15221" 4 23 22 63 2 0 23 98 0 19 23 40 1 0 20 137
"15222" 1 30 20 58 0 0 52 461 1 26 20 23 0 0 54 320
"15223" 0 6 5 14 0 0 17 21 0 1 2 1 0 0 2 7
"15224" 6 36 47 95 1 0 59 348 2 21 32 36 1 0 24 186
"15226" 0 25 25 44 0 0 14 138 3 21 21 39 0 0 11 111
"15227" 1 9 35 48 0 0 12 78 0 7 9 8 0 0 6 40
"15228" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
"15229" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
"15231" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"15232" 0 12 34 43 0 0 21 383 0 12 16 42 0 0 13 182
"15233" 0 16 11 29 0 0 12 80 0 9 9 14 0 0 10 50
"15234" 0 2 3 15 0 0 2 11 0 3 2 4 0 0 1 10
"15235" 0 4 0 10 0 0 6 6 0 1 5 1 0 0 2 3
"15236" 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
"15237" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
"15238" 0 0 3 0 0 0 3 24 0 0 0 0 0 0 0 2
"15240" 0 1 1 2 0 0 2 11 0 0 1 1 0 0 0 5
"15243" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"15260" 0 0 1 0 0 0 0 3 0 0 0 0 0 0 0 6
"15262" 1 1 4 0 0 0 2 24 0 2 4 0 0 0 1 11
"15275" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"15282" 0 2 3 12 0 0 3 22 1 0 2 0 0 0 1 11
"15283" 0 0 1 2 0 0 0 4 0 0 1 1 0 0 1 2
"15290" 0 2 6 6 0 0 2 12 0 2 5 4 0 0 2 11
"15320" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"15401" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"15419" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"15423" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"15522" 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
"15644" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"16" . . . . . . . . 0 0 0 0 0 0 0 1
"16059" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
"16127" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"16316" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"19004" 0 0 0 0 0 . 0 0 0 5 3 6 0 . 4 37
"19006" 0 0 0 0 0 . 0 0 0 0 0 0 0 . 0 0
"19012" 0 0 0 0 0 . 0 0 1 5 0 5 0 . 4 17
"19016" 0 0 0 0 0 . 0 0 0 0 0 0 0 . 0 5
"19025" 0 0 0 0 0 . 0 0 0 0 0 0 0 . 0 1
"19027" 0 0 0 0 0 . 0 0 0 0 0 0 0 . 0 0
"19031" 0 0 0 0 0 . 0 0 0 0 0 0 0 . 0 0
"19032" 0 0 0 0 0 . 0 0 0 0 0 0 0 . 0 0
"19038" 0 0 0 0 0 . 0 0 0 0 0 2 0 . 0 2
"19046" 0 0 0 0 0 . 0 0 0 0 0 0 0 . 0 0
end
For each observation I have several variables that measure the number of crimes of different types over the period 2010-2019 (not all of the variables are shown in the example).
Instead of getting rid of the duplicate observations, I would like to sum each variable across the duplicates so that each zip is left with a single observation. Do you have any suggestions for code to do this?
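To make the goal concrete, here is a minimal sketch of the kind of result I am after, assuming -collapse- is the right tool and that the crime variables sit consecutively from arson2010 to theft2019 in the dataset (the part I am unsure about is missing values, since -collapse (sum)- treats them as zero):

Code:
* Sketch: sum every crime variable within each zip,
* leaving one observation per zip.
* Caution: (sum) treats missing values as zero, so a group that is
* entirely missing on a variable will come out as 0, not missing.
collapse (sum) arson2010-theft2019, by(zip)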
Thank you in advance.
BR,
ac