Hi Users,
I just got access to a dataset (M) which I need to merge with another data based on id and year. The problem I am having is with dataset M. The way the data is presented is slightly odd to me. Please see below, any idea how I can get in a long (more concise) format. Please see the data below:
clear
input double id long itemo byte category str2 bulkbilling double(fee benefit) float year
700001 23 1 "P" 28 20.85 1996
700001 23 1 "D" 20.85 20.85 1996
700001 73901 6 "D" 6.7 6.7 1996
700001 73053 6 "D" 15.15 15.15 1996
700001 23 1 "D" 20.85 20.85 1996
700001 23 1 "D" 20.85 20.85 1996
700001 23 1 "D" 20.85 20.85 1996
700001 23 1 "D" 20.85 20.85 1996
700001 23 1 "P" 28 20.85 1997
700001 73901 6 "P" 10.55 6.7 1997
700001 73055 6 "P" 23.9 15.15 1997
700001 17603 3 "P" 38.05 26.75 1997
700001 18109 3 "P" 151.95 106.75 1997
700001 23 1 "P" 31.5 20.85 1997
700001 23 1 "P" 31.5 20.85 1997
700001 23 1 "P" 31.5 20.85 1997
700001 23 1 "P" 30 20.85 1997
700001 23 1 "P" 27 21 1998
700001 23 1 "P" 30 21 1998
700001 53 1 "P" 30 17.85 1999
700001 23 1 "P" 31.5 21.5 1999
700001 36 1 "P" 46 38.85 1999
700001 73901 6 "P" 10.65 6.8 1999
700001 73053 6 "P" 23.9 15.15 1999
700001 23 1 "P" 30 21.5 1999
700001 104 1 "P" 70 55.15 1999
700001 73905 6 "P" 8.6 6.7 1999
700001 72823 6 "P" 103.85 80.75 1999
700001 35608 3 "P" 50 41.2 1999
700001 105 1 "P" 38 27.65 1999
700001 73901 6 "P" 10.65 6.8 1999
700001 73055 6 "P" 23.9 15.15 1999
700001 23 1 "P" 30 21.5 1999
700001 73901 6 "P" 9.1 6.8 1999
700001 73055 6 "P" 20.35 15.15 1999
700001 23 1 "P" 30 21.5 1999
700001 104 1 "P" 70 55.15 1999
700001 105 1 "D" 28.05 28.05 1999
700001 23 1 "P" 32 22.5 2000
700001 73901 6 "P" 14.35 6.8 2000
700001 73055 6 "P" 33.2 15.75 2000
700001 23 1 "P" 32 22.5 2000
700001 104 1 "P" 85 55.95 2000
700001 35614 3 "P" 62.9 41.75 2000
700001 73905 6 "P" 8.2 6.8 2000
700001 72824 6 "P" 140.9 117.3 2000
700001 105 1 "D" 28.05 28.05 2000
700001 105 1 "P" 32.85 28.05 2000
700001 35608 3 "P" 49 41.85 2000
700001 23 1 "P" 32 22.5 2000
700001 10900 1 "D" 46.1 46.1 2000
700001 105 1 "P" 41 28.4 2000
700001 35614 3 "P" 42.25 42.25 2000
700001 73901 6 "P" 6.8 6.8 2000
700001 73055 6 "P" 15.75 15.75 2000
700001 105 1 "P" 41 28.4 2001
700001 23 1 "P" 32 22.95 2001
700001 23 1 "P" 32 22.95 2001
700001 73806 6 "P" 10 8.45 2001
700001 23 1 "P" 32 22.95 2001
700001 105 1 "P" 41 28.4 2001
700001 73901 6 "P" 14.2 6.8 2001
700001 73055 6 "P" 33.75 16.15 2001
700001 35614 3 "P" 42.25 42.25 2001
700001 23 1 "P" 35 23.45 2001
700001 73907 6 "P" 20.8 14.4 2001
700001 66716 6 "P" 30.4 21 2001
700001 65070 6 "P" 20.55 14.2 2001
700001 66533 6 "P" 23.6 16.35 2001
700001 23 1 "P" 35 23.45 2001
700001 23 1 "D" 23.45 23.45 2001
700001 23 1 "P" 35 23.45 2001
700001 73907 6 "P" 22.95 14.4 2001
700001 66536 6 "P" 14.75 9.3 2001
700001 66503 6 "P" 15.45 9.7 2001
700001 23 1 "P" 35 24.45 2001
700001 23 1 "P" 35 24.45 2002
700001 36 1 "P" 53 46.45 2002
700001 73901 6 "P" 11.75 6.8 2002
700001 73055 6 "P" 27.95 16.15 2002
700001 63856 5 "P" 509.4 419.4 2002
700001 23 1 "P" 35 24.45 2002
700001 23 1 "P" 35 24.45 2002
700001 73806 6 "P" 12 8.45 2002
700001 23 1 "P" 38 24.45 2002
700001 23 1 "P" 38 24.45 2002
700001 73806 6 "P" 12 8.45 2002
700001 23 1 "P" 38 24.45 2002
700001 73907 6 "P" 22.95 14.4 2002
700001 66536 6 "P" 14.75 9.3 2002
700001 66503 6 "P" 15.45 9.7 2002
700001 10918 1 "D" 23.95 23.95 2002
700001 23 1 "P" 38 24.45 2002
700001 23 1 "P" 38 24.45 2002
700001 73901 6 "P" 6.8 6.8 2002
700001 73053 6 "P" 16.15 16.15 2002
700001 23 1 "P" 38 24.45 2002
700001 55731 5 "P" 143.3 83.3 2002
700001 23 1 "P" 34 25.05 2003
700001 23 1 "P" 40 25.05 2003
it follows like this for another 4000 id's. I know this because of the variable itemo. Itemo and the corresponding itemo date is the reason I have so many row for id 70001 for the year 2002 and so on. Is there some way I can only have one row for 70001 for the year 2002 and so on something like this (example from the dataset I want to merge it with) :
700001 1996 .
700001 2000 .
700001 2003 .
700001 2006 .
700001 2009 .
700001 2012 .
700001 2015 .
700001 2018 .
700002 1996 .
700002 2000 948.75
700002 2003 938.82
700002 2006 867.62
700002 2009 867.62
700002 2012 .
700002 2015 1008
700002 2018 .
700003 1996 .
700003 2000 934.5
700003 2003 859.1
700003 2012 .
700003 2015 .
700003 2018 .
700004 1996 .
700004 2000 912.89
700004 2003 .
700004 2006 931.63
700004 2009 .
700004 2012 .
700004 2015 920
700004 2018 .
700005 1996 .
700005 2000 950.17
700005 2003 .
700005 2006 .
700005 2009 .
700005 2012 .
700005 2015 .
700005 2018 .
700006 1996 .
700006 2000 981.25
700006 2003 1090.42
700006 2006 1043.86
700006 2009 1043.86
700006 2012 1068
700006 2015 1068
700006 2018 .
700007 1996 .
700007 2000 1134.57
700007 2003 1158.6
700007 2006 1074.6
700007 2009 807.08
700007 2012 843
700007 2015 843
700007 2018 .
700008 1996 .
700008 2000 980.58
700008 2003 1145.26
700008 2009 .
700008 2012 .
700008 2015 .
700008 2018 .
700009 1996 .
700009 2000 989.7
700009 2003 1055.62
700009 2006 1143.29
700009 2009 .
700009 2012 1129
700009 2015 1109
700009 2018 .
700010 1996 .
700010 2000 957.75
700010 2003 988.5
700010 2006 1013.27
700010 2009 .
700010 2012 .
700010 2015 1069
700010 2018 .
700011 1996 .
700011 2000 1007.33
700011 2003 965.01
700011 2006 1089.31
700011 2009 990.9
700011 2012 1014
700011 2015 1014
700011 2018 .
700012 1996 .
700012 2000 946.78
700012 2003 1042.89
700012 2006 1033.36
700012 2009 1033.36
700012 2012 952
700012 2015 .
700012 2018 .
700014 1996 .
700014 2000 .
700014 2003 .
700014 2006 .
700014 2009 .
700014 2012 .
700014 2015 .
end
[/CODE]
------------------ copy up to and including the previous line ------------------
Listed 100 out of 9725669 observations
Use the count() option to list more
My end goal is to merge the two dataset. Any advice would be much appreciated. Thanks.
Kind Regards,
Aarushi
0 Response to Merging datasets
Post a Comment