Hi Statalists,

I am new to Stata and just encountered this issue in data management.

Suppose I have two datasets with different numbers of observations.

First, conditioning on these two datasets contain some common variables but different observations, I want combine these two datasets, remaining all the observations in these two datasets and leaving the unmatched observations as missing values. For example, both these two datasets have variables srcdate, gvkey, and cusip. These two datasets also contain their own but potentially different salecs (namely, salecs1 and salecs2). If these three variables (srcdate, gvkey, and cusip) can match, then we compare their salecs (salecs1 and salecs2) by listing them next to each other. This procedure can be done by generating a dummy variable, if matched, presented by 1, otherwise, presented by 0.

Second, if initially these three variables
(srcdate, gvkey, and cusip)
cannot match, just create a new row of observations and leave whoever's cell as missing values.

The result should contain both datasets entire information as well as comparing their observations respectively.

Here is the example for these two datasets.

Dataset 1

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long srcdate str6 gvkey str10 cusip str58 conm str4 sic str50 cnms str8 ctype double salecs
19904 "122519" "68243Q106" "1-800-FLOWERS.COM"           "5961" "International" "GEOREG"   15.127
20269 "122519" "68243Q106" "1-800-FLOWERS.COM"           "5961" "International" "GEOREG"   11.215
20635 "122519" "68243Q106" "1-800-FLOWERS.COM"           "5961" "International" "GEOREG"    11.73
21000 "122519" "68243Q106" "1-800-FLOWERS.COM"           "5961" "International" "GEOREG"   11.936
21365 "122519" "68243Q106" "1-800-FLOWERS.COM"           "5961" "International" "GEOREG"   11.519
20819 "034066" "68247Q102" "111 INC -ADR"                "5960" "Not Reported"  "COMPANY"   6.997
21184 "034066" "68247Q102" "111 INC -ADR"                "5960" "3 Customers"   "COMPANY"  50.479
18627 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicare"      "GOVDOM"  239.017
18627 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Commercial"    "MARKET"   272.78
18627 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicaid"      "GOVDOM"   16.077
18627 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Self Pay"      "MARKET"    8.039
18992 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Commercial"    "MARKET"  325.093
18992 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Self Pay"      "MARKET"    8.942
18992 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Argentina"     "GEOREG"     43.5
18992 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicaid"      "GOVDOM"   17.883
18992 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicare"      "GOVDOM"  286.772
19358 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Commercial"    "MARKET"  367.812
19358 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicare"      "GOVDOM"  292.328
19358 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Self Pay"      "MARKET"    7.548
19358 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Argentina"     "GEOREG"     62.7
19358 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicaid"      "GOVDOM"   18.528
19723 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Self Pay"      "MARKET"    7.876
19723 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicaid"      "GOVDOM"   19.332
19723 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Argentina"     "GEOREG"     72.5
19723 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Commercial"    "MARKET"  388.787
19723 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicare"      "GOVDOM"  300.004
20088 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicare"      "GOVDOM"  376.865
20088 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Self Pay"      "MARKET"    8.522
20088 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Commercial"    "MARKET"  540.678
20088 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicaid"      "GOVDOM"   20.832
20088 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Argentina"     "GEOREG"     76.8
20453 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicare"      "GOVDOM"  388.049
20453 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Commercial"    "MARKET"  587.074
20453 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Self Pay"      "MARKET"   10.001
20453 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Argentina"     "GEOREG"     90.3
20453 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicaid"      "GOVDOM"   15.002
end
format %d srcdate

Dataset 2

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long srcdate str6 gvkey str10 cusip str58 conm str100 conml str4 sic double salecs
18627 "179586" "33740N105" "1ST UNITED BANCORP INC"      "1st United Bancorp Inc"             "6020"   50.174
18992 "179586" "33740N105" "1ST UNITED BANCORP INC"      "1st United Bancorp Inc"             "6020"   62.148
19358 "179586" "33740N105" "1ST UNITED BANCORP INC"      "1st United Bancorp Inc"             "6020"   70.183
19723 "179586" "33740N105" "1ST UNITED BANCORP INC"      "1st United Bancorp Inc"             "6020"     71.5
18566 "166222" "90137E106" "20-20 TECHNOLOGIES INC"      "20-20 Technologies Inc"             "7372"   66.531
18931 "166222" "90137E106" "20-20 TECHNOLOGIES INC"      "20-20 Technologies Inc"             "7372"    68.25
20088 "020547" "90214L106" "2050 MOTORS INC"             "2050 Motors Inc"                    "5500"        0
20453 "020547" "90214L106" "2050 MOTORS INC"             "2050 Motors Inc"                    "5500"        0
20819 "020547" "90214L106" "2050 MOTORS INC"             "2050 Motors Inc"                    "5500"        0
21184 "020547" "90214L106" "2050 MOTORS INC"             "2050 Motors Inc"                    "5500"        0
21549 "020547" "90214L106" "2050 MOTORS INC"             "2050 Motors Inc"                    "5500"        0
18627 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "21st Century Oncology Holdings Inc" "8090"  543.963
18992 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "21st Century Oncology Holdings Inc" "8090"  644.717
19358 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "21st Century Oncology Holdings Inc" "8090"  693.951
19723 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "21st Century Oncology Holdings Inc" "8090"  736.516
20088 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "21st Century Oncology Holdings Inc" "8090" 1018.182
20453 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "21st Century Oncology Holdings Inc" "8090" 1079.227
18627 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"   79.576
18992 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  162.209
19358 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  244.644
19723 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  324.879
20088 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  463.599
20453 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"   561.05
20819 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  524.525
21184 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  521.449
21549 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  494.492
18627 "160938" "90214M104" "2242749 ONT LTD"             "2242749 Ont Ltd"                    "3080"  462.797
18992 "160938" "90214M104" "2242749 ONT LTD"             "2242749 Ont Ltd"                    "3080"  453.605
19358 "160938" "90214M104" "2242749 ONT LTD"             "2242749 Ont Ltd"                    "3080"  457.415
19723 "160938" "90214M104" "2242749 ONT LTD"             "2242749 Ont Ltd"                    "3080"  455.522
20088 "160938" "90214M104" "2242749 ONT LTD"             "2242749 Ont Ltd"                    "3080"  473.866
end
format %d srcdate