Hello everyone!
I'm a very beginner using STATA so please I'd like to get some help with merge command,
My purpose is to add variables to the master dataset and keep only what is matched related to a particular observation, so I've been told I must use merge command (I'm using merge in STATA 17). My master dataset has more observations than my using dataset, but they both have in common two variables householdnumber and household_number_member. I was suggested to use all common variables when merging, not just one in common.
However, in the master dataset the householdnumber variable has repeated observations. In other words, appears multiple times the same household number referring to each member of the household, who will be identified by household_number_member variable. Thus, household_number_member is identifed as 1= marriage immigrant, 2= national spouse, 3= children, etc. On the other hand, the using dataset doesn't have repeated observations in the variable householdnumber, because this dataset only uses in the household_number_member variable the observation 1 (which means, 1= marriage immigrant) that's why it has less observations.
By using merge, I'm only interested in keeping all related to 1 (which means 1=marriage immigrant) and adding variables that appear in the using dataset.
To make myself clear, and to get proper help. I'm using dataex command as follows (the following code corresponds to my master dataset)
* Example generated by -dataex-. For more info, type help dataex
clear
input long householdnumber byte household_number_member
10000028 1
10000028 2
10000031 1
10000031 2
10000062 1
10000068 1
10000068 2
10000124 1
10000124 2
10000140 1
10000140 2
10000186 1
10000187 1
10000187 2
10000213 1
10000232 1
10000232 3
10000270 1
10000270 2
10000292 1
10000293 1
10000293 2
10000315 1
10000315 2
10000328 1
10000328 2
10000344 1
10000393 1
10000404 1
10000404 2
10000415 1
10000415 2
10000419 1
10000419 2
10000422 1
10000422 2
10000455 1
10000455 2
10000492 1
10000492 2
10000497 1
10000497 2
10000533 1
10000533 2
10000549 1
10000556 1
10000556 2
10000556 3
10000556 4
10000556 21
10000592 1
10000592 2
10000592 3
10000592 21
10000601 1
10000601 21
10000608 1
10000608 2
10000615 1
10000658 1
10000658 2
10000700 1
10000700 2
10000700 3
10000712 1
10000712 2
10000712 3
10000713 1
10000713 2
10000764 1
10000771 1
10000771 2
10000774 1
10000774 2
10000787 1
10000855 1
10000855 2
10000894 1
10000894 2
10000894 3
10000902 1
10000902 2
10000952 1
10000952 2
10000952 21
10000970 1
10000970 2
10001003 1
10001003 2
10001004 1
10001004 2
10001069 1
10001069 2
10001142 1
10001193 1
10001193 2
10001193 3
10001193 4
10001200 1
10001200 2
end
The following code corresponds to my using dataset:
* Example generated by -dataex-. For more info, type help dataex
clear
input long householdnumber byte household_number_member
10000028 1
10000031 1
10000062 1
10000068 1
10000124 1
10000140 1
10000186 1
10000187 1
10000213 1
10000232 1
10000270 1
10000292 1
10000293 1
10000315 1
10000328 1
10000344 1
10000393 1
10000404 1
10000415 1
10000419 1
10000422 1
10000455 1
10000492 1
10000497 1
10000533 1
10000549 1
10000556 1
10000592 1
10000601 1
10000608 1
10000658 1
10000700 1
10000712 1
10000713 1
10000764 1
10000771 1
10000774 1
10000787 1
10000855 1
10000894 1
10000902 1
10000952 1
10000970 1
10001003 1
10001004 1
10001069 1
10001142 1
10001193 1
10001200 1
10001214 1
10001292 1
10001332 1
10001334 1
10001353 1
10001425 1
10001426 1
10001434 1
10001445 1
10001482 1
10001523 1
10001609 1
10001690 1
10001720 1
10001775 1
10001782 1
10001827 1
10001865 1
10001883 1
10001924 1
10001959 1
10001969 1
10002006 1
10002084 1
10002150 1
10002174 1
10002193 1
10002227 1
10002311 1
10002325 1
10002403 1
10002409 1
10002437 1
10002498 1
10002511 1
10002568 1
10002580 1
10002592 1
10002595 1
10002600 1
10002620 1
10002714 1
10002730 1
10002732 1
10002749 1
10002816 1
10002840 1
10002870 1
10002941 1
10002970 1
10003085 1
end
I've been using the command as follows, but STATA issued an error message saying: factor-variable and time-series operators not allowed
merge 1:1 householdnumber household_number_member "filename", keep if household_member_number==1
Any help would be very much appreciated.
Related Posts with How to merge two datasets and How to use keep
Sum by ID in one data fileHello everyone, I have dataset like this, ID value1 value2 value3 .... 1 10 20 30 1 20 20 30 1 40 …
Calculating duration of follow-up timeHi everyone, I am analyzing mortality data from the National Health Interview Survey. The public us…
Change ID codes to a corresponding different ID codes in a panel datasetI have a dataset where the industries are identified with the NAIC codes. I want to change the NAIC …
A little help with summary statisticsHey everyone, Firstly happy Christmas to those who celebrate it and happy holidays too! Back to th…
Calculating person yearsHi everyone, I am analyzing mortality data from the National Health Interview Survey. The public us…
Subscribe to:
Post Comments (Atom)
0 Response to How to merge two datasets and How to use keep
Post a Comment