Hi,
Can somebody please advise me on the best way to merge the information I have on individuals and their domiciles? My analysis will be about the "reference person" in each domicile, and there may be many people living in each domicile. When the data was collected, there were two different questionnaires: one about the domicile (location, aggregate income and expenditure, etc.) and the other about the individuals (personal income and expenditure, height, age, etc.). The original dataset I have access to combines everything, with all the information about both domiciles and individuals (approximately 2.5 GB and 5 million observations). I do not have a single unique id, but rather 5 or 6 variables that, together, identify the person/domicile.
Any advice on how I can prepare the data to get rid of the excess data and keep only the information on the reference person of each domicile as well as all the information about the domicile itself?
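To make the goal concrete, here is a minimal sketch of the result I am after, written in Python only for illustration. Everything specific in it is an assumption: the column names (state, sequence, domicile_no, role), the code "1" for the reference person, and the sample rows are all made up, and it assumes each row is one person with the domicile variables repeated on it.

```python
import csv
import io

# Hypothetical column names; the real file uses different ones.
DOMICILE_KEYS = ["state", "sequence", "domicile_no"]  # together identify a domicile
ROLE_COLUMN = "role"        # assumed: relationship to the reference person
REFERENCE_ROLE = "1"        # assumed: code marking the reference person


def keep_reference_persons(rows):
    """Yield one row per domicile: the row of its reference person."""
    seen = set()
    for row in rows:
        if row[ROLE_COLUMN] != REFERENCE_ROLE:
            continue  # drop the other people living in the domicile
        key = tuple(row[k] for k in DOMICILE_KEYS)
        if key in seen:
            raise ValueError(f"more than one reference person in domicile {key}")
        seen.add(key)
        # Build a single id from the variables that jointly identify the domicile.
        row["domicile_id"] = "-".join(key)
        yield row


# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "state,sequence,domicile_no,role,age,domicile_income\n"
    "11,001,1,1,45,3000\n"
    "11,001,1,2,43,3000\n"
    "11,001,2,1,60,1200\n"
)
result = list(keep_reference_persons(csv.DictReader(sample)))
for row in result:
    print(row["domicile_id"], row["age"], row["domicile_income"])
```

The rows are read one at a time, so the whole file would not need to fit in memory at once; only the set of domicile keys already seen is kept. I do not know whether this is a sensible approach for data of this size, which is part of what I am asking.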
Thanks a lot in advance. Any suggestions will be greatly appreciated.