Hi

Colleagues, I urgently need help with the following.

I have a large longitudinal panel dataset spanning 8 years, which I am analysing for my dissertation. What I intend to do, however, is draw a random sample of the data, which I have done with the following code:

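use national_incomes_wave1_dataset, clear
* fix the seed so the same sample can be drawn again (the value is arbitrary)
set seed 12345
* keep a 20% random sample within each region; by() does not need sorted data
sample 20, by(region)
saveold national_incomes_wave1_sampledata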

This initial sample is what I will use as my training dataset. I intend to use the remaining observations (80%) as the test dataset, to see whether the code I am going to write works consistently on the rest of the data.

My challenge is that I do not know how to partition the data into train (20%) and test (80%) and save both, so that the two datasets contain only mutually exclusive observations. That is, no observation in the train data (20%) should also appear in the test data (80%).

All I have managed to do so far is cut out a sample without replacement, with the rest of the data being deleted. Is there a way I can achieve what I have stated above?
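
To make the goal concrete, here is a minimal sketch of one possible way to get such a split. It flags the observations instead of deleting them with sample, and it has not been run against this dataset. The variable names u and train, the seed value, and the two output file names are hypothetical, and it assumes that no variables called u or train already exist in the data.

```stata
use national_incomes_wave1_dataset, clear

* fix the seed so the split is reproducible (the value is arbitrary)
set seed 12345

* one uniform random number per observation
generate double u = runiform()

* within each region, flag the 20% of observations with the smallest draws
bysort region (u): generate byte train = _n <= round(0.20 * _N)
drop u

* save the flagged 20% as the train data (hypothetical file name)
preserve
keep if train
saveold national_incomes_wave1_train, replace
restore

* save the remaining 80% as the test data (hypothetical file name)
keep if !train
saveold national_incomes_wave1_test, replace
```

Because every observation gets exactly one value of train, the two saved files cannot share an observation. Since round() is applied within each region, the train share is close to 20% in every region but may not be exact.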