Hi,
Colleagues, I urgently need help with the following.
I have a large longitudinal panel dataset spanning 8 years, which I am analysing for my dissertation. I want to draw a random sample of the data, which I have done with the following code:
use national_incomes_wave1_dataset, clear
by region: sample 20
saveold national_incomes_wave1_sampledata
This initial 20% sample will serve as my training dataset. I intend to use the remaining 80% of observations as the test dataset, to check whether the code I am going to write works consistently on the rest of the data.
My challenge is that I do not know how to partition the data into train (20%) and test (80%) sets and save both, so that the two datasets are mutually exclusive, that is, no observation in the train data (20%) also appears in the test data (80%).
All I have managed so far is to draw a sample without replacement, with the rest of the data being deleted. Is there a way to achieve what I have described above?
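To make the goal concrete, here is a rough sketch of the kind of split I am after. Instead of dropping observations with sample, it flags a random 20% within each region and then saves the flagged and unflagged rows separately. The seed value, the variables obs_id, u and train, and the two output file names are placeholders I made up for illustration, and I use save rather than saveold here. Note that this splits by row, so if the same unit appeared on several rows, the split would have to be done at the unit level instead.

```stata
* Sketch only: seed, variable names (obs_id, u, train) and output
* file names are illustrative assumptions, not part of my real data.
use national_incomes_wave1_dataset, clear

* Arbitrary seed so the same split can be reproduced later
set seed 12345

* Row identifier, used to break ties and to check the split afterwards
gen long obs_id = _n

* One uniform random draw per observation
gen double u = runiform()

* Put the rows in random order within each region
sort region u obs_id

* Flag the first 20% of rows in each region as train, the rest as test
by region: gen byte train = _n <= round(0.20 * _N)

* Save the train rows, then bring the full data back
preserve
keep if train == 1
save national_incomes_wave1_train, replace
restore

* Save everything that was not flagged as the test data
keep if train == 0
save national_incomes_wave1_test, replace
```

Because every row receives exactly one value of train, the two saved files cannot share an observation. A quick check would be to merge the two files 1:1 on obs_id and confirm that no row matches.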