Hi,
I urgently need help with the following.
I have a large longitudinal panel dataset spanning 8 years, which I am analysing for my dissertation. I want to draw a random sample from the data, which I have done with the following code:
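use national_incomes_wave1_dataset, clear
set seed 12345           // arbitrary seed, so the same sample can be drawn again
sample 20, by(region)    // keep a random 20% of observations within each region
saveold national_incomes_wave1_sampledata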
This initial sample is what I will use as my Train dataset. I intend to use the remaining observations (80%) as the Test dataset, to see whether the code I am going to write works consistently on the rest of the data.
My challenge is that I do not know how to partition the data into Train (20%) and Test (80%) and save both, so that the two datasets contain only mutually exclusive observations. That is, no observation in the Train data (20%) should also appear in the Test data (80%).
All I have managed to do so far is draw a sample without replacement, with the rest of the data being deleted. Is there a way to achieve what I have described above?
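To make the goal concrete, here is a minimal sketch of one possible way to do such a split, not a definitive solution. It flags a random 20% of observations within each region instead of dropping the rest, then saves the two parts separately. It assumes that region is a variable in the file, that one observation is the unit to be sampled, and that the names u and train are not already in use. The seed value and the two output file names are made up for illustration.

```stata
use national_incomes_wave1_dataset, clear
set seed 12345                       // arbitrary seed, so the split is reproducible

* Attach a random number to each observation and order within region
generate double u = runiform()
sort region u

* Flag the first 20% (rounded) of each region as Train, the rest as Test
by region: generate byte train = _n <= round(0.20 * _N)
assert inlist(train, 0, 1)           // every observation gets exactly one flag
drop u

* Save the Train part, then return to the full data and save the Test part
preserve
keep if train == 1
save national_incomes_wave1_train, replace   // hypothetical file name
restore

keep if train == 0
save national_incomes_wave1_test, replace    // hypothetical file name
```

Because every observation carries exactly one value of train, the two saved files cannot share an observation. If older Stata versions need to read the files, saveold can be used in place of save. If the same person appears in several rows, the flag would have to be assigned per person instead of per row, which this sketch does not do.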