Hi,
I urgently need help with the following, colleagues.
I have a large longitudinal panel dataset spanning 8 years, which I am analysing for my dissertation. I want to draw a random sample of the data, which I have done with the following code:
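use national_incomes_wave1_dataset, clear
* set a seed so the same sample is drawn on every run (the value is arbitrary)
set seed 12345
* keep a 20% random sample within each region, using sample's by() option
sample 20, by(region)
saveold national_incomes_wave1_sampledata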
This initial sample will be my training dataset. I intend to use the remaining observations (80%) as the test dataset, to check whether the code I am going to write works consistently on the rest of the data.
My challenge is that I do not know how to partition the data into train (20%) and test (80%) and save both, so that the two datasets contain only mutually exclusive observations; that is, so that no observation in the train data (20%) also appears in the test data (80%).
All I have managed so far is to draw a sample without replacement, with the rest of the data being deleted. Is there a way to achieve what I have described above?
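One possible way to get both parts, sketched here rather than offered as the definitive approach, is to avoid sample altogether: flag a random 20% within each region, then save the flagged and unflagged observations separately. The seed value, the variable names u and train, and the two output filenames are assumptions made for illustration; only the input dataset and the region variable come from the code above.

```stata
use national_incomes_wave1_dataset, clear

* arbitrary seed (assumption) so the same split is produced on every run
set seed 12345

* one uniform random draw per observation
generate double u = runiform()

* within each region, order by the random draw and flag the first 20% as train
bysort region (u): generate byte train = _n <= ceil(0.20 * _N)

* save the train part, then return to the full data
preserve
keep if train
save national_incomes_wave1_train, replace    // hypothetical filename
restore

* whatever is not flagged as train is the test part
keep if !train
save national_incomes_wave1_test, replace     // hypothetical filename
```

Because every observation gets exactly one value of train, the two saved files cannot share an observation. Note that ceil() rounds the train share up within each region, so small regions may end up with slightly more than 20%. If the files must be readable by an older Stata release, saveold can be used in place of save.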