Hi,
I am writing to ask whether there is a Stata command (or a manual procedure) for applying random forest (or another machine learning algorithm) to time-series data.
I have household-level panel survey data collected from 2,800 households over 10 years (28,000 observations in total), with 130 variables.
My goal is to use machine learning to select features from 100 right-hand-side variables and find the model that best predicts a continuous outcome variable.
The problem is that standard machine learning validation requires the training data and test data to be independent, which is clearly violated in serially correlated time-series data.
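To make the concern concrete, here is a minimal sketch of the kind of time-ordered (rolling-origin) split I have in mind, where every training window ends before its test year. It is written in Python purely for illustration; the function name, the year numbering 1 to 10, and the minimum training window of 3 years are my own assumptions, not part of any Stata command.

```python
def rolling_origin_splits(years, min_train=3):
    """Return (train_years, test_year) pairs in which training always precedes testing.

    `years` and `min_train` are hypothetical inputs chosen for illustration.
    """
    ordered = sorted(set(years))
    splits = []
    # Grow the training window by one year at a time; test on the next year.
    for i in range(min_train, len(ordered)):
        splits.append((ordered[:i], ordered[i]))
    return splits


# Assumed setup: survey waves numbered 1 through 10.
splits = rolling_origin_splits(range(1, 11), min_train=3)
for train_years, test_year in splits:
    print(f"train on years {train_years[0]}-{train_years[-1]}, test on year {test_year}")
```

In a panel like mine, all households observed in the training years would go into the training set and the same households in the later year into the test set, so no future observation is used to predict the past.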
I had no problem running lasso: I used "cvlasso", a user-written command that lets users run lasso on time-series data with appropriate cross-validation.
However, I have not found any Stata command that runs random forest with time-series data. I checked "rforest" and "chaidforest", but neither seems to handle autocorrelated time-series data.
Is there a Stata command, or a manual way, to run random forest with time-series data?
Or, more generally, is there any machine learning algorithm other than lasso and random forest that works well with time-series data? I just need one more machine learning algorithm so that I can compare it with lasso.
Thank you.
System: Windows 10
Stata Version: 16.1 MP