Hey all,

I wrote a couple of Stata packages for regression and classification with random forests and neural networks (specifically, multi-layer perceptrons) in Stata 16. These programs are basically wrappers for estimators in the popular Python library scikit-learn. The packages automatically load the required Stata variables into Python, fit the corresponding scikit-learn model on the data, and return predictions and other information to Stata's interface. This is essentially an expanded version of the example .ado file provided in Stata's release notes for the new Stata Function Interface.
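For anyone curious about the general flow, here is a minimal sketch of the load, fit, and predict steps on the Python side. It is not the actual code from either package: the data are made up, the variable names are hypothetical, and it assumes scikit-learn is installed. Inside Stata 16 the lists would come from the sfi module rather than being built by hand, as noted in the comments.

```python
from sklearn.ensemble import RandomForestRegressor

# Stand-in for the Stata dataset (made-up values). Inside Stata 16, these
# lists would instead be read with the sfi module, for example
# sfi.Data.get("x1 x2") for the predictors and sfi.Data.get("y") for the outcome.
X = [[float(i), float(i % 3)] for i in range(30)]
y = [2.0 * row[0] + row[1] for row in X]

# Fit a random forest on the first 20 observations (the "training sample").
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X[:20], y[:20])

# Predict for every observation and convert to a plain list of floats.
predictions = model.predict(X).tolist()

# Inside Stata, the predictions would then be written back to a new variable,
# for example with sfi.Data.addVarDouble and sfi.Data.store.
print(len(predictions))
```

The same pattern applies to classification by swapping in RandomForestClassifier, or to multi-layer perceptrons with MLPRegressor or MLPClassifier from sklearn.neural_network.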

I split these into two separate packages:
1. pyforest.ado - regression and classification with random forests
2. pymlp.ado - regression and classification with multi-layer perceptrons

The syntax for specifying optional arguments is nearly identical to the syntax used in scikit-learn. This means that the scikit-learn documentation is also a readable reference for using these packages. Of course, both of these packages also contain built-in Stata help files.
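To illustrate how Stata-style options can line up with scikit-learn keyword arguments, here is a small sketch. The option string and the parse_options helper are hypothetical and are not taken from the packages' source or help files; n_estimators, max_depth, and criterion are real scikit-learn random forest parameters.

```python
import ast
import re


def parse_options(option_string):
    """Turn a Stata-style option string into a dict of keyword arguments.

    Hypothetical helper for illustration only. Each option is expected to
    look like name(value), with options separated by spaces.
    """
    kwargs = {}
    for name, raw in re.findall(r"(\w+)\(([^()]*)\)", option_string):
        raw = raw.strip()
        try:
            # Numbers, booleans, and tuples become Python values.
            kwargs[name] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything else (e.g. entropy) is kept as a plain string.
            kwargs[name] = raw
    return kwargs


options = parse_options("n_estimators(200) max_depth(5) criterion(entropy)")
print(options)
```

A dict like this could then be passed straight to a scikit-learn estimator, for example RandomForestClassifier(**options), which is why the scikit-learn documentation doubles as a reference for the option names.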

You can read a bit more about these packages and install them with instructions on GitHub:
https://github.com/mdroste/stata-pyforest
https://github.com/mdroste/stata-pymlp

I am still actively developing both of these packages, and I plan to submit them to SSC very soon. I am sure there are some bugs that will need to be fixed before then, since I put both of them together over the last two days or so. There's a whole bunch of stuff that I think should be added, but since both seem to be very much usable right now, I figured it's worth posting what I have for now.

If you have any issues with these packages, definitely let me know either on this thread or on GitHub.

I hope this is useful!

Mike