Suppose you have a dataset containing many variables, and you have performed some analyses using only a small number of them. Now it is time to upload the data and .do file to a replication archive (Dataverse etc.). You don't want to upload the entire dataset, only the relevant variables; specifically, you want to give users a dataset on which they will be able to run your .do file containing your analyses, which possibly includes the creation of new variables--but with no unnecessary variables.

Is there an efficient way to keep (or otherwise identify) only those variables in a dataset that are referred to in a given .do file (as DVs, IVs, weights, in "if" conditions, etc.)--but not those that are created within the .do file?

This is not so difficult to do manually, but it would be cool if there were a way to automate it.
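
To make the goal concrete, here is a rough sketch of the kind of automation I have in mind, written in Python rather than Stata purely for illustration. It assumes the dataset's variable names are already available as a list and that the .do file spells variable names out in full. The function name `referenced_variables` and the sample variable names are hypothetical. It is a text-matching heuristic, not a parser: it would miss abbreviated names, wildcards such as `inc*`, and ranges such as `v1-v10`, and it may keep a variable that merely shares a name with a command or option.

```python
import re

def referenced_variables(do_file_text, dataset_vars):
    """Return the dataset variables whose names appear as whole tokens in the .do file."""
    # Drop /* ... */ block comments, // (and ///) comments, and lines starting with *.
    # Heuristic only: a // inside a quoted string (e.g. a URL) is also treated as a comment.
    text = re.sub(r"/\*.*?\*/", " ", do_file_text, flags=re.S)
    text = re.sub(r"//.*", " ", text)
    text = re.sub(r"(?m)^\s*\*.*$", " ", text)
    # Collect every identifier-like token that remains.
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))
    # Only names already in the dataset can match, so variables created
    # inside the .do file (log_income below) are excluded automatically.
    return [v for v in dataset_vars if v in tokens]

# Hypothetical .do file contents and variable list, for illustration only.
do_file = """
* main model
gen log_income = log(income)   // new variable, not in the raw data
regress vote log_income age i.educ [pweight=wt] if female == 1
"""
dataset_vars = ["id", "income", "age", "educ", "wt", "female", "vote", "region"]

keep = referenced_variables(do_file, dataset_vars)
print("keep " + " ".join(keep))
```

The printed line could then be pasted into Stata as a `keep` command before saving the replication dataset. Because the matching errs toward keeping too much rather than too little in the cases it understands, the trimmed dataset should still be checked by running the .do file on it from start to finish.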