Hello,
I am new to posting on the forum, but I follow it frequently. Thank you to the various contributors; it is very helpful.
I have a regression that I need to run 10 times (eventually 1,000 times, but this is a trial run of 10) and save the output of each run using regsave. In each iteration, one variable (id_random) needs to be randomly generated. It is one of the independent variables in the regression, and because the regression has a large number of fixed effects, I am trying to use batch mode. My dataset is an unbalanced panel of firms across years.
I am having two issues:
1) All 10 iterations seem to start from the same seed. I could set a different seed in each task with set seed `=123456+`SLURM_ARRAY_TASK_ID'', but I am having difficulty capturing SLURM_ARRAY_TASK_ID in my do file.
2) The output is not saved with the task ID suffix. I want each iteration to produce its own Stata output file (_FEresults_1, _FEresults_2, and so on up to _FEresults_10) from the regsave command in my do file:
regsave using ./_FEresults_SLURM_ARRAY_TASK_ID,
Again, the issue seems to be that I cannot capture SLURM_ARRAY_TASK_ID in my do file.
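The per-task seed and file name described in the two issues above are simple functions of the array task ID. The bash sketch below only illustrates that arithmetic; the helper name task_settings is hypothetical, and the loop merely emulates the ten array tasks locally, whereas SLURM would run each ID as a separate task.

```shell
#!/bin/bash
# Sketch only: derive a per-task seed and output file name from the array
# task ID. The base seed 123456 comes from the post; task_settings is a
# hypothetical helper, not part of SLURM or Stata.
task_settings() {
  local task_id="$1"
  local seed=$((123456 + task_id))
  echo "seed=${seed} file=./_FEresults_${task_id}"
}

# Emulate the 10 array tasks locally (SLURM would run one task per ID).
for id in $(seq 1 10); do
  task_settings "$id"
done
```

Task 1 should get seed 123457 and file ./_FEresults_1, and task 10 should get seed 123466 and file ./_FEresults_10, so no two tasks share a seed or an output file.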
I submit the following script to run in batch mode:
-------------------------------------------------------------------
#!/bin/bash
#SBATCH -J state # Job name
#SBATCH -o stata_%A-%a.out # Job output file name
#SBATCH --array=1-10 # Replace with your range
#SBATCH -p standard-mem-s # Job queue
#SBATCH -c 6 # Cores
#SBATCH --mem-per-cpu=6G # Memory
module purge
module load stata/mp-15
stata-mp do randomization.do ${SLURM_CPUS_PER_TASK} ${SLURM_ARRAY_TASK_ID}
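In the last line of this script, bash expands ${SLURM_CPUS_PER_TASK} and ${SLURM_ARRAY_TASK_ID} before Stata starts, so the do file receives plain numbers as arguments rather than the variable names. The sketch below uses a hypothetical stand-in function, show_do_args, in place of the do file to show what it would receive; the defaults 6 and 3 are assumptions so that it also runs outside SLURM. My understanding is that, inside the do file, arguments given after the file name are available as the local macros `1', `2', and so on (the args command can name them), and that Stata's `: environment` extended macro function can read SLURM_ARRAY_TASK_ID directly; treat both as pointers to check against the Stata documentation.

```shell
#!/bin/bash
# Sketch: show what the do file would receive as arguments.
# show_do_args is a hypothetical stand-in for randomization.do.
show_do_args() {
  echo "do-file args: 1=$1 2=$2"
}

# Outside SLURM these variables are unset; default them for a local test.
SLURM_CPUS_PER_TASK="${SLURM_CPUS_PER_TASK:-6}"
SLURM_ARRAY_TASK_ID="${SLURM_ARRAY_TASK_ID:-3}"

# Same argument order as the stata-mp line in the job script.
show_do_args "${SLURM_CPUS_PER_TASK}" "${SLURM_ARRAY_TASK_ID}"
```

With the defaults, this prints do-file args: 1=6 2=3, so the task ID is the second argument.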
The randomization.do file contains the following:
-----------------------------------------------------
ssc install regsave
***my input file
use ./_Main, clear
***drop the variable from prior iteration and generate a new random number
drop id_random
gen id_random = runiform(0 , 1200)
***regression
sort firm year
tsset firm year, annual
reg y size i.firm i.year id_random, vce(cluster firm)
regsave using ./_FEresults_SLURM_ARRAY_TASK_ID, replace ci level(95) detail(scalars)
My output is as follows:
-------------------------------
10 log files named stata_SLURM_CPUS_PER_TASK_SLURM_ARRAY_TASK_ID
a single Stata output file, _FEresults_SLURM_ARRAY_TASK_ID, instead of _FEresults_1, _FEresults_2, and so on up to _FEresults_10
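Once the array finishes, a quick way to confirm that each task wrote its own results file is to check for the expected names. This is a minimal sketch, assuming regsave writes Stata datasets named _FEresults_<id>.dta into the given directory; check_results is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: list which per-task result files are missing after the array runs.
# Assumes files are named _FEresults_<id>.dta; check_results is hypothetical.
check_results() {
  local dir="$1" n="$2" id missing=0
  for id in $(seq 1 "$n"); do
    if [ ! -f "${dir}/_FEresults_${id}.dta" ]; then
      echo "missing: _FEresults_${id}.dta"
      missing=$((missing + 1))
    fi
  done
  echo "missing ${missing} of ${n}"
}
```

For example, check_results . 10 run in the job's working directory should end with missing 0 of 10 if every task saved its own file.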
How do I get the do file to recognize and use the array task id?
Thank you!
Gauri