Hello,

I'm trying to split a huge data set into about 400 files so that I can run an analysis on them in Stata using collapse and reshape. The more files I can split it into, the faster it will run.

Since I have panel data with multiple entries for each patient_id, I have to make sure that all observations with the same ID stay together. So I am trying to form groups of IDs.

My data look like this:
patient_id x y z
1
1
1
2
3
3
3
4
4

I'd like to group the patient_ids like this:
patient_id group x y z
1 1
1 1
1 1
2 1
3 2
3 2
3 2
4 2
4 2

I'm looking for a way to automate these commands that group the patient_ids:
gen group = 1 if patient_id <= 2
replace group = 2 if patient_id > 2 & patient_id <= 4
replace group = 3 ....... and so on for 400 different groups.


I need to make sure that a patient_id is never split across groups (i.e. patient_id = 1 must not be cut at its 2nd observation, which would put patient 1 in both group 1 and group 2).
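
To show what I mean by automating this, here is a rough sketch of one possible approach. I have not verified it on my data, so please treat it as an illustration only. It numbers the distinct patient_ids consecutively and then cuts that numbering into roughly equal blocks, so all rows of one patient get the same group. The names pid_seq and n_groups are placeholders I made up, and it assumes the variable group does not exist yet.

```stata
* Sketch only: assign whole patients to about 400 groups.
* pid_seq and n_groups are made-up names, not part of my data.
local n_groups = 400

* Number the distinct patient_ids 1, 2, 3, ... in sorted order.
egen long pid_seq = group(patient_id)

* r(max) is the number of distinct patients.
quietly summarize pid_seq, meanonly
local n_ids = r(max)

* Every row of a patient has the same pid_seq, hence the same group.
gen int group = ceil(pid_seq * `n_groups' / `n_ids')

* Check: each patient_id should belong to exactly one group.
bysort patient_id (group): assert group[1] == group[_N]
```

With the small example above (4 patients and n_groups set to 2), this would put patients 1 and 2 in group 1 and patients 3 and 4 in group 2. Groups are balanced by number of patients, not by number of observations, so file sizes could still differ.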

Any feedback or alternative methods would be much appreciated.

Thanks,
Vishal