How much will more RAM help me with processing speed?
I am working with a dataset of 87 million records that is about 3 GB in size. It is a 20-year cohort of people, and my datasets contain 5-6 rows of data per person corresponding to their health events. It would be difficult to collapse these any further (I could of course separate them into years, but I would rather improve processing power than add these extra steps).
For example, today I tried to run bsample to draw 800,000 records out of my 80,000,000; it ran for an hour and never completed. I have had similarly long waits when trying to flag whether a string contains any of a list of 20 ICD codes in this database: waiting times of upwards of an hour while the computer sounds like it's about to take off like an airplane.
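For reference, the two operations were roughly along these lines (the variable name diag and the specific ICD codes below are placeholders, not my actual ones):

    * draw a sample of 800,000 observations (bsample samples with replacement)
    bsample 800000

    * flag records whose diagnosis string contains any of the target ICD codes
    generate byte flag = 0
    foreach code in I21 I22 I25 {
        replace flag = 1 if strpos(diag, "`code'") > 0
    }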
I have 7 GB of available RAM, and it is always running at max during these churns. I currently have only 50 GB of free C: drive disk space (it's mostly full). Much of this I can't delete, as it consists of records and datasets from other projects, and since this is a work computer, getting more space will require a series of requests to IT, etc.
What will make the most difference for me? Requesting more hard drive space would be very easy, and I seem to remember that Stata needs plenty of hard drive space to work with big datasets.
Or do I need to double my RAM? Is that the only way?
This is an ongoing project and I will be working with these datasets for at least a year, so I need a fix. I am using Stata 17 SE.
Thanks