Recently I was trying to running a program with a large array, to be more specific a unidimensional array of 346840131876 elements. This array is an integer 4 allocatable variable. Besides the large number of elements I just use 0,01% of them (5458868 elements will be used), for this reason, I believed that the program will run. But when the array is allocated the program stops with the message of Insufficient virtual memory. I was running the program on a 64-bit OS with 64GB of RAM and 130GB swap.
My question is, I really need more virtual memory to run this program? or there is another thing to try?
If you ask to allocate an integer array of 346840131876 elements, then that's exactly how much memory your program will try to allocate (346840131876 times the size of integer). If your array is quite sparse, then you could consider a different way to represent it. For example, you could represent it as a linked list, where each element is <index, value, next>, where "index" is an index of the element if it were a real array, "value" is the value you are trying to store, and "next" is the pointer to the next element. If "index" is 64 a bit value, "value" is 32 bit, and "next" is 64 bit, your total element storage requirement is 20 bytes. Total memory requirement would be 20 * 5458868, which is about 110 MB (plus the memory management overhead). Of course, accessing each element would be more difficult. This is just an example, you should do what makes more sense given your requirements.
Another possibility (outside of pure Fortran support) would be to reserve the virtual memory, but not commit it. Then only commit the memory pages that are required (the memory pages that actually hold any elements). This solution requires calling various system calls, and it may be easier to write such memory manager in C and expose certain functionality to Fortran as C functions.
346,840,131,876 * 4 = 1,387,360,527,504 = 1,292 GB
With 0.01% occupancy.
I would suggest using an array of doublets. An integer(8) to hold index, and an integer(4) (or 8) to hold the value. These could be two arrays, one (8)) holding index, the other (4) holding value.
Thank's for the responses, but I think it's wont work. I will try to explain why.
In my problem, each index of my vector represents a configuration of particles in my system. I storage the configuration in a bit representation, for example, the number 346840131876 represents a configuration with 13 particles (1=particle 0=null). The value of an index is one of the 5458868 possible configurations. Then the huge vector it's a fast way (that I found) to point out which configuration I am working on. For example, the value of the index 4681 it's the number 8 configuration of my system. I could use a vector with 5458868 elements, but doing it I will need to search in the vector for the configuration, which leaves very slow my program.
In other words, the huge vector is a pointer to another vector, it identifies which configuration I am working and specify where I will storage they information.
How are the indices of 5458868 configurations distributed among the various possible index values? Perhaps you could come up with a good hashing function, and store your configurations in 5458868 buckets of a hash table? Any collisions could be handled by making each bucket point to a linked list (or an array) of configuration values. If the distribution of pretty uniform, each bucket would have only a few linked list entries.
Well, you have created a 12-digit key for data that will span only 5.5 million possible values (your possible "configurations") of the key. Create an index (or indices), 7-digits in size, that you can use to map (i) the key to the index into the array and, (ii) back from the key value to array index..
#4>>each index of my vector represents a configuration of particles in my system
#1> Besides the large number of elements I just use 0,01% of them (5458868 elements will be used)
Those are contradictory statements.
Mecej's suggestion of hashing the 12 digit key into about 5.8 million values (5.5 million is about 85% of 5.8) will yield a hash table with very few collisions. You would have to craft the hashing function based on the knowledge of the 12 digit key.
(or buy a system with 2TB of RAM)
Thanks for the responses. I will try to craft a hashing table to handle with the problem. I never work with it, but it seems to be a good solution