Hi,
I have run into a very strange problem while running my Fortran code on an Ubuntu system. I am using PARDISO in OOC (out-of-core) mode to solve a very large system. In the first case, iparm(17) reported that about 680 GB of hard disk storage was required, and the problem was solved without any issue. In the second case, the matrix is even larger, almost twice the size of the first one, but the hard disk usage did not increase much: around 700 GB according to iparm(17). This is the only difference between the two problems. However, the job for the second problem was killed after phase 22. My code uses phases 11, 22, and 33. In both cases, I set ulimit to unlimited and KMP_STACKSIZE to 5G. So what could the problem be? Why was the second job killed by the system? Any suggestions would be much appreciated.
By the way, the machine I am using has 250 GB of RAM, but for large problems I still need to use OOC mode.
The OOM (out-of-memory) killer may have killed your job:
https://www.kernel.org/doc/gorman/html/understand/understand016.html
Jim Dempsey
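One way to confirm this suspicion is to look for OOM-killer messages in the kernel log. The following is a minimal sketch, assuming the `dmesg` command is available and readable by the current user; the helper names `find_oom_kills` and `read_kernel_log` are made up for illustration, and the exact message wording differs between kernel versions, so the pattern may need adjusting.

```python
import re
import subprocess

# Typical kernel messages (wording varies between kernel versions):
#   "Out of memory: Killed process 12345 (solver) ..."
#   "Out of memory: Kill process 12345 (solver) score 900 or sacrifice child"
OOM_LINE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]*)\)")


def find_oom_kills(log_text):
    """Return a list of (pid, process_name) for OOM kills found in log_text."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_LINE.finditer(log_text)]


def read_kernel_log():
    """Read the kernel ring buffer via dmesg; return '' if that is not possible."""
    try:
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=False
        )
    except OSError:
        # dmesg is not installed or cannot be executed on this system.
        return ""
    return result.stdout


if __name__ == "__main__":
    for pid, name in find_oom_kills(read_kernel_log()):
        print(f"OOM killer terminated PID {pid} ({name})")
```

If the solver's PID or executable name shows up in the output, the job was most likely terminated for exceeding available memory rather than disk space.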
Hi, Jim,
Thanks for the information. The total available hard disk storage is 2 TB and the RAM is 250 GB. As mentioned, I am using OOC mode, so my cases should not consume much RAM. All the space required to store the LU factors is on the hard disk.
Anyway, if it was the OOM killer that killed the process, how can I prevent this from happening again? That is, is there any way to tell the OOM killer not to kill the process? Much appreciated.
For questions like this, Google is your friend. Search for "disable oom".
Possibly: http://thetechnick.blogspot.com/2010/12/steps-to-disable-oom-on-linux.html
That disables it system-wide. You might want to do some googling on your own to see how to do this for a specific application. Note that the references I found indicate you can do this for a specific process ID, as opposed to via the path to the executable. So your app would have to get its PID and then write the appropriate flag value.
Jim Dempsey
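The per-process approach described above can be sketched as follows. On current Linux kernels the flag lives in `/proc/<pid>/oom_score_adj`, which accepts values from -1000 (exempt from the OOM killer) to 1000; lowering the value normally requires root or the `CAP_SYS_RESOURCE` capability. The function name `set_oom_score_adj` and its `proc_root` parameter are assumptions made for illustration, not part of any library; a Fortran program would do the equivalent by obtaining its PID and writing to the same file.

```python
import os
from pathlib import Path


def set_oom_score_adj(value, pid=None, proc_root="/proc"):
    """Write an OOM score adjustment for a process and return the file path.

    value:     integer in [-1000, 1000]; -1000 exempts the process.
    pid:       target process ID; defaults to the calling process.
    proc_root: location of the proc filesystem (parameter added here only
               so the function can be exercised against a scratch directory).
    """
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    if pid is None:
        pid = os.getpid()
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    # Raises PermissionError if the caller may not lower the value.
    path.write_text(f"{value}\n")
    return path


# Example (needs sufficient privileges when lowering the score):
#   set_oom_score_adj(-1000)   # exempt the current process
```

Exempting a process that genuinely needs more memory than the machine has does not make the memory appear; the kernel will pick another victim or the system may stall, so this is a workaround rather than a fix.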
Thanks, Jim. I found a way to solve the problem. I reduced the maximum amount of RAM that OOC can use in the config file (MKL_PARDISO_OOC_MAX_CORE_SIZE, MKL_PARDISO_OOC_MAX_SWAP_SIZE). Previously, these two values were almost the same as the available RAM on the system. I think this may be why the OOM killer decided to kill the job.
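For anyone hitting the same issue, here is a rough sketch of generating such a config with a budget well below physical RAM. It assumes the OOC config file uses `NAME = value` lines with sizes in megabytes, as I understand the MKL documentation to describe; the 50% fraction and the function name `build_ooc_config` are illustrative choices only, not the values used in this thread.

```python
def build_ooc_config(total_ram_mb, core_fraction=0.5, swap_mb=0):
    """Return config text that caps PARDISO OOC memory below physical RAM.

    total_ram_mb:  physical RAM of the machine, in MB.
    core_fraction: share of RAM granted to OOC (illustrative default).
    swap_mb:       swap budget in MB (0 means no swap is granted).
    """
    if not 0 < core_fraction < 1:
        raise ValueError("core_fraction must be strictly between 0 and 1")
    core_mb = int(total_ram_mb * core_fraction)
    # Keep the combined budget within RAM so the OS and other data
    # still have room, which is the point of the fix described above.
    if core_mb + swap_mb > total_ram_mb:
        raise ValueError("core + swap budget exceeds total RAM")
    return (
        f"MKL_PARDISO_OOC_MAX_CORE_SIZE = {core_mb}\n"
        f"MKL_PARDISO_OOC_MAX_SWAP_SIZE = {swap_mb}\n"
    )


if __name__ == "__main__":
    # Assumed example: a machine with roughly 250 GB (250000 MB) of RAM.
    print(build_ooc_config(250000), end="")
```

The resulting text would go into the OOC config file read by PARDISO; the right fraction depends on what else the program and the system keep in memory.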