Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Why were my Fortran jobs killed?

Gators_vs__Sundevil
1,181 Views

Hi,

I have run into a very strange problem while running my Fortran code on an Ubuntu system. I am using PARDISO in OOC (out-of-core) mode to solve a very large system. For the first case, iparm(17) reported that about 680 GB of hard disk storage was required, and the problem was solved without any issue. For the second case, the matrix is even bigger, almost twice the size of the previous one, but the reported disk usage is only slightly higher, around 700 GB according to iparm(17). That is the only difference between the two problems. However, the job for the second problem was killed after phase 22. My code uses phases 11, 22 and 33. For both cases, I set ulimit to unlimited and KMP_STACKSIZE to 5G. So what could the problem be? Why was the second job killed by the system? Any suggestions would be much appreciated.

By the way, the machine I am using has 250 GB of RAM, but for large problems I still need to use OOC mode.
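One thing worth confirming first is that the limits mentioned above are really in effect in the shell that launches the job. The following is a minimal Python sketch, not part of the original post, that prints the soft stack and address-space limits and the KMP_STACKSIZE value it inherits. The helper names `format_limit` and `current_limits` are hypothetical, and the output only reflects the shell the script is started from, so it should be run from the same shell as the Fortran job.

```python
import os
import resource


def format_limit(value):
    """Render an rlimit value the way `ulimit` prints it."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return the soft stack and address-space limits (in bytes) and KMP_STACKSIZE."""
    stack_soft, _ = resource.getrlimit(resource.RLIMIT_STACK)
    as_soft, _ = resource.getrlimit(resource.RLIMIT_AS)
    return {
        "stack": format_limit(stack_soft),
        "address_space": format_limit(as_soft),
        "KMP_STACKSIZE": os.environ.get("KMP_STACKSIZE", "(not set)"),
    }


if __name__ == "__main__":
    for name, value in current_limits().items():
        print(f"{name}: {value}")
```

Note that these limits do not stop the kernel from killing a process when physical memory runs out, so "unlimited" here does not rule out an out-of-memory kill.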

4 Replies
jimdempseyatthecove
Honored Contributor III
Gators_vs__Sundevil

jimdempseyatthecove wrote:

The OOM (out-of-memory) killer may have killed your job:

https://www.kernel.org/doc/gorman/html/understand/understand016.html

Jim Dempsey

 

Hi, Jim,

Thanks for the information. The total available hard disk storage is 2 TB and the RAM is 250 GB. As mentioned, I am using OOC mode, so my cases should not consume much RAM. All the space required to store the LU results is on the hard disk.

Anyway, if it was the OOM killer that killed the process, how can I prevent this from happening again? That is, is there any way to tell the OOM killer not to kill the process? Much appreciated.
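Before trying to disable anything, it may help to confirm that the OOM killer was actually responsible. On Linux the kernel normally logs a "Killed process" line when it does this. Below is a rough Python sketch, assuming that log format (the exact wording varies between kernel versions). The function name `find_oom_kills` is hypothetical, and the sample line in the comment is illustrative, not taken from the poster's machine.

```python
import re

# Matches kernel log lines such as (illustrative; format varies by kernel version):
#   Out of memory: Killed process 4242 (solver.x) total-vm:...kB, anon-rss:...kB
_KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return a list of (pid, command) for every 'Killed process' line in the log text."""
    kills = []
    for line in log_text.splitlines():
        match = _KILLED.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

The log text could come from the output of `dmesg` or `journalctl -k` saved to a file (reading the kernel log may require root on some systems). If the solver's PID and executable name show up in the result, the OOM killer is the likely cause.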

jimdempseyatthecove
Honored Contributor III

For questions like this, Google is your friend. Search for "disable oom".

Possibly: http://thetechnick.blogspot.com/2010/12/steps-to-disable-oom-on-linux.html

That is for a system-wide disable. You might want to do some googling on your own to see how you can do this for a specific application. Note that the references I found indicate you can do this for a specific process ID, as opposed to via the path to the executable. So your app would have to get its PID and then write the appropriate flag value.

Jim Dempsey
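The per-process approach described above (get the PID, then write a flag value) can be sketched in Python as follows. This assumes a reasonably recent Linux kernel, where the flag lives in /proc/<pid>/oom_score_adj and ranges from -1000 (never select this process) to 1000. The function name `set_oom_score_adj` and its `proc_root` parameter are hypothetical, the latter added only so the sketch can be tried against a scratch directory.

```python
import os


def set_oom_score_adj(value, pid=None, proc_root="/proc"):
    """Write `value` to <proc_root>/<pid>/oom_score_adj and return the path written.

    Lowering the score normally requires root or CAP_SYS_RESOURCE,
    so an unprivileged call may fail with PermissionError.
    """
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    pid = os.getpid() if pid is None else pid
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as handle:
        handle.write(str(value))
    return path
```

A call such as `set_oom_score_adj(-1000)` would exempt the calling process. Exempting a process that really is exhausting memory only shifts the problem: the kernel will pick other processes instead, so reducing the job's memory use is usually the safer fix.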

Gators_vs__Sundevil

Thanks, Jim. I found a way to solve the problem. I reduced the maximum amount of RAM that OOC can use in the config file (MKL_PARDISO_OOC_MAX_CORE_SIZE, MKL_PARDISO_OOC_MAX_SWAP_SIZE). Previously, these two values were almost the same as the RAM available on the system. I think this may be why the OOM killer decided to kill the job.
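For anyone hitting the same issue, here is a small Python sketch of how such a config could be generated with some headroom left for the rest of the system. It assumes the values are given in megabytes and that the file is the usual pardiso_ooc.cfg read from the working directory, which should be checked against the MKL documentation for the version in use. The function name `ooc_config_text`, the 0.5 fraction, and the `ooc_temp` path are illustrative choices, not the values used in this thread.

```python
def ooc_config_text(total_ram_mb, core_fraction=0.5, swap_mb=0, ooc_path="ooc_temp"):
    """Build PARDISO OOC config text that caps in-core memory at a fraction of RAM.

    total_ram_mb  : physical RAM on the machine, in MB
    core_fraction : share of RAM PARDISO may use in-core (assumed safe value, tune it)
    swap_mb       : swap the solver may use, in MB
    ooc_path      : prefix for the out-of-core files (hypothetical name)
    """
    if not 0 < core_fraction < 1:
        raise ValueError("core_fraction must be strictly between 0 and 1")
    core_mb = int(total_ram_mb * core_fraction)
    lines = [
        f"MKL_PARDISO_OOC_PATH = {ooc_path}",
        f"MKL_PARDISO_OOC_MAX_CORE_SIZE = {core_mb}",
        f"MKL_PARDISO_OOC_MAX_SWAP_SIZE = {swap_mb}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: a machine with 256000 MB of RAM, half of it allowed for the solver.
    print(ooc_config_text(256000), end="")
```

The point of the fraction is simply that the core size plus the swap size should stay well below what the machine can actually provide, since the solver is not the only consumer of memory.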
