Ramon_A_
Beginner
401 Views

MKL crashing when creating too many OpenMP threads

Hi, I have 64 threads running on an Intel Xeon Phi 7230. Each thread can run the following MKL routine:
@constraint (ComputingUnits="${ComputingUnits}")
@task(returns=list)
def createBlock(BSIZE, MKLProc, diag):
    import os
    os.environ["KMP_AFFINITY"]="verbose"
    os.environ["MKL_NUM_THREADS"]=str(MKLProc)
    block = np.array(np.random.random((BSIZE, BSIZE)), dtype=np.double,copy=False)
    mb = np.matrix(block, dtype=np.double, copy=False)
    mb = mb + np.transpose(mb)
    if diag:
        mb = mb + 2*BSIZE*np.eye(BSIZE)
    return mb
MKL_NUM_THREADS is set to 64 in order to take advantage of all the cores. On the 32nd execution of the routine, I get the following error:
OMP: Error #34: System unable to allocate necessary resources for OMP thread:
OMP: System error #11: Resource temporarily unavailable
OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.

I found here https://software.intel.com/en-us/forums/intel-open-source-openmp-runtime-library/topic/622016 that threads are not destroyed, so I may be reaching the thread limit of the machine. However, at any given time only one thread is running, so only 64 OpenMP threads are awake. I'm running this code on a shared cluster, so I would rather not recompile the library with custom settings if possible. Is there a way to avoid this problem without decreasing the number of threads running on the machine? I think that using fewer threads would avoid the problem, but this is part of a bigger program and I am really interested in keeping the 64 threads.
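
For reference, system error #11 on Linux is EAGAIN, which thread creation returns when resources or a thread-count limit are exhausted. The limits can be read from inside the Python process. This is a minimal sketch, assuming Linux; the helper name thread_limits is made up for illustration, and a value of -1 means "unlimited".

```python
import resource


def _read_int(path):
    """Return the integer stored in a /proc file, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (IOError, OSError, ValueError, IndexError):
        return None


def thread_limits():
    """Collect limits that can make thread creation fail with EAGAIN."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "nproc_soft": soft,   # per-user limit on processes/threads (soft)
        "nproc_hard": hard,   # per-user limit on processes/threads (hard)
        "threads_max": _read_int("/proc/sys/kernel/threads-max"),
        "pid_max": _read_int("/proc/sys/kernel/pid_max"),
    }


if __name__ == "__main__":
    for name, value in sorted(thread_limits().items()):
        print("%s = %s" % (name, value))
```

Comparing nproc_soft with the number of threads the program has created so far would show whether the per-user limit is the one being reached.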

Regards,

Ramon

 

8 Replies
SergeyKostrov
Valued Contributor II
401 Views

>>OMP: Error #34: System unable to allocate necessary resources for OMP thread:
>>OMP: System error #11: Resource temporarily unavailable
>>OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.

You need to increase:

- The stack size for OpenMP threads (the default value is exceeded in your case), and
- The default stack size of your application (Note: on the KNL server with Red Hat Linux that I use, 'ulimit -s' shows 512K).

Ramon_A_
Beginner
401 Views

Hi Sergey,

Thanks for the fast response.

I've tried what you suggested and, for the "ulimit -s" call, I get "unlimited". I've also set "KMP_STACKSIZE" to "1000m". I get the same error at the same point.

I forgot to specify that I'm using MKL through Numpy with Intel Python 2.7.11. As shown in the example code, all the variables are set before entering the first parallel region. Nevertheless, the module is imported before. Could this be a problem? Thanks in advance.
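
On the import-order question: the OpenMP runtime typically reads KMP_* settings once, when it initializes, so values written to os.environ after that point may be ignored. A minimal sketch of setting everything first, assuming the variables only need to be fixed once per process; the helper configure_threading_env is hypothetical and the values passed to it are placeholders.

```python
import os


def configure_threading_env(num_threads, stack_size=None):
    """Set threading variables in os.environ; call before importing numpy."""
    os.environ["MKL_NUM_THREADS"] = str(num_threads)
    if stack_size is not None:
        os.environ["KMP_STACKSIZE"] = stack_size
    # Return what is now in effect so the caller can log it.
    names = ("MKL_NUM_THREADS", "KMP_STACKSIZE")
    return {name: os.environ[name] for name in names if name in os.environ}


# Placeholder values; adjust for the actual run.
configure_threading_env(64, "4m")
# import numpy as np   # import MKL-backed modules only after the call above
```

Exporting the same variables in the shell or job script before launching Python has the same effect without touching the code.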

Ramon

TimP
Black Belt
401 Views

If resource exhaustion occurs with increasing number of threads, decreasing omp_stacksize seems a more likely tactic.  Assuming it was working at 4m with a reasonable number of threads, changes by more than a factor of 2 seem ridiculous.
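
To make the resource argument concrete: each thread reserves its own stack, so the address space reserved for stacks grows as number of threads times stack size. A rough back-of-the-envelope sketch; the helper names are hypothetical, the parser handles only the k/m/g suffixes, and the figure of 32 x 64 threads assumes, as the linked thread suggests, that threads from earlier calls are never released.

```python
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '4m' or '512k' to bytes (illustrative parser)."""
    text = text.strip().lower()
    return int(text[:-1]) * _UNITS[text[-1]]


def total_stack_bytes(num_threads, stack_size):
    """Address space reserved for stacks if every thread gets stack_size."""
    return num_threads * parse_size(stack_size)


if __name__ == "__main__":
    threads = 32 * 64  # assumption: 64 new threads per call, none released
    for size in ("1m", "4m", "1000m"):
        gib = total_stack_bytes(threads, size) / float(1024 ** 3)
        print("%d threads x %s = %.1f GiB of stack" % (threads, size, gib))
```

Under that assumption, 1000m stacks would reserve about 2000 GiB of virtual address space, while 4m stacks would reserve 8 GiB.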

SergeyKostrov
Valued Contributor II
401 Views

>>...all the variables are set before entering the first parallel region. Nevertheless, the module is imported before.
>>Could this be a problem?..

In C/C++ I use the kmp_set_defaults function to set environment variables, like:

...
kmp_set_defaults( "KMP_AFFINITY=compact" );
...

Alternatively, set all environment variables before starting your application.

SergeyKostrov
Valued Contributor II
401 Views

>>...np.array(np.random.random(( BSIZE, BSIZE ))...

What is the default value of BSIZE?

SergeyKostrov
Valued Contributor II
401 Views

>>...Assuming it was working at 4m with a reasonable number of threads...

It is clear that the application crashed with the default value for the OMP stack size.

Ramon_A_
Beginner
401 Views

Hi to both and thanks for the responses,

>> If resource exhaustion occurs with increasing number of threads, decreasing omp_stacksize seems a more likely tactic.

I don't really understand why I should decrease the stack size. Nevertheless, I set "KMP_STACKSIZE" to "1m". The program crashed at exactly the same point.

>> In case of C/C++ languages I use kmp_set_defaults function, like: ... to set an environment variable(s), or set all environment variable(s) before starting your application.

So, I assume that it is not possible to change KMP_NUM_THREADS dynamically depending on the call that is made at each moment? I thought that the number of OpenMP threads was defined by the environment variable at the beginning of the parallel version.

>> What is a default value for BSIZE?

BSIZE is equal to 4096 in this execution. The block created has 134MB. Is this important?

>> It is clear that application crashed with a default value for OMP stack size.

So far I have tried leaving the value unchanged and setting it to "1000m" and to "1m". The program always crashes at the same point, so I tend to think that it is not directly caused by this.

 

SergeyKostrov
Valued Contributor II
401 Views

>>So, I assume that is not possible to change KMP_NUM_THREADS dynamically depending on the call that is done at each moment?
>>I thought that the amount of OpenMP threads was defined by the environmental variable at the beginning of the parallel version.

mkl_set_num_threads( num_of_threads ) lets you change the number of OpenMP threads used for processing.

>>>> What is a default value for BSIZE?
>>
>>BSIZE is equal to 4096 in this execution. The block created has 134MB. Is this important?

Yes, and the stack size should be greater than 134MB.

>>>> It is clear that application crashed with a default value for OMP stack size.
>>
>>For the moment, I tried to not change the value, set it to "1000m" and "1m". The thing is that the program crash always
>>at the same point, so I tend to think that is not directly caused by this.

Do a set of tests:

...
mkl_set_num_threads( 1 )
Set OpenMP stack size to 64MB
...
mkl_set_num_threads( 1 )
Set OpenMP stack size to 128MB
...
mkl_set_num_threads( 1 )
Set OpenMP stack size to 192MB
...
mkl_set_num_threads( 1 )
Set OpenMP stack size to 256MB
...

to find out what stack size is needed for processing with a single OpenMP thread.
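
The set of tests described above could be scripted by launching one fresh process per stack size, so that each run starts with its own environment. A minimal sketch, assuming KMP_STACKSIZE and MKL_NUM_THREADS are picked up at process startup; run_with_stack_size is a hypothetical helper, and the child command is only a stand-in that echoes the variable, to be replaced with the command that runs the real workload.

```python
import os
import subprocess
import sys

# Stand-in child process; replace with the command that runs the real workload.
PLACEHOLDER_CMD = [sys.executable, "-c",
                   "import os; print(os.environ['KMP_STACKSIZE'])"]


def run_with_stack_size(stack_size, cmd=PLACEHOLDER_CMD):
    """Run cmd in a new process with one MKL thread and the given stack size."""
    env = dict(os.environ)
    env["MKL_NUM_THREADS"] = "1"
    env["KMP_STACKSIZE"] = stack_size
    proc = subprocess.Popen(cmd, env=env,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out.decode().strip(), err.decode().strip()


if __name__ == "__main__":
    for size in ("64m", "128m", "192m", "256m"):
        code, out, err = run_with_stack_size(size)
        print("%s -> exit code %d" % (size, code))
```

A non-zero exit code or an OMP error in the captured stderr would mark the stack sizes that are too small for the single-threaded case.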