I have started working with the Intel Compiler, and the performance is much better than with the VC++ compiler (Visual Studio 2008). However, I have a problem related to the OpenMP code I use for parallelization:
- During compilation, some functions are not recognized (omp_get_num_proc, for instance).
- When the compilation succeeds, executing the code raises an exception (stack overflow, chkstk.asm).
The code runs fine with Visual C++. My code defines many arrays on the stack, but I haven't had any problem with the VC++ compiler. How come?
I tried to increase the OpenMP stack size (in Project->Properties), but it doesn't change anything. I'm now downloading the updates, hoping that they may fix the problem.
ICL option /Qopenmp (it's one of the options in VS project properties for Intel C++) is equivalent to MSVC option /openmp. When used to drive the link, it includes libiomp5 in the linker dependencies, where those OpenMP functions (and all the MS openmp library functions, as well as the Intel ones) are implemented. Then you must take care that vcomp isn't linked (which evidently it isn't in your case). It's entirely common that you must boost the /link /stack: allocation when using OpenMP (in linker properties, or command line), and entirely possible that ICL uses more stack than MSVC. The linker stack size properties aren't specific to OpenMP. Note that you could set one of the stack sizes in the linker properties and miss the one which controls the show. There is also a per-thread stack size, controlled by KMP_STACKSIZE environment variable or function call (defaults 2MB 32-bit, 4MB 64-bit).
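For the "function not recognized" symptom, here is a minimal sketch of calling the OpenMP runtime functions. It assumes the standard header omp.h and the standard spelling omp_get_num_procs (with a trailing "s"). The helper names available_processors and max_threads are hypothetical and used only for illustration. The _OPENMP guard lets the same file build when the OpenMP option is switched off.

```cpp
#ifdef _OPENMP
#include <omp.h>   // declares omp_get_num_procs() and omp_get_max_threads()
#endif

// Hypothetical helper: number of processors the OpenMP runtime reports,
// or 1 when the file is compiled without OpenMP support.
int available_processors() {
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

// Hypothetical helper: upper bound on the team size of the next parallel region,
// or 1 when the file is compiled without OpenMP support.
int max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

If these calls compile but fail to link, that would point at the OpenMP runtime library missing from the link step rather than at the source code.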
The problem is still not fixed. I have just increased KMP_STACKSIZE (up to the limit of my computer). I have checked the size change by invoking kmp_get_stacksize_s().
As it is a memory issue, I checked for possible problems with Intel Inspector, but nothing in particular appeared. To make sure vcomp isn't linked, I added it as an ignored library in Project->Properties.
If you have any idea what the problem is, please don't hesitate to share it. Are there any compilation options or particular project properties that must be set when using the Intel compiler?
If you make KMP_STACKSIZE very large, I suppose you will run out of overall stack, at least when running more than 1 thread. It's unusual to use more than 10MB successfully for KMP_STACKSIZE, even in cases which use private arrays, when you would need to increase the main stack accordingly (surely > 100MB, even for a very small number of threads). Without private arrays, one would expect the default to be OK for thread stack. Needless to say, in my view, it's difficult to give simple advice.
I don't think you would want to make KMP_STACKSIZE very large; I've seen it work only up to 10MB, and then of course the overall stack limit has to be increased proportional to thread stack size times number of threads. I doubt you would need more than default thread stack size unless you use large private data structures, which tend to hinder performance anyway.
The problem is that I use only one thread in the execution, and it fails for every KMP stack size. It is very strange, and I don't know whether it really comes from the stack size setting, from another memory management problem in my code, or from my not using the Intel compiler correctly (compilation options...).
I haven't verified this; the following is an assumption.
KMP_STACKSIZE (perhaps) only affects the stack size of the auxiliary threads created for your parallel regions. IOW the main thread may not be affected by this environment variable. In your start-up project Properties | Linker | System | Stack Reserve Size you have a property that specifies the application main thread stack size. Try setting that to your desired size as well as setting KMP_STACKSIZE. Note, the main thread may need a larger stack size than the auxiliary threads because it may have stack consumption prior to establishing the parallel regions.
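To illustrate where the stack is consumed, here is a small sketch in which every thread of a parallel region declares its own automatic array. The function name and the array size are hypothetical; the size is kept at 64 KB so the example runs with default settings. Following the assumption above, the copy belonging to the main thread would come out of the main stack (Stack Reserve Size), while the copies belonging to the auxiliary threads would come out of stacks sized by KMP_STACKSIZE.

```cpp
#include <cstddef>

// Hypothetical size: 8192 doubles = 64 KB of stack per thread.
const std::size_t kScratch = 8 * 1024;

// Each thread fills a private automatic array and adds its sum to the total.
// With N threads the result is N * 8192; without OpenMP it is 8192.
double stack_array_sum_all_threads() {
    double total = 0.0;
    #pragma omp parallel reduction(+:total)
    {
        double scratch[kScratch];   // automatic: lives on this thread's own stack
        for (std::size_t i = 0; i < kScratch; ++i) {
            scratch[i] = 1.0;
        }
        double local = 0.0;
        for (std::size_t i = 0; i < kScratch; ++i) {
            local += scratch[i];
        }
        total += local;
    }
    return total;
}
```

Growing kScratch to several megabytes is the kind of change that would be expected to require raising both settings, since the main thread also executes the parallel region.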
Thanks a lot for your reply. The easiest solution for me was to replace some fixed-size arrays defined on the stack with dynamic arrays (new/delete). I chose to replace the big arrays and leave the small arrays on the stack. Now it is working, even if the performance is slightly worse.
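The replacement described here could look roughly like the sketch below. It is not the original code: the function name and the computation are hypothetical, and std::vector is used in place of raw new/delete so the heap memory is released automatically.

```cpp
#include <cstddef>
#include <vector>

// Before (sketch): a large automatic array such as
//     double work[4 * 1024 * 1024];   // 32 MB on the stack
// needs a matching stack reserve. The version below takes the storage
// from the heap instead, so the stack settings no longer matter for it.
double mean_of_ramp(std::size_t n) {
    std::vector<double> work(n);            // heap storage, freed on return
    for (std::size_t i = 0; i < n; ++i) {
        work[i] = static_cast<double>(i);   // fill with 0, 1, ..., n-1
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += work[i];
    }
    return n ? sum / static_cast<double>(n) : 0.0;
}
```

The allocation on every call is one plausible source of the slightly lower performance mentioned above, although that has not been measured here.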
When you have a multi-threaded application (you are using OpenMP) and each thread may concurrently call a function that requires a large temporary array (where you are now using new/delete), consider using thread-local storage and placing an allocatable array in there. Note, if the run time of the subroutine is long, then the allocate/deallocate time may be too small to worry about (memory fragmentation may still be an issue, which can be resolved by the allocate-once-per-thread technique).
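A minimal sketch of the allocate-once-per-thread idea, assuming a compiler with C++11 thread_local support (Visual Studio 2008 does not have it, so an older toolchain would need a different mechanism, for example a pointer declared with OpenMP threadprivate). The names thread_scratch and sum_first are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the calling thread's scratch buffer.
// The buffer is created once per thread and only grows, so repeated
// calls reuse the same heap block instead of allocating every time.
std::vector<double>& thread_scratch(std::size_t n) {
    thread_local std::vector<double> buffer;   // one instance per thread
    if (buffer.size() < n) {
        buffer.resize(n);
    }
    return buffer;
}

// Example user of the scratch buffer: sums 1 + 2 + ... + n.
double sum_first(std::size_t n) {
    std::vector<double>& s = thread_scratch(n);
    for (std::size_t i = 0; i < n; ++i) {
        s[i] = static_cast<double>(i + 1);
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += s[i];
    }
    return sum;
}
```

Because each thread owns its buffer, concurrent calls from different OpenMP threads do not share storage, and the buffer may be larger than the current request when an earlier call asked for more.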