I tried to run an application (compiled with Intel Visual Fortran 11.1.051 on an HP Z800 with 12 cores; the code calls PARDISO from MKL) on a couple of servers (48 to 80 cores, 1 TB RAM) running Windows Server 2008 R2 Enterprise, to see whether I can get better performance using all the cores. Instead, the application stopped running with the error code "c0000005" (I googled it; it seems to indicate an access violation). My questions are:
- Does Intel Visual Fortran support applications running under Windows Server 2008 (Enterprise, Datacenter, etc.) with multiple cores?
- What is the maximum number of cores the compiled code can access without causing problems?
- Are there any compiler/linker switches I need to turn on to let the code access multiple cores (more than 12)?
- Or maybe I need a new version/special edition of the compiler to recompile the code to make it work?
- Is this problem because of PARDISO which can access multiple cores? (Applications using only one core seem to run with no problem).
According to my limited understanding, 2008 R2 X64 may limit you to 64 threads, but attempting to exceed that limit shouldn't abort your program.
I used the 64-bit version to compile initially since the Z800 is running XP/64. I didn't specify a stack size but used heap = 0 within the studio interface (according to a suggestion I got from here). This seems to work for me up to the 12-core machine.
My understanding of your post is that IVF will work for more than 12 cores and will also work on Windows Server. Is this correct? Sorry if this question is too simple, but does your mention of "64 threads" mean the program can access 64 cores? What would be the best setup for stack and heap-array if I use command-line compilation as well as the studio interface?
Any suggestion is greatly appreciated.
The other possibility for using 80 threads was to set up 2 partitions for running applications separately.
If you had a platform with 64 cores, you would be able to use all of them.
I think "best settings" would depend on knowledge of your application. The compiler option /heap-arrays would cause the Fortran code to avoid using extra stack, which may be useful if you are depending on Pardiso for the parallelism.
KMP_STACKSIZE, which Steve mentioned, lets you increase the stack available to each thread. I think it's more likely that you are exceeding the total stack size limit as you use more threads, since you didn't run into the per-thread stack size limit in the smaller problem.
Regarding KMP_STACKSIZE: does this need to be specified inside the source code, set during compilation, or set as an environment variable on the machine? Please provide some guidance.
I encountered this "access violation" problem while testing the same (small) benchmark that runs smoothly on an HP Z800 workstation. The same executable and the same input run on the workstation (12 cores, 96 GB RAM) but not on the server (80 cores, 1 TB RAM). Do you think recompiling the code on the server and increasing the stack size will solve the problem? If it will, what would be the recommended setup? And what would be the maximum allowable stack size?
Could you please try reducing the number of threads used by MKL on the HP DL980 server? That would let us localize the problem (for example, the crash may start to occur at 64 threads or something like that).
To set the number of threads used by MKL please call this routine prior to the very first call of PARDISO:
where n may vary from 32 to 80 with some step.
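The routine call itself is missing from the post above, so it is not reproduced here. As an alternative way to run the same experiment without recompiling, a small driver script can cap MKL's thread count through the MKL_NUM_THREADS environment variable and record the exit code for each count. This is only a sketch: it uses the environment variable rather than the in-code call the post refers to, and the function names and the 8-thread step are assumptions chosen for illustration.

```python
import os
import subprocess


def thread_counts(start=32, stop=80, step=8):
    """Thread counts to try; the step of 8 is an arbitrary choice."""
    return list(range(start, stop + 1, step))


def env_for(n_threads, base_env=None):
    """Copy the environment and cap MKL at n_threads via MKL_NUM_THREADS."""
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_NUM_THREADS"] = str(n_threads)
    return env


def sweep(command, counts):
    """Run the command once per thread count; return (count, exit code) pairs."""
    results = []
    for n in counts:
        completed = subprocess.run(command, env=env_for(n))
        results.append((n, completed.returncode))
    return results
```

Calling `sweep(["my_solver.exe"], thread_counts())`, where `my_solver.exe` is a placeholder for the real application, would show the first thread count at which the exit code becomes nonzero.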
KMP_STACKSIZE can be set through an environment variable and modified by a function call at run time before starting a parallel region, as discussed in the documentation installed with the Intel compilers. The default value for Intel64 is 4 MB (per thread), so right there you would be using 320 MB for 80 threads. If you don't need 4 MB, a smaller value may reduce total stack usage.
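To make the arithmetic concrete, here is a rough sketch of how the combined thread stack grows with the thread count. It assumes every thread gets the full 4 MB default quoted above and ignores the main thread's stack, which is sized separately; the function name is made up for this example.

```python
MB = 1024 * 1024


def total_thread_stack_bytes(n_threads, stack_per_thread_bytes=4 * MB):
    """Combined stack for all threads: thread count times per-thread size."""
    return n_threads * stack_per_thread_bytes


# Compare the 12-core workstation with 64 and 80 threads on the server.
for n in (12, 64, 80):
    print(f"{n} threads x 4 MB = {total_thread_stack_bytes(n) // MB} MB")
```

Passing a smaller per-thread size, for example `total_thread_stack_bytes(80, 1 * MB)`, shows how much a reduced KMP_STACKSIZE would save in total.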
As I suspect the Windows version may be customized for HP, I have no hope of guessing what its default and maximum overall stack size may be. You can set the stack size in the link command (e.g. in linker properties in VS) or by using a tool such as editbin to modify the .exe. It's not affected by compilation (where you make .obj files).
Could you please provide a reproducer of your issue?
Emulating 80 cores will not run into the logical-processor bitmap issues. Older revisions of Windows used a 32-bit or 64-bit bitmap for logical processor identification. Newer revisions of Windows still support this as a legacy interface, but alongside it they support processor groups of up to 64 logical processors each. If code (the application, a DLL, or a runtime library) does not use the group extensions, there may be a restriction on the number of logical processors available to the application. The O/S may have partitioned the 80 logical processors into two groups of 40.
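A small sketch of the arithmetic behind that: a 64-bit bitmap can name at most 64 logical processors, so anything larger has to be split into groups. The even split below is an assumption for illustration only; Windows decides the real group layout from the hardware topology, and the function names are hypothetical.

```python
GROUP_LIMIT = 64  # maximum logical processors in one Windows processor group


def fits_in_one_mask(n_logical, mask_bits=64):
    """True if one affinity bitmap of mask_bits can address every processor."""
    return n_logical <= mask_bits


def even_group_sizes(n_logical, group_limit=GROUP_LIMIT):
    """Fewest groups of at most group_limit, sized as evenly as possible."""
    if n_logical < 1:
        raise ValueError("n_logical must be at least 1")
    n_groups = -(-n_logical // group_limit)  # ceiling division
    base, extra = divmod(n_logical, n_groups)
    return [base + 1 if i < extra else base for i in range(n_groups)]
```

With these definitions, `even_group_sizes(80)` gives two groups of 40, while the 48-core server fits in a single group, which matches the 40/40 partition suggested above.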
The symptoms reported look more like a stack issue though.