Nan_Deng
Beginner
104 Views

Does the Intel Visual Fortran compiler support applications running under Windows Server 2008 with more than 12 cores?

I tried to run an application (compiled with Intel Visual Fortran 11.1.051 on an HP Z800 with 12 cores; the code calls PARDISO from MKL) on a couple of servers (48 to 80 cores, 1 TB RAM) running Windows Server 2008 R2 Enterprise, to see whether I could get better performance using all the cores. Instead, the application stopped running with the error code "c0000005" (from searching, this seems to indicate an access violation). My questions are:

- Does Intel Visual Fortran support applications running under Windows Server 2008 (Enterprise, Datacenter, etc.) with multiple cores?
- What is the maximum number of cores the compiled code can use without causing problems?
- Are there any compiler/linker switches I need to turn on to let the code use more than 12 cores?
- Or do I need a new version or special edition of the compiler to recompile the code to make it work?
- Is this problem caused by PARDISO, which can use multiple cores? (Applications using only one core seem to run with no problem.)


TimP
Black Belt

As you increase the number of cores, stack usage is likely to increase, and you are likely to need an X64 OS (Intel64 compiler) with an adjusted /link /stack setting. The /heap-arrays option may be an alternative.
According to my limited understanding, Windows Server 2008 R2 X64 may limit you to 64 threads, but attempting to exceed that limit shouldn't abort your program.
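A minimal sketch of the stack/heap distinction mentioned above, assuming a work array sized at run time (the module and function names are hypothetical, not from the original application). Intel Fortran normally places an automatic array such as WORK in `sum_automatic` on the stack unless /heap-arrays is used, while an ALLOCATABLE array as in `sum_allocatable` always lives on the heap, so its size is limited by available memory rather than by the /link /stack setting.

```fortran
! Hypothetical illustration: the same run-time-sized work array,
! once as an automatic array and once as an ALLOCATABLE array.
module work_arrays
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Automatic array: by default the compiler places WORK on the stack
  ! (Intel Fortran moves it to the heap when /heap-arrays is used).
  function sum_automatic(n) result(total)
    integer, intent(in) :: n
    real(dp) :: total
    real(dp) :: work(n)
    integer :: i
    do i = 1, n
      work(i) = real(i, dp)
    end do
    total = sum(work)
  end function sum_automatic

  ! ALLOCATABLE array: always allocated on the heap, so its size is
  ! limited by available memory rather than by the linker stack size.
  function sum_allocatable(n) result(total)
    integer, intent(in) :: n
    real(dp) :: total
    real(dp), allocatable :: work(:)
    integer :: i, stat
    allocate(work(n), stat=stat)
    if (stat /= 0) error stop "allocation of WORK failed"
    do i = 1, n
      work(i) = real(i, dp)
    end do
    total = sum(work)
    ! WORK is deallocated automatically on return (Fortran 95 and later).
  end function sum_allocatable

end module work_arrays
```

With a default-sized stack, calling `sum_automatic` with a very large n can overflow the stack, whereas `sum_allocatable` with the same n only needs enough heap memory.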
Nan_Deng
Beginner

I compiled with the 64-bit version from the start, since the Z800 runs XP x64. I didn't specify a stack size, but I used heap = 0 in the Visual Studio interface (following a suggestion I got here). This seems to work for me on machines with up to 12 cores.

My understanding of your post is that IVF works with more than 12 cores and also on Windows Server. Is this correct? Sorry if this question is too simple, but does your mention of "64 threads" mean the program can use up to 64 cores? What would be the best stack and heap-array settings, both for command-line compilation and in the Visual Studio interface?

Any suggestion is greatly appreciated.

Steven_L_Intel1
Employee

The compiler has no fixed limit on the number of cores. Intel MKL uses OpenMP internally to run operations on multiple cores, and you may have to set the environment variable KMP_STACKSIZE to a larger value. I am moving your question to the MKL forum, where MKL experts can help you.
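One way to confirm that an environment setting such as KMP_STACKSIZE actually reaches the program is to read it at startup with the standard Fortran 2003 intrinsic GET_ENVIRONMENT_VARIABLE. The module below is a hypothetical helper, not part of MKL or the Intel runtime; it only reports the value and does not change any stack size.

```fortran
! Hypothetical helper: report the value of an environment variable
! (for example KMP_STACKSIZE or MKL_NUM_THREADS) as seen by this process.
module env_report
  implicit none
contains

  ! Returns the variable's value, or "(not set)" if it is undefined.
  function env_value(name) result(text)
    character(len=*), intent(in) :: name
    character(len=:), allocatable :: text
    integer :: length, status

    ! First call: query only whether the variable exists and its length.
    call get_environment_variable(name, length=length, status=status)
    if (status /= 0) then
      text = "(not set)"
      return
    end if

    ! Second call: fetch the value into a buffer of exactly that length.
    allocate(character(len=length) :: text)
    if (length > 0) call get_environment_variable(name, value=text)
  end function env_value

end module env_report
```

For example, `print *, "KMP_STACKSIZE = ", env_value("KMP_STACKSIZE")` at the start of the main program shows whether a value set in the system or command-prompt environment was inherited.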
TimP
Black Belt

I think you may be running on a platform with 40 cores and 80 hyperthreads. When such platforms were first introduced, Windows Server 2008 R2 would not support more than 64 threads, so the recommended practice was to disable Hyper-Threading in the BIOS setup. By default, MKL would use only 1 thread per core anyway when it detects Hyper-Threading, in order to maximize performance.
The other option for using 80 threads was to set up 2 partitions and run applications in them separately.
If you had a platform with 64 cores, you would be able to use all of them.
I think the "best settings" depend on your application. The compiler option /heap-arrays makes the Fortran code place temporary and automatic arrays on the heap instead of the stack, which may be useful if you are relying on PARDISO for the parallelism.
KMP_STACKSIZE, which Steve mentioned, lets you increase the stack available to each thread. I think it's more likely that you exceed the total stack size limit as you use more threads, since you didn't run into the per-thread stack size limit with the smaller problem.
Nan_Deng
Beginner

Tim, thanks for your suggestions. The machine I am testing on is an HP DL980 with 8 CPUs of 10 cores each and a total of 1 TB RAM, per HP's specs. So I think it has 80 physical cores, not 80 hyperthreads, but I'll check with the IS&T people.

Regarding KMP_STACKSIZE: does it need to be specified inside the source code, set during compilation, or set as an environment variable on the machine? Please provide some guidance.

I encountered this "access violation" problem while testing the same (small) benchmark that runs smoothly on an HP Z800 workstation. The same executable and the same input run on the workstation (12 cores, 96 GB RAM) but not on the server (80 cores, 1 TB RAM). Do you think recompiling the code on the server and increasing the stack size will solve the problem? If so, what would be the recommended setup, and what is the maximum allowable stack size?

Hi,

Could you please try reducing the number of threads used by MKL on the HP DL980 server? That way we can localize the problem (for example, whether the crash starts at 64 threads or something like that).

To set the number of threads used by MKL please call this routine prior to the very first call of PARDISO:

call mkl_set_num_threads(n)

where n ranges from 32 to 80 in steps of your choice.
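The experiment suggested above can be organized as a small driver. This is a sketch under stated assumptions: `first_failing`, `set_threads_proc`, and `run_case_proc` are hypothetical names; in real use `set_threads` would be a one-line wrapper around `call mkl_set_num_threads(n)` (linked against MKL), and `run_case` would run the PARDISO benchmark once. Passing them as procedure arguments keeps the driver testable without MKL.

```fortran
! Hypothetical driver for the thread-count experiment: for each
! n = first, first+step, ..., up to last, set the thread count, then
! run one case, so the first failing thread count can be located.
module thread_sweep
  use, intrinsic :: iso_fortran_env, only: output_unit
  implicit none

  abstract interface
    ! Same shape as "call mkl_set_num_threads(n)"; in real use, a small
    ! wrapper around that MKL call would be passed here.
    subroutine set_threads_proc(n)
      integer, intent(in) :: n
    end subroutine set_threads_proc

    ! Runs one case (e.g. the PARDISO benchmark); .true. means success.
    logical function run_case_proc(n)
      integer, intent(in) :: n
    end function run_case_proc
  end interface

contains

  ! Returns the first thread count whose case reports failure, or 0.
  integer function first_failing(first, last, step, set_threads, run_case)
    integer, intent(in) :: first, last, step
    procedure(set_threads_proc) :: set_threads
    procedure(run_case_proc) :: run_case
    integer :: n

    first_failing = 0
    do n = first, last, step
      ! Print and flush before each run, so that if the process dies
      ! with an access violation the last line shows the thread count.
      write(output_unit, '(a, i0)') "Testing with threads = ", n
      flush(output_unit)
      call set_threads(n)
      if (.not. run_case(n)) then
        first_failing = n
        return
      end if
    end do
  end function first_failing

end module thread_sweep
```

Because an access violation terminates the process rather than returning .false., the flushed "Testing with threads" line is what locates a crash; the return value only catches failures the benchmark itself reports.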

Regards,
Konstantin
TimP
Black Belt

I see that HP apparently tested the 80-core, 160-thread configuration, so they may have a customized Windows version for that purpose.

KMP_STACKSIZE may be initialized by an environment variable and modified by a function call at run time before starting a parallel region, as discussed in the documentation installed with the Intel compilers. The default value for Intel64 is 4 MB per thread, so 80 threads would already use 320 MB. If you don't need 4 MB, a smaller value may reduce total stack usage.
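The arithmetic above as a minimal sketch (the function name is hypothetical). One refinement, per the OpenMP specification for OMP_STACKSIZE: the per-thread setting applies to threads created by the OpenMP runtime, while the initial thread uses the stack reserved at link time. The sketch assumes KMP_STACKSIZE behaves the same way, so it counts n - 1 worker threads plus the main stack. Sizes are in MB.

```fortran
! Rough estimate of stack reserved by an OpenMP program, in MB:
! (nthreads - 1) worker threads at the per-thread size, plus the
! initial thread's stack, which is set at link time.
module stack_budget
  implicit none
contains
  pure integer function total_stack_mb(nthreads, per_thread_mb, main_stack_mb)
    integer, intent(in) :: nthreads       ! total threads, including the initial one
    integer, intent(in) :: per_thread_mb  ! e.g. the KMP_STACKSIZE value
    integer, intent(in) :: main_stack_mb  ! e.g. the /link /STACK reserve
    total_stack_mb = max(nthreads - 1, 0) * per_thread_mb + main_stack_mb
  end function total_stack_mb
end module stack_budget
```

For example, `total_stack_mb(80, 4, 1)` gives 317 MB, close to the 320 MB estimate above, assuming a 1 MB main-thread stack (the Microsoft linker's default /STACK reserve).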

As I suspect the Windows version may be customized for HP, I have no hope of guessing what its default and maximum overall stack size may be. You can set the stack size in the link command (e.g. in linker properties in VS) or by using a tool such as editbin to modify the .exe. It's not affected by compilation (where you make .obj files).

Hi, I have run a PARDISO test with 80 threads on Windows Server 2008 R2 Enterprise. (I should note that this machine has fewer than 80 cores, but I emulated your scenario by running MKL with 80 OpenMP threads.) The test passed.

Could you please provide a reproducer of your issue?

Regards,
Konstantin

Regarding KMP_STACKSIZE: it should not be an issue, since MKL relies on dynamic memory allocation rather than on the stack.
jimdempseyatthecove
Black Belt

Konstantin,

Emulating 80 cores will not run into issues with logical processor bitmaps. Older revisions of Windows used a 32-bit or 64-bit bitmap for logical processor identification. Newer revisions of Windows still support this for legacy code, but in doing so they also support groups of up to 64 logical processors. If code (application, DLL, or runtime library) does not use the extensions, the number of logical processors available to the application may be restricted. The OS may have partitioned the logical processors into groups of 40 and 40 (half of the 80 each).
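The 40/40 example above can be expressed as a toy model. This is an assumption for illustration only: it splits the logical processors into the fewest groups of at most 64, as evenly as possible, whereas Windows actually assigns whole NUMA nodes to groups, so real group sizes can differ. The module and function names are hypothetical.

```fortran
! Toy model (an assumption, not the Windows algorithm): split TOTAL
! logical processors into the fewest groups of at most 64, as evenly
! as possible.
module processor_groups
  implicit none
  integer, parameter :: max_group_size = 64
contains
  ! Fewest groups of at most MAX_GROUP_SIZE (ceiling division).
  pure integer function group_count(total)
    integer, intent(in) :: total
    group_count = (max(total, 0) + max_group_size - 1) / max_group_size
  end function group_count

  ! Size of the largest group when TOTAL is split evenly.
  pure integer function largest_group(total)
    integer, intent(in) :: total
    integer :: groups
    groups = group_count(total)
    if (groups == 0) then
      largest_group = 0
    else
      largest_group = (total + groups - 1) / groups
    end if
  end function largest_group
end module processor_groups
```

Under this model, `largest_group(80)` is 40, so an application whose code or runtime library is not group-aware might see only 40 of the 80 logical processors.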

The symptoms reported look more like a stack issue though.

Jim Dempsey