Intel® Fortran Compiler

Compiling on one machine for another with more processors

asklingler
Beginner
903 Views
I'm compiling a Monte Carlo simulation for deployment on other machines. I use the default parallelization settings. On my own (2-processor) machine I get a significant speedup from parallelization, and Task Manager shows close to 100% CPU utilization. If I plan to run the executable on one of the other machines (six-core), how can I get a similar speedup? I have changed environment variables to NUMBER_OF_PROCESSORS=6 and OMP_NUM_THREADS=6. Not only do I not get any speedup, the CPU utilization level hangs around 17%, though it's distributed over two processors.
Any clues here? I'm hoping I don't have to become an OpenMP expert to get some benefit out of the additional cores.
Thanks.
4 Replies
Steven_L_Intel1
Employee
The runtime automatically picks the number of "execution units" (CPUs*cores*threads) as the number of threads to create. It may be that your application doesn't scale past two threads. Note that the "17%" is total over the CPUs, so if, say, there are 12 threads possible (6 cores with HyperThreading), then two threads would be about 17%.
Martyn_C_Intel
Employee
Normally, that should work on the other system. You don't even have to set the number of threads; by default it is taken from the number of (logical) cores/processors. I would definitely not set the number of processors; leave that to the OS. Does it look in Task Manager as if the app is running on only one core at a time?

What processor type are your 6 cores? Does it have hyperthreading enabled? You might try setting KMP_AFFINITY=physical, though it doesn't really sound like the problem here.

Monte Carlo apps typically make calls to library random number routines, which may be serialized for thread safety. Does your app spend a lot of time in random number calls?
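
One way to see whether a shared random number routine limits scaling is to compare against a loop in which every thread keeps its own generator state, so no call has to be serialized. The following is only a minimal sketch under stated assumptions: the program name, the small Park-Miller generator, the seeds, and the sample count are all hypothetical choices made for brevity. It does not use whatever library routine the application actually calls, and this generator is not suitable for production simulations.

```fortran
! Illustrative sketch: Monte Carlo estimate of pi with per-thread generator state.
! Build with OpenMP enabled, e.g. ifort /Qopenmp per_thread_rng.f90
program per_thread_rng
  use omp_lib
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: dp = kind(1.0d0)
  ! Park-Miller "minimal standard" constants (assumption: used only for illustration)
  integer(i8), parameter :: a = 16807_i8, m = 2147483647_i8
  integer(i8), parameter :: n_samples = 20000000_i8
  integer(i8) :: seed, i, hits
  real(dp) :: x, y, t0, t1

  hits = 0_i8
  t0 = omp_get_wtime()

  !$omp parallel private(seed, x, y) reduction(+:hits)
  ! Each thread owns its state, so no locking is needed around the generator.
  seed = 12345_i8 + 7919_i8 * omp_get_thread_num()
  !$omp do
  do i = 1_i8, n_samples
     seed = mod(a * seed, m)
     x = real(seed, dp) / real(m, dp)
     seed = mod(a * seed, m)
     y = real(seed, dp) / real(m, dp)
     if (x*x + y*y <= 1.0_dp) hits = hits + 1_i8
  end do
  !$omp end do
  !$omp end parallel

  t1 = omp_get_wtime()
  print '(a,f10.6)', 'pi estimate = ', 4.0_dp * real(hits, dp) / real(n_samples, dp)
  print '(a,f8.3)',  'seconds     = ', t1 - t0
end program per_thread_rng
```

If this loop scales with the thread count while the real application does not, the random number calls become a more likely suspect.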

By default parallelization settings, do you mean "Yes (/Qparallel)"? Could you please send us your complete command line and, for completeness, the versions of Visual Studio, Windows, and the Intel Compiler?
asklingler
Beginner

Six-core Intel Xeon X5670. Judging just by the Performance tab in Task Manager, hyperthreading is enabled on *neither* my original (2-core) machine nor the 6-core Xeon.
The app does spend a fair amount of time in random number calls, so that *could* be the issue, but....
My confusion is this: again, judging by what I see in Task Manager, the two-core machine uses both cores fully; the same executable on the six core machine uses only the equivalent of one core (that's the 17% utilization number). Not only am I not getting additional parallelization, I could be getting less.
I do mean Yes(/Qparallel).
Thanks both of you for the help. Sorry it took so long to get back.
In case it matters, here's the command line:
/nologo /Qparallel /Qip /fpp /I"C:\Program Files\Intel\Compiler\11.1\065\include" /I"C:\Program Files\Intel\Compiler\11.1\065\include\ia32" /I"c:\Program Files\Microsoft Visual Studio 8\VC\atlmfc\include" /I"c:\Program Files\Microsoft Visual Studio 8\VC\include" /I"c:\Program Files\Microsoft Visual Studio 8\VC\PlatformSDK\include" /I"c:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\include" /I"C:\Program Files\VNI\imsl\fnl600\IA32\include\static" /Qopenmp /module:"Release\" /object:"Release\" /libs:dll /threads /c
Martyn_C_Intel
Employee
I agree: if you can keep two cores busy on your laptop, you should be able to get at least two cores' worth of work out of your other machine. Something doesn't seem right.

First comment: you would not normally want to use both /Qopenmp and /Qparallel together. Are you making use of IMSL, and is that why you have /Qopenmp? Are you calling parts of IMSL or MKL that are threaded? Does the parallelism that you see on your laptop come from /Qparallel or from /Qopenmp? (E.g., does it change when you remove /Qparallel?)

You might collect some additional information about the OpenMP environments on the two systems by setting these environment variables:
KMP_VERSION=1
KMP_SETTINGS=1
and comparing the logs.
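
Alongside those logs, a tiny program can report what the OpenMP runtime itself sees on each machine. This is a minimal sketch, assuming it is built with /Qopenmp. The program name is hypothetical, and only standard OpenMP library routines are used.

```fortran
! Illustrative sketch: print the OpenMP runtime's view of the machine.
! Build with OpenMP enabled, e.g. ifort /Qopenmp omp_env_check.f90
program omp_env_check
  use omp_lib
  implicit none
  integer :: team_size

  print '(a,i0)', 'Processors seen by the OpenMP runtime: ', omp_get_num_procs()
  print '(a,i0)', 'Default maximum number of threads:     ', omp_get_max_threads()

  team_size = 0
  !$omp parallel
  !$omp single
  ! One thread records the size of the team that was actually created.
  team_size = omp_get_num_threads()
  !$omp end single
  !$omp end parallel

  print '(a,i0)', 'Threads in an actual parallel region:  ', team_size
end program omp_env_check
```

Running it on both systems, with and without OMP_NUM_THREADS set, shows whether the runtime is being limited before the application code even starts.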

Are these significantly different versions of the Windows OS?

A next step might be to run a little test program, such as a standalone parallel matrix multiply, on both your laptop and your 6 core system, and see whether that scales as expected.
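
A minimal sketch of such a test program follows. It is an assumption-laden illustration, not a tuned benchmark: the program name and the matrix size (n = 800) are hypothetical, and the loop is a plain triple-nested multiply timed with omp_get_wtime at each thread count from 1 up to the runtime's default maximum.

```fortran
! Illustrative sketch: time a parallel matrix multiply at increasing thread counts.
! Build with OpenMP enabled, e.g. ifort /Qopenmp matmul_scaling.f90
program matmul_scaling
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 800          ! assumed size; adjust to get runs of about a second
  real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
  real(dp) :: t0, t1, t_one
  integer :: i, j, k, nt, max_threads

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)

  ! Capture the default before it is overridden below.
  max_threads = omp_get_max_threads()
  t_one = 0.0_dp

  do nt = 1, max_threads
     call omp_set_num_threads(nt)
     c = 0.0_dp
     t0 = omp_get_wtime()
     ! Each thread computes a distinct set of columns j of c.
     !$omp parallel do private(i, k)
     do j = 1, n
        do k = 1, n
           do i = 1, n
              c(i,j) = c(i,j) + a(i,k) * b(k,j)
           end do
        end do
     end do
     !$omp end parallel do
     t1 = omp_get_wtime()
     if (nt == 1) t_one = t1 - t0
     print '(a,i3,a,f8.3,a,f6.2)', 'threads=', nt, &
          '  seconds=', t1 - t0, '  speedup=', t_one / (t1 - t0)
  end do

  ! Use the result so the multiply cannot be optimized away.
  print *, 'checksum =', sum(c)
end program matmul_scaling
```

If the speedup column grows with the thread count on the laptop but stays near 1 on the six-core system, the problem lies in the environment rather than in the Monte Carlo code.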