Intel® Fortran Compiler

Too low CPU utilization in a quad-core environment

mzm
Beginner
I'm using IVF 10.1.024 (ia32) to compile a legacy Monte Carlo simulation code. Only one line had to be changed to adapt the code, which previously ran under gcc-3.4.4. The speed improvement is a factor of about 2.7 with respect to the gcc-compiled code.

My PC is a Dell Precision T5400, using a quad-core Xeon E5430 CPU @2.66 GHz with 2 GB RAM. The operating system is Windows XP Pro SP2. The options used for compiling are: /fpp /noautomatic /Qzero /O3 /Qparallel /QxT /QaxT.

I suspect I'm missing something, since the CPU utilization, while the code is running, is always at 25% with the IDLE process at 75%. The "affinity" parameter in the task manager, for the running code, shows a checkmark for all 4 CPUs. Is there any additional compiler switch to let the code increase the CPU utilization?

Maybe my compiler switches are wrong for a quad-core E5430 CPU. The "Quick-Reference Guide to Optimization with Intel Compilers" (http://software.intel.com/file/1776) seems to contradict the page "Intel compiler options for SSE generation and processor-specific optimizations" (http://support.intel.com/support/performancetools/sb/CS-009787.htm):
Quad-Core Intel Xeon processors /QxT /QaxT (the former, page 11)
Quad-Core Intel Xeon 54XX, 33XX series /QxS (the latter)

Or am I missing something at the operating system level?
2 Replies
Steven_L_Intel1
Employee
You are using /Qparallel in an attempt to get multithreading. This may or may not help, depending on the structure of your program. If the auto-parallelizer found loops to parallelize, it will say so in informational messages. You should enable optimization reports to see which loops didn't parallelize and why, with the idea of possibly restructuring your code to make it easier for the compiler.

/Qparallel is very cautious and rarely results in optimum parallelization. To do better you should use OpenMP, adding appropriate directives and determining which variables should be shared, private, etc. It is not as simple as throwing a switch and getting a 4X speedup.
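As a rough illustration of what "adding appropriate directives" involves, here is a minimal sketch of one OpenMP loop. It is not taken from the original poster's code: the program name, the loop body (a midpoint-rule integration that should approximate pi), and all variable names are hypothetical. It assumes OpenMP is enabled at compile time, which with the Intel compiler on Windows is done with /Qopenmp.

```fortran
! Hypothetical sketch: classifying variables as shared, private,
! or reduction in a single OpenMP loop. Not the original poster's code.
program omp_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000000      ! number of integration intervals
  integer :: i
  real(dp) :: x, h, total

  h = 1.0_dp / n
  total = 0.0_dp

  ! h is only read inside the loop, so one shared copy is enough.
  ! x is scratch storage for each iteration, so each thread needs its own copy.
  ! total is accumulated by every thread, so it is declared as a reduction.
  ! The loop index i is made private automatically.
  !$omp parallel do shared(h) private(x) reduction(+:total)
  do i = 1, n
     x = h * (i - 0.5_dp)
     total = total + 4.0_dp / (1.0_dp + x*x)
  end do
  !$omp end parallel do

  print '(a,f12.8)', 'integral = ', total * h
end program omp_sketch
```

If OpenMP is not enabled, the directives are treated as comments and the loop runs on one thread with the same result, which makes it easy to compare the serial and threaded builds.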
jimdempseyatthecove
Honored Contributor III
mzm,

Your legacy Monte Carlo simulation code is likely written as a single-threaded application as opposed to a multi-threaded one. Often, one requirement of simulation programs is repeatability. If you require repeatability, you might not be able to multi-thread the application, because the sequence of execution is harder to control. If you do not require 100% repeatability, you can multi-thread the application.

If your application is not currently multi-threaded, consult the OpenMP section of your documentation.

If your application is threaded (or once you have read the OpenMP documentation and converted your application to multi-threaded), you should be aware that some library functions use critical sections to permit only one thread through at a time. The need is easy to see for a WRITE statement: it should perform as a single operation and not blend its data with a WRITE being performed at the same time by a different thread of the application.

For your Monte Carlo simulation you are likely calling a serialized function that is not so obvious: one of the random number generator functions. For multi-threaded applications that rely heavily on random numbers, it is more efficient for each thread to call RANDOM_NUMBER to collect a pool (harvest) of random numbers, then work off its private pool and repopulate it as necessary. In this manner the serialization occurs once per call that fills a pool, as opposed to once per random number.
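The pooling idea might look roughly like the sketch below. Everything in it is an assumption made for illustration: the program name, the pool size of 4096, the sample count, and the toy workload (counting random points that fall inside a quarter circle, which should give an estimate near pi). It also assumes the runtime's RANDOM_NUMBER can safely be called from several threads. Whether the per-thread sequences are statistically independent depends on the compiler's runtime library and is not addressed here.

```fortran
! Hypothetical sketch: each thread refills a private pool with one
! RANDOM_NUMBER call and then consumes the pool two numbers at a time.
program rng_pool_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: pool_size = 4096     ! assumed pool size (must be even)
  integer, parameter :: n_samples = 1000000  ! assumed number of sample points
  real(dp) :: pool(pool_size)
  real(dp) :: x, y
  integer :: i, next, hits

  hits = 0

  !$omp parallel private(pool, next, x, y) reduction(+:hits)
  next = pool_size + 1              ! forces a refill on first use

  !$omp do
  do i = 1, n_samples
     ! Two numbers are needed per sample; refill when fewer than two remain.
     if (next + 1 > pool_size) then
        call random_number(pool)    ! one library call fills the whole pool
        next = 1
     end if
     x = pool(next)
     y = pool(next + 1)
     next = next + 2
     if (x*x + y*y <= 1.0_dp) hits = hits + 1
  end do
  !$omp end do
  !$omp end parallel

  print '(a,f10.6)', 'estimate = ', 4.0_dp * hits / n_samples
end program rng_pool_sketch
```

With this layout the library routine is entered once per 4096 numbers per thread instead of once per number. The result will generally differ from run to run and from the serial ordering, which is the repeatability trade-off mentioned above.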

Jim Dempsey
