My PC is a Dell Precision T5400, using a quad-core Xeon E5430 CPU @2.66 GHz with 2 GB RAM. The operating system is Windows XP Pro SP2. The options used for compiling are: /fpp /noautomatic /Qzero /O3 /Qparallel /QxT /QaxT.
I suspect I'm missing something, since the CPU utilization, while the code is running, is always at 25% with the IDLE process at 75%. The "affinity" parameter in the task manager, for the running code, shows a checkmark for all 4 CPUs. Is there any additional compiler switch to let the code increase the CPU utilization?
Maybe my compiler switches are wrong for a quad-core E5430 CPU. The "Quick-Reference Guide to Optimization with Intel Compilers" (http://software.intel.com/file/1776) seems to contradict the page "Intel compiler options for SSE generation and processor-specific optimizations" (http://support.intel.com/support/performancetools/sb/CS-009787.htm):
Quad-Core Intel Xeon processors /QxT /QaxT (the former, page 11)
Quad-Core Intel Xeon 54XX, 33XX series /QxS (the latter)
Or am I missing something at the operating system level?
/Qparallel is very cautious and rarely results in optimum parallelization. To do better you should use OpenMP, adding appropriate directives and determining which variables should be shared, private, etc. It is not as simple as throwing a switch and getting a 4X speedup.
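As a rough illustration of what "adding appropriate directives" can look like, here is a minimal sketch that is not taken from the original poster's code. It integrates 4/(1+x^2) over [0,1] with the midpoint rule and lets OpenMP split the loop across threads. The names `omp_sketch` and `integrate_pi` are hypothetical. It assumes the compiler's OpenMP switch is on (`/Qopenmp` for Intel Fortran on Windows, `-fopenmp` for gfortran); without it the directives are treated as comments and the loop runs serially.

```fortran
module omp_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical example routine: midpoint-rule estimate of pi.
  function integrate_pi(n) result(pi_est)
    integer, intent(in) :: n
    real(dp) :: pi_est
    integer :: i
    real(dp) :: h, x, total

    h = 1.0_dp / real(n, dp)
    total = 0.0_dp

    ! n and h are only read, so they can be shared.
    ! x is scratch storage, so each thread needs its own copy.
    ! total is accumulated by every thread, so it is a reduction variable.
    ! The loop index i is private automatically.
    !$omp parallel do default(none) shared(n, h) private(x) reduction(+:total)
    do i = 1, n
       x = h * (real(i, dp) - 0.5_dp)
       total = total + 4.0_dp / (1.0_dp + x*x)
    end do
    !$omp end parallel do

    pi_est = h * total
  end function integrate_pi
end module omp_sketch

program omp_sketch_demo
  use omp_sketch
  implicit none
  print '(a,f12.8)', 'pi estimate: ', integrate_pi(10000000)
end program omp_sketch_demo
```

Using `default(none)` forces every variable in the loop to be classified explicitly, which tends to expose the shared/private decisions that the auto-parallelizer would otherwise have to guess at.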
mzm,
Your legacy Monte Carlo simulation code is likely written as a single-threaded application rather than a multi-threaded one. Often, one requirement of simulation programs is repeatability. If you need repeatable results, you might not be able to multi-thread the application, because the order of execution across threads is harder to control. If you do not require 100% repeatability, you can multi-thread the application.
If your application is not currently multi-threaded, consult the OpenMP section of your documentation.
If your application is threaded (or once you have read the OpenMP documentation and converted it to multi-threaded), you should be aware that some library functions use critical sections to permit only one thread through at a time. For example, a WRITE statement must complete as a single operation, so that its output is not interleaved with a WRITE being performed by a different thread of the application at the same time.
For your Monte Carlo simulation you are likely calling a less obvious serialized function: one of the random number generator functions. For multi-threaded applications that rely heavily on random numbers, it is more efficient to have each thread use RANDOM_NUMBER to collect a pool (harvest) of random numbers, then work off its private pool and repopulate it as necessary. In this manner the serialization is incurred once per call to obtain a pool, as opposed to once per random number.
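The pooling idea can be sketched as follows. This is only an illustration under stated assumptions, not the original poster's simulation: the names `rng_pool`, `estimate_pi` and `pool_size`, and the pool length of 4096, are hypothetical. Each thread keeps a private array, refills it with a single array-valued call to RANDOM_NUMBER, and then consumes the values two at a time. How RANDOM_NUMBER behaves under threads (shared or per-thread state, locking) is compiler-specific, so results will generally not be repeatable across different thread counts.

```fortran
module rng_pool
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: pool_size = 4096   ! hypothetical size; must be even
contains
  ! Hypothetical example routine: Monte Carlo estimate of pi.
  function estimate_pi(n_samples) result(pi_est)
    integer, intent(in) :: n_samples
    real(dp) :: pi_est
    real(dp) :: pool(pool_size)
    real(dp) :: x, y
    integer :: i, next, hits

    hits = 0
    !$omp parallel default(none) shared(n_samples) private(pool, next, x, y) reduction(+:hits)
    next = pool_size + 1            ! pool starts empty, so the first use refills it
    !$omp do
    do i = 1, n_samples
       if (next + 1 > pool_size) then
          ! One call refills the whole private pool, so any serialization
          ! inside the generator is paid once per pool_size numbers.
          call random_number(pool)
          next = 1
       end if
       x = pool(next)
       y = pool(next + 1)
       next = next + 2
       if (x*x + y*y <= 1.0_dp) hits = hits + 1
    end do
    !$omp end do
    !$omp end parallel

    pi_est = 4.0_dp * real(hits, dp) / real(n_samples, dp)
  end function estimate_pi
end module rng_pool

program rng_pool_demo
  use rng_pool
  implicit none
  print '(a,f8.5)', 'pi estimate: ', estimate_pi(2000000)
end program rng_pool_demo
```

A larger pool means fewer trips through the generator's lock at the cost of more per-thread memory; the right size depends on how many random numbers each iteration consumes.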
Jim Dempsey