I have ifc 7.0 and compiled some code with the -openmp and -parallel options to make use of my dual-processor Xeon machine. I set OMP_NUM_THREADS to 4 because "top" shows four CPUs. The result: during execution, user CPU is around 65% (I assume this is in the parallel regions), distributed more or less equally among all four CPUs. Occasionally the usage goes to 100%, which is the usual case without any parallelization directives. The problem is this: rather than each CPU reading 15%, I want them to read 100% (or even 50%) so that my program's performance improves.
Does anyone know what to do? Should I switch to version 8 of the compiler?
Thanks!
There are so many possibilities that you may want to take this to the Threading forum once you have gathered more data. Among the first things to look at:
Does your application become starved for RAM as you increase the number of threads?
Did your compilation report successful parallelization of all the parts of your program where much time is spent? Use the OpenMP and auto-parallelization report switches (-openmp_report and -par_report) to check.
Does your application balance load evenly among threads? If not, can the OpenMP scheduling options help? Seeing all 4 logical CPUs about equally loaded doesn't necessarily prove anything; a single thread could be hopping around excessively.
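As a rough illustration of the scheduling point, here is a minimal sketch in C (the Fortran equivalent would be a `!$OMP PARALLEL DO SCHEDULE(DYNAMIC,16) REDUCTION(+:total)` directive). The function names, the artificial workload, and the chunk size of 16 are assumptions made up for this example and are not taken from the original poster's program.

```c
#include <assert.h>

/* Hypothetical workload: the cost of iteration i grows with i, so
   splitting the range into equal static blocks would leave the thread
   holding the last block with most of the work. */
static double uneven_work(int i)
{
    double acc = 0.0;
    for (int k = 0; k <= i; ++k)
        acc += (double)k;
    return acc;
}

/* Sums uneven_work(i) for i in [0, n).
   When compiled with OpenMP enabled, schedule(dynamic, 16) hands out
   chunks of 16 iterations to whichever thread is idle, which tends to
   even out the load. Without OpenMP the pragma is ignored and the loop
   simply runs serially with the same result. */
double sum_uneven(int n)
{
    assert(n >= 0);
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += uneven_work(i);
    return total;
}
```

Whether dynamic scheduling actually helps depends on how uneven the real loop is; for loops with uniform iteration cost, the default static schedule usually has less overhead.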
Diagnosing opportunities to gain more from OpenMP is not often simple. The Intel Threading Toolkit is one of the efforts under way to help.
On a Hyper-Threading (HT) system, getting the performance meter up to 100% doesn't necessarily show effective parallelization. You may need to shut HT off in the BIOS and check how effectively your application parallelizes on 2 CPUs. You must get it working well that way before you can hope for additional gain from turning HT back on.
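One way to put a number on "how effectively your application parallelizes" is to compare wall-clock times rather than CPU meters. The sketch below is only an illustration: it assumes you have measured the elapsed time of a 1-thread run and a 2-thread run yourself (for example with OMP_NUM_THREADS set to 1 and then 2), and the function names are hypothetical.

```c
#include <assert.h>

/* Speedup of the parallel run relative to the 1-thread run,
   both given as wall-clock seconds. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency: 1.0 means perfect scaling on `cpus` processors;
   values well below 1.0 suggest serial sections, load imbalance,
   or contention for memory. */
double efficiency(double t_serial, double t_parallel, int cpus)
{
    assert(cpus > 0);
    return speedup(t_serial, t_parallel) / (double)cpus;
}
```

For instance, a program that takes 100 s on one thread and 80 s on two has a speedup of 1.25 and an efficiency of 0.625, which would point to a large serial or poorly balanced portion regardless of what the CPU meter shows.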
How much additional gain you could get from HT depends on many factors, such as:
which kernel
which Intel chip
how your program uses cache and Write Combine buffers