Intel® Fortran Compiler

Parallel performance - Dual P4 Xeon

damien_veyret7
I want to parallelize a very large sparse matrix-vector multiplication on a dual Intel P4 Xeon machine. I tried both the /Qparallel option and OpenMP directives, and in both cases it takes longer than running on a single CPU.
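A minimal OpenMP sketch of the kind of loop being described. It assumes the matrix is stored in CSR (compressed sparse row) format. The post does not say how the matrix is stored, and the names csr_spmv_mod and csr_matvec are hypothetical. Each row's dot product is independent, so the outer loop can be split across threads without locks or reductions.

```fortran
! Sketch only: assumes CSR storage; module and routine names are hypothetical.
module csr_spmv_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Computes y = A*x for an n-row matrix A in CSR format (1-based indices):
  !   row_ptr(i) .. row_ptr(i+1)-1  positions of row i's nonzeros
  !   col_idx(k)                    column of the k-th nonzero
  !   val(k)                        value of the k-th nonzero
  subroutine csr_matvec(n, row_ptr, col_idx, val, x, y)
    integer, intent(in) :: n
    integer, intent(in) :: row_ptr(n+1), col_idx(:)
    real(dp), intent(in) :: val(:), x(:)
    real(dp), intent(out) :: y(n)
    integer :: i, k
    real(dp) :: s

    ! Each thread gets a contiguous block of rows (schedule(static)) and
    ! writes only its own y(i), so no synchronization is needed.
    !$omp parallel do private(k, s) schedule(static)
    do i = 1, n
      s = 0.0_dp
      do k = row_ptr(i), row_ptr(i+1) - 1
        s = s + val(k) * x(col_idx(k))
      end do
      y(i) = s
    end do
    !$omp end parallel do
  end subroutine csr_matvec
end module csr_spmv_mod
```

When the code is compiled with OpenMP enabled (for example /Qopenmp with Intel Fortran on Windows), the !$omp lines take effect. Without it they are ordinary comments and the routine runs serially, which makes it easy to time both versions from the same source. schedule(static) is a reasonable starting point when rows have similar nonzero counts. Strongly unbalanced rows may do better with a dynamic or guided schedule.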
I've read that poor cache utilization (or a memory bandwidth limitation) could be the cause. Is that true? Can third-party products (such as the NAG library) help?
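A rough way to test the bandwidth explanation is a back-of-the-envelope estimate. It assumes CSR storage with 8-byte double-precision values and 4-byte integer column indices, which the post does not specify. Each nonzero contributes one multiply and one add but requires loading at least its value and its column index, so the arithmetic intensity I and the attainable performance P are bounded by:

```latex
I = \frac{\text{flops}}{\text{bytes}} \le \frac{2}{8 + 4} = \frac{1}{6} \approx 0.17\ \text{flop/byte},
\qquad
P \le I \cdot B \approx \frac{B}{6}
```

Here B is the sustained memory bandwidth. Loads of x and the row pointers and stores of y only lower I further. Because the bound scales with B and not with the number of CPUs, a second processor that shares the same front-side bus (typical of dual P4 Xeon boards) adds little or no usable bandwidth. Threading overhead and bus contention can then outweigh the extra compute, which is consistent with the slowdown described.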

Damien Veyret
Steven_L_Intel1
This is not an area I'm an expert in, but I think it would be an ideal application for the Intel VTune Performance Analyzer. Rather than guessing, use VTune to see where your program spends its time and let it suggest improvements. If you haven't tried it, you should: it's quite impressive! There's a 30-day free trial. You may also want to check whether the Intel Performance Libraries can help.

http://developer.intel.com/software/products/

Steve