I want to parallelize a (very big) sparse matrix-vector multiplication on a dual Intel P4 Xeon. I tried the Qparallel option and OpenMP directives: in both cases, it takes longer than using one CPU.
I've read that poor cache utilization (or a memory bandwidth limitation) could be the cause. Is that true? Can third-party products (like the NAG library) help?
Damien Veyret
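For reference, here is a minimal sketch in C of the kind of loop being parallelized. It assumes the matrix is stored in CSR (compressed sparse row) format; the `csr_matrix` struct and `csr_spmv` function are hypothetical names for illustration, not code from the original post. Each stored nonzero costs only one multiply and one add, yet it requires fetching a value, a column index, and an irregularly placed entry of `x`. The loop is therefore usually limited by memory bandwidth rather than arithmetic. If both CPUs share one memory bus, a second thread may add contention without adding useful bandwidth, which is consistent with the slowdown described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CSR container for illustration. Row i's nonzeros are
   stored at positions [row_ptr[i], row_ptr[i+1]) of col_idx and val. */
typedef struct {
    int n_rows;
    const int *row_ptr;   /* length n_rows + 1, row_ptr[0] == 0 */
    const int *col_idx;   /* column index of each stored nonzero */
    const double *val;    /* value of each stored nonzero */
} csr_matrix;

/* Computes y = A * x.
   Rows are independent, so the outer loop is split across threads.
   Each thread writes a disjoint range of y, so no locking is needed.
   If the file is compiled without OpenMP enabled, the pragma is ignored
   and the loop simply runs serially. */
void csr_spmv(const csr_matrix *A, const double *x, double *y)
{
    assert(A->row_ptr[0] == 0);  /* basic sanity check on the CSR layout */

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < A->n_rows; ++i) {
        double sum = 0.0;
        /* Per nonzero: load val[k] and col_idx[k], then gather x[col_idx[k]].
           That is a lot of memory traffic for 2 floating-point operations. */
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            sum += A->val[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}
```

As a usage example, consider the 3×3 matrix [[1,0,2],[0,3,0],[4,0,5]], stored as `row_ptr = {0, 2, 3, 5}`, `col_idx = {0, 2, 1, 0, 2}`, and `val = {1, 2, 3, 4, 5}`. With `x` set to all ones, `csr_spmv` produces `y = {3, 3, 9}`.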
This is not an area I am an expert in, but I'd think it would be an ideal application for the Intel VTune performance analyzer. Rather than guessing, use VTune to see where your program is spending its time and let it make suggestions for improvements. If you haven't tried it, you should - it's quite impressive! There's a 30-day free trial. You may also want to see if the Intel Performance Libraries can be of help.
http://developer.intel.com/software/products/
Steve