I am using the Intel MKL routine zgemm() to multiply two complex matrices
on a 2-core machine with a clock speed of 2.79 GHz.
When I run the program with neither OMP_NUM_THREADS nor KMP_AFFINITY set, I get approximately 2700 MFLOPS. When I set OMP_NUM_THREADS=2 and KMP_AFFINITY= (null), the rate drops to 1390 MFLOPS. When I then unset KMP_AFFINITY, the rate drops even further, to 1000 MFLOPS.
Why does the single-threaded run perform better than the run where I specify two threads?
TIA
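
For reference, the MFLOPS figures quoted above depend on how the operation count and the elapsed time are obtained. The post does not show that calculation, so the following is only a sketch under assumptions: it uses the conventional count of 8 real floating-point operations per complex multiply-add (6 for the multiply, 2 for the add), and it assumes the elapsed time passed in is wall-clock time rather than CPU time summed over threads. The function names `zgemm_flop_count` and `zgemm_mflops` are hypothetical helpers, not part of MKL.

```c
#include <stddef.h>

/* Real floating-point operations for C = A*B with complex matrices,
 * where A is m x k and B is k x n.
 * Assumption: each complex multiply-add is counted as 8 real flops
 * (6 for the complex multiply, 2 for the complex add). */
double zgemm_flop_count(size_t m, size_t n, size_t k)
{
    return 8.0 * (double)m * (double)n * (double)k;
}

/* Rate in MFLOPS for one multiplication that took `seconds` of elapsed time.
 * Assumption: `seconds` is wall-clock time. Returns 0.0 for a non-positive
 * time so that a bad timer reading does not cause a division by zero. */
double zgemm_mflops(size_t m, size_t n, size_t k, double seconds)
{
    if (seconds <= 0.0) {
        return 0.0;
    }
    return zgemm_flop_count(m, n, k) / seconds / 1.0e6;
}
```

As a usage example, `zgemm_mflops(1000, 1000, 1000, 4.0)` corresponds to 8e9 operations in 4 seconds. If the time were instead taken from a CPU-time counter that accumulates across threads, a two-thread run could appear slower than it really is, which may be worth ruling out when comparing the numbers above.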