Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Is performance of multiple linpack benchmarks running on different cores limited by a single FPU?

Tony_Garratt
Beginner
I have set up a program to run the Fortran LINPACK 1000x1000 benchmark multiple times from the same executable. It is compiled with -O2 and is not linked with MKL.

If I run the test on my 4-core Dell Intel PC (W3565 @ 3.20GHz) it takes around 16 seconds. If I fire off the test twice simultaneously, each in a different DOS window, each test now takes around 27 seconds. Task Manager gives the impression that each is running on a different core. Since I have a 4-core machine, I was expecting each test to take 16 seconds, since each uses a different core. Any clues as to why each is taking nearly twice as long? Is it because they are battling over a single FPU?
TimP
Honored Contributor III
More likely, they are competing for the memory bus, unless you have an optimized version which runs almost entirely in cache. Windows versions prior to Windows 7 didn't do a very good job of automatically keeping competing jobs on different cores. From your description I can't tell whether your 2 jobs are rotating among the 4 cores, which wouldn't surprise me, although it's not optimal. If it's a Core 2 Quad, the optimal arrangement would be to keep each job on its own L2 cache, although that may be difficult to arrange in Windows.
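
To rule out jobs rotating among cores, each run can be pinned to one logical CPU. The following is only a minimal sketch, assuming Windows cmd.exe and its `start /affinity` option, which takes a hexadecimal processor mask. The executable name `linpack.exe` and both helper functions are hypothetical, introduced here for illustration.

```python
def affinity_mask(core_index):
    """Return the hexadecimal affinity mask that selects a single logical CPU."""
    if core_index < 0:
        raise ValueError("core index must be non-negative")
    # Bit k of the mask corresponds to logical CPU k.
    return format(1 << core_index, "X")


def pinned_command(exe, core_index):
    """Build a cmd.exe command line that starts `exe` pinned to one logical CPU."""
    # The empty "" is the window title argument that `start` expects first.
    return f'start "" /affinity {affinity_mask(core_index)} {exe}'


if __name__ == "__main__":
    # Hypothetical executable name; one job per logical CPU 0..3.
    for core in range(4):
        print(pinned_command("linpack.exe", core))
```

If Hyper-Threading is enabled, two logical CPUs can share one physical core, so it is worth checking how mask bits map to physical cores before drawing conclusions from the timings.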
jimdempseyatthecove
Honored Contributor III
You have either

1) Memory bus contention issue
2) Cache issue

RE: memory bus issue

The W3565 supports up to 3 memory channels. How many channels are populated in your system?
If you skimped on memory, then this might be your problem.
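
As a rough illustration of why the number of populated channels matters, theoretical peak bandwidth scales linearly with the channel count. This is a sketch under stated assumptions: a 64-bit (8-byte) bus per channel and DDR3-1066 memory. The thread does not say which DIMMs are installed, so the transfer rate is a placeholder to be replaced with the real value.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_width_bytes=8):
    """Theoretical peak bandwidth in GB/s: channels * transfers/s * bytes per transfer."""
    return channels * mega_transfers_per_s * 1e6 * bus_width_bytes / 1e9


if __name__ == "__main__":
    # Assumed DDR3-1066 (1066 MT/s); substitute the rate of the installed DIMMs.
    for channels in (1, 2, 3):
        bandwidth = peak_bandwidth_gb_s(channels, 1066)
        print(f"{channels} channel(s): {bandwidth:.1f} GB/s")
```

Sustained bandwidth is lower than this peak, but the ratio between one and three populated channels gives a feel for how much headroom two simultaneous memory-bound jobs have.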

RE: possible cache issue

You have 8MB of L3 cache. If your code is not tuned for cache layout, it may suffer when multiple cores run it at once. What does running two MKL LINPACK instances show? (Assuming you can configure MKL for LINPACK.)

Jim Dempsey
Tony_Garratt
Beginner
Thank you Tim and Jim. Your replies were right on the mark! I changed the benchmark to use a smaller matrix size so that the working data stays in cache, and I was then able to run it 4 times simultaneously with hardly any worse performance than running it once.
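
The effect can be sketched with a back-of-the-envelope working-set check. This is an estimate only, assuming double-precision (8-byte) elements, which the thread does not state, and counting nothing but the matrix itself against the 8MB shared L3 mentioned above. The function names are hypothetical.

```python
import math

L3_BYTES = 8 * 1024 * 1024  # 8MB shared L3, as quoted earlier in the thread


def matrix_bytes(n, bytes_per_element=8):
    """Storage for one dense n-by-n matrix (8 bytes assumes double precision)."""
    return n * n * bytes_per_element


def fits_in_cache(n, copies, cache_bytes=L3_BYTES, bytes_per_element=8):
    """True if `copies` independent n-by-n matrices fit in the cache together."""
    return copies * matrix_bytes(n, bytes_per_element) <= cache_bytes


def max_order(copies, cache_bytes=L3_BYTES, bytes_per_element=8):
    """Largest n for which `copies` n-by-n matrices fit in the cache together."""
    return math.isqrt(cache_bytes // (copies * bytes_per_element))


if __name__ == "__main__":
    for copies in (1, 2, 4):
        print(copies, "job(s): 1000x1000 fits?", fits_in_cache(1000, copies),
              "| largest order that fits:", max_order(copies))
```

Under these assumptions a single 1000x1000 matrix just about fits in L3 but two do not, which is consistent with the slowdown seen when a second job was started. Real code also needs cache space for vectors, instructions, and other processes, so the practical limit is somewhat smaller.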

Very helpful!

Thanks
Tony