I have set up a program to run the Fortran LINPACK 1000x1000 benchmark multiple times from the same executable. It is compiled with -O2 (and is not linked with MKL).
If I run the test on my 4-core Dell Intel PC (W3565 @ 3.20GHz), it takes around 16 seconds. If I fire off the test twice simultaneously, each in a different DOS window, each test now takes around 27 seconds. Task Manager gives the impression that each is running on a different core. Since I have a 4-core machine, I was expecting each test to take 16 seconds, since each uses a different core. Any clues as to why each is taking nearly twice as long? Is it because they are battling over a single FPU?
More likely, they are competing for the memory bus, if you don't have an optimized version which runs almost entirely in cache. Windows versions prior to Windows 7 didn't do a very good job of automatically keeping competing jobs on different cores. From your description I can't tell whether your 2 jobs are rotating among the 4 cores, which wouldn't surprise me, although it's not optimal. If it's a Core 2 Quad, the optimum would be to keep each job on its own L2 cache, although that may be difficult to arrange in Windows.
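One way to rule out jobs rotating among cores is to pin each run to its own core when launching it. The sketch below only builds the command lines for the Windows `start /affinity` option, which takes a hexadecimal mask of logical CPUs. The executable name `linpack.exe` is a placeholder, and the choice of logical CPUs 0 and 2 assumes that adjacent logical CPUs may be hyperthread siblings of one physical core; check the numbering on your own machine.

```python
def affinity_mask(cpus):
    """Return the hex affinity mask (no 0x prefix) for the given logical CPU indices."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # bit k selects logical CPU k
    return format(mask, "X")


def start_command(exe, cpus):
    """Build a cmd.exe 'start' line that pins exe to the given logical CPUs."""
    return f"start /affinity {affinity_mask(cpus)} {exe}"


if __name__ == "__main__":
    # "linpack.exe" is a hypothetical name for the benchmark executable.
    # CPUs 0 and 2 are an assumption meant to avoid hyperthread siblings.
    for cpus in ([0], [2]):
        print(start_command("linpack.exe", cpus))
```

Pinning only removes scheduler migration as a variable. If the two runs are limited by memory bandwidth, they will still slow each other down.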
You have either
1) a memory bus contention issue, or
2) a cache issue.
RE: memory bus issue
The W3565 supports up to 3 memory channels. How many channels are populated in your system?
If you skimped on memory, then this might be your problem.
RE: possible cache issue
You have 8MB of L3 cache. If your code is not sensitive to cache layout it may suffer with multiple cores. What does running two MKL linpack instances show? (assuming you can configure MKL for linpack)
Jim Dempsey
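To make the cache argument concrete, here is a rough footprint estimate. It assumes the benchmark stores the matrix in double precision (8 bytes per element) and that the 8MB L3 is 8 MiB shared by all cores. It ignores every other array and the code itself, so treat it as a lower bound rather than a measurement.

```python
BYTES_PER_DOUBLE = 8          # assumption: double-precision matrix
L3_BYTES = 8 * 1024 * 1024    # assumption: 8 MiB L3 shared by all cores


def matrix_bytes(n, elem_bytes=BYTES_PER_DOUBLE):
    """Bytes needed to hold one dense n x n matrix."""
    return n * n * elem_bytes


def fits_in_cache(n, copies, cache_bytes=L3_BYTES):
    """True if `copies` independent n x n matrices fit in the cache together."""
    return copies * matrix_bytes(n) <= cache_bytes


if __name__ == "__main__":
    n = 1000
    print(f"one {n}x{n} matrix: {matrix_bytes(n):,} bytes")
    for copies in (1, 2, 4):
        print(f"{copies} simultaneous run(s) fit in L3: {fits_in_cache(n, copies)}")
```

Under these assumptions a single 1000x1000 matrix nearly fills the L3 by itself, and two simultaneous runs need about twice the cache capacity, so both would have to go to main memory much more often.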
Thank you Tim and Jim. Your replies were right on the mark! I changed the benchmark to use a smaller matrix size so that the data stays in the cache, and then I was able to run it 4 times simultaneously with hardly any loss of performance compared with running it once.
Very helpful!
Thanks
Tony
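For anyone choosing a matrix size for a similar test, a rough upper bound on the order can be estimated as below. This is only a sketch: it assumes double-precision data and an 8 MiB shared L3, and it counts nothing but the matrices themselves, so the usable size in practice will be somewhat smaller.

```python
import math


def max_order(copies, cache_bytes=8 * 1024 * 1024, elem_bytes=8):
    """Largest n such that `copies` dense n x n matrices fit in cache_bytes.

    Defaults are assumptions: 8 MiB of shared cache, 8-byte (double) elements.
    """
    elements_per_copy = cache_bytes // (copies * elem_bytes)
    return math.isqrt(elements_per_copy)


if __name__ == "__main__":
    for copies in (1, 2, 4):
        print(f"{copies} simultaneous run(s): order up to about {max_order(copies)}")
```

With these defaults the bound for four simultaneous runs works out to an order of 512.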