Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Memory performance on Q6600

arugula
Beginner
593 Views
Hello, I am using a test setup that consists of the following:

Intel db975X2 motherboard
Intel E6600 and/or Q6600
4 GB of DDR2-667 RAM, CL 5-5-5-15
200 GB HDD
ATI Radeon
Windows XP Pro

I am getting some odd results when testing the memory bandwidth / CPU.

When I use a Core 2 Duo (2 cores), I get the same performance on each core.

On the quad, core 0 reaches about 80% of the speed of cores 1 and 3, and core 2 runs at about the same speed as core 0. When I multi-thread, I find that cores 1 and 3 share a bus, as do cores 0 and 2. Is such a speed difference normal?

I managed to reproduce these results with RightMark Memory Analyzer, a nifty open source benchmark.
5 Replies
TimP
Honored Contributor III
593 Views
You seem to be using some of your terminology loosely. On Core 2 Quad, cores 0 and 2 would share an L2 cache, and cores 1 and 3 would share the other. The 2 L2 caches share a single memory bus.
If you have EIST enabled, it could make for difficulty in getting consistent results. I haven't seen any version of Windows running on Core 2 quad, but I would expect some issues to be specific to each Windows version.
arugula
Beginner
593 Views
Thank you for the precision.

I have disabled EIST. I profiled the system a bit with RightMark Memory Analyzer. Here are the results (in GB/s):

Core 2 Duo

Cpu state         Speed (GB/s)      Total (GB/s)
Core 0  Core 1    Core 0  Core 1
OFF     ON        0.0     5.5       5.5
ON      OFF       5.5     0.0       5.5
ON      ON        3.1     2.9       6.0

Core 2 Quad

Cpu state                         Speed (GB/s)                      Total (GB/s)
Core 0  Core 1  Core 2  Core 3    Core 0  Core 1  Core 2  Core 3
OFF     OFF     OFF     ON        0.0     0.0     0.0     5.5       5.5
OFF     OFF     ON      OFF       0.0     0.0     4.5     0.0       4.5
OFF     OFF     ON      ON        0.0     0.0     2.9     3.0       5.9
OFF     ON      OFF     OFF       0.0     5.5     0.0     0.0       5.5
OFF     ON      OFF     ON        0.0     3.0     0.0     3.0       6.0
OFF     ON      ON      OFF       0.0     3.0     2.9     0.0       5.9
OFF     ON      ON      ON        0.0     1.5     2.5     1.5       5.5
ON      OFF     OFF     OFF       4.5     0.0     0.0     0.0       4.5
ON      OFF     OFF     ON        2.9     0.0     0.0     3.0       5.9
ON      OFF     ON      OFF       2.5     0.0     2.5     0.0       5.0
ON      OFF     ON      ON        1.5     0.0     1.4     2.5       5.4
ON      ON      OFF     OFF       2.9     3.0     0.0     0.0       5.9
ON      ON      OFF     ON        2.8     1.6     0.0     1.7       6.1
ON      ON      ON      OFF       1.7     2.8     1.7     0.0       6.2
ON      ON      ON      ON        1.5     1.5     1.5     1.5       6.0

I am pretty sure I am hitting the bandwidth limit, but the kicker is that "cpu0" has a different throughput than "cpu1". All other synthetic tests show the cores to be equal.
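As a quick sanity check on the Core 2 Quad figures, the rows can be transcribed and checked mechanically. The sketch below is only illustrative: the names QUAD_ROWS, totals_consistent and single_core_speeds are made up for this example, and the numbers are copied from the results posted here, not re-measured. It confirms that each reported total equals the sum of the per-core speeds, and that a lone core 0 or core 2 reaches roughly 0.82 of what a lone core 1 or core 3 reaches (4.5 / 5.5).

```python
# Per-core speeds (GB/s) and reported totals, transcribed from the
# Core 2 Quad results in this post. Order: core 0, core 1, core 2, core 3.
QUAD_ROWS = [
    ((0.0, 0.0, 0.0, 5.5), 5.5),
    ((0.0, 0.0, 4.5, 0.0), 4.5),
    ((0.0, 0.0, 2.9, 3.0), 5.9),
    ((0.0, 5.5, 0.0, 0.0), 5.5),
    ((0.0, 3.0, 0.0, 3.0), 6.0),
    ((0.0, 3.0, 2.9, 0.0), 5.9),
    ((0.0, 1.5, 2.5, 1.5), 5.5),
    ((4.5, 0.0, 0.0, 0.0), 4.5),
    ((2.9, 0.0, 0.0, 3.0), 5.9),
    ((2.5, 0.0, 2.5, 0.0), 5.0),
    ((1.5, 0.0, 1.4, 2.5), 5.4),
    ((2.9, 3.0, 0.0, 0.0), 5.9),
    ((2.8, 1.6, 0.0, 1.7), 6.1),
    ((1.7, 2.8, 1.7, 0.0), 6.2),
    ((1.5, 1.5, 1.5, 1.5), 6.0),
]


def totals_consistent(rows, tol=0.01):
    """True if every reported total matches the sum of its per-core speeds."""
    return all(abs(sum(speeds) - total) < tol for speeds, total in rows)


def single_core_speeds(rows):
    """Return {core index: speed} for the rows where exactly one core is active."""
    result = {}
    for speeds, _total in rows:
        active = [i for i, s in enumerate(speeds) if s > 0]
        if len(active) == 1:
            result[active[0]] = speeds[active[0]]
    return result


solo = single_core_speeds(QUAD_ROWS)
ratio = solo[0] / solo[1]  # lone core 0 relative to lone core 1
print(totals_consistent(QUAD_ROWS), solo, round(ratio, 2))
```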
jimdempseyatthecove
Honored Contributor III
593 Views

Have you verified that something else running on the system is not affecting the test results?

The O/S might favor one of the cores for some of its services. Favoring one core would knock down the performance of its partner core as well. Something as innocuous as refreshing the video adapter could cause similar anomalies. Try running your test while the system is in a full-screen console window (i.e. VGA text mode, using Alt-Enter), and make sure the test is not blasting results data out to the screen.
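One way to cross-check a benchmark tool is a small hand-rolled test that pins itself to one core at a time and times a large buffer copy. The following is a minimal sketch under stated assumptions, not the method RightMark uses: copy_bandwidth and measure_each_core are hypothetical helper names, and os.sched_setaffinity is available on Linux only, so on Windows XP the pinning would have to be done with a native call such as SetThreadAffinityMask (not shown). Interpreter overhead also means the absolute numbers will differ from a native benchmark; only the core-to-core comparison is of interest.

```python
import os
import time


def copy_bandwidth(size_bytes=64 * 1024 * 1024, repeats=5):
    """Best observed GB/s while copying a buffer of size_bytes."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy of the whole buffer
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading from the timer
    # A copy reads and writes every byte, so count 2 * size_bytes of traffic.
    return 2 * size_bytes / best / 1e9


def measure_each_core(size_bytes=64 * 1024 * 1024, repeats=5):
    """Pin to each allowed core in turn and return {core: GB/s}.

    Returns an empty dict where os.sched_setaffinity is unavailable
    (it is a Linux-only call).
    """
    if not hasattr(os, "sched_setaffinity"):
        return {}
    allowed = sorted(os.sched_getaffinity(0))  # remember the original mask
    results = {}
    try:
        for core in allowed:
            os.sched_setaffinity(0, {core})
            results[core] = copy_bandwidth(size_bytes, repeats)
    finally:
        os.sched_setaffinity(0, allowed)  # restore the original mask
    return results
```

Calling measure_each_core() on an otherwise idle machine and comparing the per-core values gives a rough indication of whether one core is consistently slower. The buffer should be much larger than the L2 cache so that the copy actually goes to RAM.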

Jim Dempsey

arugula
Beginner
593 Views
An excellent point,

The test plots an average value in deferred time (not while the test is running), so I can interpret it as I see fit. It looks like a flat line with some dips, which I blame on Windows events and whatnot. The kicker is that I am comparing the performance of a Core 2 Quad to a Core 2 Duo. Both CPUs are running at 2.4 GHz on the same motherboard, with the same software and the same hardware. If one CPU glitches, so should the other. But on the Core 2 Duo, both cores can sustain approx. 5.5 GB/s transfers to RAM, whereas on the quad, two cores can keep up 5.5 GB/s and two can "only" do 4.5 GB/s.
jimdempseyatthecove
Honored Contributor III
593 Views

From a cursory look at your test results, it would appear that the memory bandwidth for both systems is ~6.0 GB/s.

The fact that you see 5.5 GB/s on the Core 2 Duo indicates that there is a 0.5 GB/s "drain" on the system. I use drain here to indicate whatever it is that is interfering with memory access.

On the Quad, it appears that this drain affects cores 0/2. As you can see from your charts, when more than one core is active, no single core attains 5.5 GB/s or even 4.5 GB/s, because the contending cores are waiting for their turn at the memory controller. The drain on 0/2 is likely causing a statistical loss of being first at the memory controller in what would otherwise be a tie, i.e. cores 0/2 statistically lose out, on average, in what would otherwise be a tied race.
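For context, the ~6.0 GB/s aggregate can be set against the theoretical ceilings of the platform. The figures in this sketch are assumptions not stated in the thread: a 1066 MT/s, 64-bit front-side bus (the rated FSB of the E6600 and Q6600) and DDR2-667 running in dual-channel mode with 64-bit channels. The helper name peak_gb_per_s is hypothetical.

```python
# Assumed platform figures (NOT stated in the thread):
FSB_TRANSFERS_PER_S = 1066.67e6   # quad-pumped 266.67 MHz front-side bus
FSB_WIDTH_BYTES = 8               # 64-bit data bus
DDR2_TRANSFERS_PER_S = 666.67e6   # DDR2-667
DDR2_CHANNEL_WIDTH_BYTES = 8      # 64-bit channel
DDR2_CHANNELS = 2                 # dual-channel operation is an assumption


def peak_gb_per_s(transfers_per_s, width_bytes, channels=1):
    """Theoretical peak in GB/s: transfers per second * bytes per transfer * channels."""
    return transfers_per_s * width_bytes * channels / 1e9


fsb_peak = peak_gb_per_s(FSB_TRANSFERS_PER_S, FSB_WIDTH_BYTES)
dram_peak = peak_gb_per_s(DDR2_TRANSFERS_PER_S, DDR2_CHANNEL_WIDTH_BYTES, DDR2_CHANNELS)

measured_total = 6.0  # approximate aggregate from the posted results
efficiency = measured_total / min(fsb_peak, dram_peak)

print(f"FSB peak  : {fsb_peak:.2f} GB/s")
print(f"DRAM peak : {dram_peak:.2f} GB/s")
print(f"Measured  : {measured_total:.1f} GB/s ({efficiency:.0%} of the lower ceiling)")
```

Under these assumptions the front-side bus, not the DRAM, is the lower ceiling, and the measured aggregate sits at roughly 70% of it. That is consistent with a shared bus being saturated, although it does not by itself explain the asymmetry between the core pairs.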

Someone at Intel probably has written a white paper on this.

Jim Dempsey
