Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Intel Duo Core vs. Quad Core performance question

jta23
Beginner
646 Views
(Note: I also posted this question to the forum days ago, but didn't receive any responses.)
I'm comparing performance of the multithreaded server software I'm developing (in Java) on two machines and three OSes and finding some puzzling results. Note: although the server is multithreaded, the bulk of the computation is currently handled by a single thread. Still doing research/dev on parallelizing the problem.


Computer #1
Intel Core 2 Duo (T7500)
2GB RAM
4M Cache, 2.20 GHz, 800 MHz FSB
Mac OS X 10.6
Java 1.6u17 (Java HotSpot 64-Bit Server VM)

Computer #2
Intel Core 2 Quad (Q8300)
8GB RAM
4M Cache, 2.50 GHz, 1333 MHz FSB
Ubuntu 9.10 or Windows 7
Java 1.6u17 (Java HotSpot 64-Bit Server VM) on Windows
Java 1.6u17, 1.6u19, and 1.7 beta on Linux


Here's a screenshot from the Intel website:

[Screenshot: Processor comparison]

Results:
Surprisingly (to me), my program runs significantly faster on Computer #1 than on Computer #2 (when running either Linux or Windows). The latter runs at 60%-75% of the speed of the former.

I don't quite understand this, as Computer #1 is 3 years old and has a slower clock speed and system bus than #2 (which I literally just purchased).

Experiments:
-tried testing with both Java 1.6u19 and a 1.7 beta on Ubuntu, saw improvements of up to ~10%
-disabled 2 cores in Windows 7 so that it's duo vs. duo instead of duo vs. quad; no (significant) difference
-changed the Ubuntu processor setting to "Performance" instead of a power saving/efficiency mode, no difference
-tried -Xconcurrentio Java option, no difference


I reckon I'll have to go into profiling the code to shed some more light on this, but thought I'd ask if this makes sense.
I'm not comparing apples and oranges, am I? Of course when the problem is more parallelized I should be able to get more performance out of the quad core, but I don't understand why it's this much slower now.


In the screenshot above, #1 has a bus/core ratio of 11 while #2 has 7.5.
7.5/11 = 68%, just a coincidence?
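For what it's worth, as far as I understand it the bus/core ratio is just the clock multiplier (core clock = bus clock x ratio), so a lower ratio on a faster bus doesn't by itself mean a slower core. A quick sketch of the arithmetic, assuming the FSB is quad-pumped so the base bus clock is the rated FSB speed divided by 4 (the class and method names are made up for illustration):

```java
public class ClockMath {
    // Assumption: the FSB is quad-pumped, so base clock = rated speed / 4.
    static double busClockMHz(double fsbRated) {
        return fsbRated / 4.0;
    }

    // Core clock (MHz) = base bus clock (MHz) x bus/core ratio.
    static double coreClockMHz(double busClockMHz, double ratio) {
        return busClockMHz * ratio;
    }

    public static void main(String[] args) {
        double t7500 = coreClockMHz(busClockMHz(800.0), 11.0);   // Computer #1
        double q8300 = coreClockMHz(busClockMHz(1333.0), 7.5);   // Computer #2
        System.out.printf("T7500 core clock: %.0f MHz%n", t7500);
        System.out.printf("Q8300 core clock: %.0f MHz%n", q8300);
        System.out.printf("ratio of multipliers: %.2f%n", 7.5 / 11.0);
        System.out.printf("ratio of core clocks: %.2f%n", q8300 / t7500);
    }
}
```

The multipliers alone give 7.5/11, but multiplied by the respective bus clocks they reproduce the listed 2.20 GHz and roughly 2.50 GHz, so the 68% figure doesn't fall out of the clock speeds.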
Any insight is appreciated, thanks!
7 Replies
Dmitry_Vyukov
Valued Contributor I
> -disabled 2 cores in Windows 7 so that it's duo vs. duo instead of duo vs. quad; no (significant) difference

How did you do that? By setting up the parameter in boot.ini?
jta23
Beginner
This page explains:
Note that to switch back, you uncheck the checkbox and restart.
Dmitry_Vyukov
Valued Contributor I
Then I have no guess. You have basically 2 dual-core systems, and one is "better by all parameters", and it turns out to be significantly slower...
Dmitry_Vyukov
Valued Contributor I
If there are no better suggestions, a detailed comparison of profiles can help (as you already noted).
Dmitry_Vyukov
Valued Contributor I
Hmm... well, perhaps... AFAIK the Q8300 is not a real quad-core processor; it's effectively two dual-core dies in a single package, connected via the FSB. So the communication cost between cores 0<->1 and 0<->2 differs significantly (the former is intra-die communication through the shared L2 cache, while the latter goes over the FSB); that's what I observe on my Q6600. I suspect that when you restrict Windows to 2 cores, it chooses cores 0 and 2. If so, the difference between the two systems is clear: the T7500 uses intra-die communication, while the Q8300 uses FSB communication.
I guess the JVM must have a command-line parameter to restrict its execution to a subset of cores. You may try to restrict it to cores 0,1 and then 0,2, and check whether that's the root cause or not.
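On the Linux install, one option that doesn't depend on a JVM flag might be to start the JVM under `taskset -c <cpu-list>`, which pins the whole process to the listed cores. A minimal sketch that only builds and prints such a command line; `server.jar` is a placeholder for the real application, the class name is made up, and which core numbers actually share a die is an assumption that would need checking on the machine:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AffinityLauncher {
    // Builds: taskset -c <cores> java <javaArgs...>
    static List<String> buildCommand(int[] cores, List<String> javaArgs) {
        StringBuilder cpuList = new StringBuilder();
        for (int i = 0; i < cores.length; i++) {
            if (i > 0) {
                cpuList.append(',');
            }
            cpuList.append(cores[i]);
        }
        List<String> cmd = new ArrayList<String>();
        cmd.add("taskset");
        cmd.add("-c");
        cmd.add(cpuList.toString());
        cmd.add("java");
        cmd.addAll(javaArgs);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder arguments; substitute the real server's options.
        List<String> javaArgs = Arrays.asList("-jar", "server.jar");
        System.out.println(buildCommand(new int[] {0, 1}, javaArgs));
        System.out.println(buildCommand(new int[] {0, 2}, javaArgs));
    }
}
```

Running the same workload once with each core pair and comparing the timings would show whether core placement matters here.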

Dmitry_Vyukov
Valued Contributor I
Can't find such a parameter. Perhaps you can try the following: insert a pause at the beginning of your program (before timing starts), start the program, then quickly restrict the process affinity via Windows Task Manager.
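Something along these lines, as a rough sketch: `workload` is a hypothetical stand-in for the real computation, and the pause length (in seconds) is taken from the first command-line argument so there is time to set the affinity before anything is timed.

```java
public class PausedBenchmark {
    // Hypothetical stand-in for the real single-threaded computation.
    static long workload(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += ((long) i * i) % 7;
        }
        return sum;
    }

    // Times one run of the workload with System.nanoTime().
    static long timeNanos(int n) {
        long start = System.nanoTime();
        long result = workload(n);
        long elapsed = System.nanoTime() - start;
        if (result < 0) {
            // Never true; only keeps the result in use so the loop is not optimized away.
            throw new IllegalStateException();
        }
        return elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        long pauseSeconds = args.length > 0 ? Long.parseLong(args[0]) : 0L;
        System.out.println("Pausing " + pauseSeconds + " s; set the process affinity now.");
        Thread.sleep(pauseSeconds * 1000L);

        timeNanos(5000000); // warm-up run so JIT compilation is mostly excluded
        System.out.println("elapsed ms: " + timeNanos(50000000) / 1000000L);
    }
}
```

For example, `java PausedBenchmark 30` would leave 30 seconds to change the affinity in Task Manager before the timed run starts.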
jta23
Beginner
I'll try this, but before that I ran tests with a single core on both machines (setting maxcpus=1 in Linux, using /Library/Application Support/CPUPalette.app in OS X) and it still runs at only 70% speed on the quad core PC.


That suggests that this isn't just a core communication issue, although I'm sure that has an effect as well when utilizing all 4 cores.