forum days ago, but didn't receive any responses.)
I'm comparing the performance of the multithreaded server software I'm developing (in Java) on two machines and three OSes, and I'm finding some puzzling results. Note: although the server is multithreaded, the bulk of the computation is currently handled by a single thread. I'm still doing research/development on parallelizing the problem.
Computer #1
- Intel Core 2 Duo (T7500), 4 MB cache, 2.20 GHz, 800 MHz FSB
- 2 GB RAM
- Mac OS X 10.6
- Java 1.6u17 (Java HotSpot 64-Bit Server VM)

Computer #2
- Intel Core 2 Quad (Q8300), 4 MB cache, 2.50 GHz, 1333 MHz FSB
- 8 GB RAM
- Ubuntu 9.10 or Windows 7
- Java 1.6u17 (Java HotSpot 64-Bit Server VM) on Windows
- Java 1.6u17, 1.6u19, and 1.7 beta on Linux
Here's a screenshot from the Intel website:
Results: Surprisingly (to me), my program runs significantly faster on Computer #1 than on Computer #2 (whether #2 is running Linux or Windows). Computer #2 runs at only 60%-75% of the speed of Computer #1.
I don't quite understand this, since Computer #1 is 3 years old and has a slower clock speed and system bus than #2 (which I literally just purchased).
Experiments:
- Tried Java 1.6u19 and a 1.7 beta on Ubuntu; saw improvements of up to ~10%.
- Disabled 2 cores in Windows 7 so that it's duo vs. duo instead of duo vs. quad; no (significant) difference.
- Changed the Ubuntu processor setting to "Performance" instead of a power-saving/efficiency mode; no difference.
- Tried the -Xconcurrentio Java option; no difference.
I reckon I'll have to profile the code to shed some more light on this, but thought I'd ask whether this makes sense.
I'm not comparing apples and oranges, am I? Of course when the problem is more parallelized I should be able to get more performance out of the quad core, but I don't understand why it's this much slower now.
In the screenshot above, #1 has a core/bus clock ratio (multiplier) of 11, while #2 has 7.5. 7.5/11 ≈ 68%: just a coincidence?
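The ratio arithmetic can be checked with a few lines of Java. This is a sketch that assumes the quoted FSB figures are effective quad-pumped transfer rates (800 MT/s = 4 × 200 MHz, 1333 MT/s = 4 × 333 MHz), which is how Core 2 front-side buses are normally specified; the class and method names are made up for illustration. It only reproduces the 11, 7.5 and ~68% figures and says nothing about whether the bus is actually the bottleneck.

```java
// Sketch: core-to-bus clock ratio (multiplier) for the two CPUs.
// Assumption: FSB figures are quad-pumped transfer rates, so the
// base bus clock is the FSB figure divided by 4.
public class Multiplier {

    // Core clock divided by base bus clock.
    public static double multiplier(double coreMHz, double fsbMTs) {
        double baseBusMHz = fsbMTs / 4.0; // quad-pumped FSB
        return coreMHz / baseBusMHz;
    }

    public static void main(String[] args) {
        double t7500 = multiplier(2200, 800);
        // "1333" is nominally 4 x 333.33 MHz, i.e. 4000/3 MT/s
        double q8300 = multiplier(2500, 4000.0 / 3.0);
        System.out.printf("T7500 multiplier: %.1f%n", t7500);
        System.out.printf("Q8300 multiplier: %.1f%n", q8300);
        System.out.printf("Ratio: %.0f%%%n", 100 * q8300 / t7500);
    }
}
```

Running `main` prints `T7500 multiplier: 11.0`, `Q8300 multiplier: 7.5` and `Ratio: 68%`.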
Hmm... well, perhaps... AFAIK the Q8300 is not a "real" quad-core processor; it's essentially two dual-core dies in a single package, connected by the FSB. So the communication cost between cores 0<->1 and cores 0<->2 differs significantly: the former is intra-die communication (through the shared L2 cache), while the latter goes over the FSB. That's what I observe on my Q6600. I suspect that when you restrict Windows to 2 cores, it chooses cores 0 and 2. If so, the difference between the two systems is clear: the T7500 uses intra-die communication, while the Q8300 uses FSB communication. I'd guess the JVM has a command-line parameter to restrict its execution to a subset of cores. You could try restricting it to cores 0,1 and then to cores 0,2, and check whether this is the root cause.
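One way to test the core-pair hypothesis directly is a ping-pong microbenchmark: two threads hand a counter back and forth, so the average round trip roughly reflects the cost of moving a cache line between the cores they run on. This is a sketch using only Java 6 APIs; the class name and round counts are arbitrary. It does not pin threads itself; on Linux you could launch it with `taskset -c 0,1 java PingPong` and then `taskset -c 0,2 java PingPong` and compare. Which logical CPU numbers share a die depends on the OS, so it's worth trying every pair. It needs at least two cores in the allowed set, since the spin loops crawl on a single core.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: inter-core round-trip latency via a spinning ping-pong.
public class PingPong {

    // Average round-trip time in nanoseconds over the given number of rounds.
    public static double roundTripNanos(final int rounds) {
        // Odd values mean "ping sent", even values mean "pong sent".
        final AtomicInteger turn = new AtomicInteger(0);

        Thread ponger = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < rounds; i++) {
                    while (turn.get() != 2 * i + 1) { /* spin until ping arrives */ }
                    turn.set(2 * i + 2); // reply with pong
                }
            }
        });
        ponger.start();

        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            turn.set(2 * i + 1); // send ping
            while (turn.get() != 2 * i + 2) { /* spin until pong arrives */ }
        }
        long elapsed = System.nanoTime() - start;

        try {
            ponger.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        return (double) elapsed / rounds;
    }

    public static void main(String[] args) {
        roundTripNanos(100000); // warm-up so the JIT compiles the loops
        System.out.printf("avg round trip: %.0f ns%n", roundTripNanos(1000000));
    }
}
```

If the hypothesis holds, the 0,2 run should show a noticeably longer round trip than the 0,1 run on the Q8300.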
I can't find such a parameter. Perhaps you could try the following: insert a pause at the beginning of your program (before timing starts), start the program, then quickly restrict the process affinity via Windows Task Manager.
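The pause-then-restrict workaround might look like the following minimal sketch. `workload()` is a hypothetical stand-in for the server's single-threaded hot loop, and the class name is made up. Timing starts only after Enter is pressed, which leaves time to right-click the java.exe process in Task Manager and choose Set Affinity.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class PauseThenTime {

    // Blocks until a line (e.g. just Enter) is read; returns false on EOF or error.
    public static boolean waitForEnter(InputStream in) {
        System.out.println("Set the process affinity now, then press Enter...");
        try {
            return new BufferedReader(new InputStreamReader(in)).readLine() != null;
        } catch (IOException e) {
            return false; // treat an unreadable stdin as "no pause"
        }
    }

    // Hypothetical stand-in for the real single-threaded computation.
    public static long workload() {
        long acc = 0;
        for (int i = 0; i < 100000000; i++) {
            acc += i ^ (acc >>> 3);
        }
        return acc;
    }

    public static void main(String[] args) {
        waitForEnter(System.in);
        long start = System.nanoTime();
        long result = workload();
        long ms = (System.nanoTime() - start) / 1000000;
        // Printing the result keeps the JIT from discarding the loop.
        System.out.println("result " + result + " in " + ms + " ms");
    }
}
```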
I'll try this, but before that I ran tests with a single core on both machines (booting Linux with the maxcpus=1 kernel parameter, and using /Library/Application Support/CPUPalette.app in OS X), and it still runs at only 70% of the speed on the quad-core PC.
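Before comparing single-core numbers, it may be worth confirming that the restriction is actually visible to the JVM. A sketch: `Runtime.availableProcessors()` reports how many processors the JVM can use, and on Linux booted with maxcpus=1 it should report 1. Whether it also reflects cores disabled through CPUPalette on OS X is an assumption I haven't verified; the class and method names are made up.

```java
// Sketch: print how many processors the JVM can see.
public class CpuCheck {

    // Number of processors available to this JVM.
    public static int visibleProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int cpus = visibleProcessors();
        System.out.println("JVM sees " + cpus + " processor(s)");
        if (cpus != 1) {
            System.out.println("Single-core restriction does not appear to be in effect");
        }
    }
}
```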
That suggests this isn't just a core-communication issue, although I'm sure that has an effect as well when using all 4 cores.