Hi,
I recently tested a new server from Dell (PowerEdge 2850) and observed a strange performance degradation compared with an older system we have (Dell PE 2650).
The newer system has 2*3.0 GHz Intel Xeon Foster processors, each with 16KB of L1 cache and 1MB of unified L2 cache, as well as 2GB of DDR2 RAM running at 800MHz. x86info identifies the CPU as
"Family: 15 Model: 4 Stepping: 1 Type: 0 Brand: 0"
The older system has 2*2.80 GHz Pentium 4 (Northwood) [C1] processors with 8KB of L1 and 512KB of L2 cache, and 2GB of DDR RAM running at 400MHz. x86info reports
Family: 15 Model: 2 Stepping: 7 Type: 0 Brand: 11
Both systems have the latest BIOS and run Linux with hyperthreading enabled. The older system runs "Debian/unstable" with kernel 2.6.10-686-smp and the libc-i686 library; the newer system was tried with several distributions, mainly RedHat based.
The puzzle is that our software performs _noticeably_ better on the older system. The software (sequence alignment using dynamic programming) is quite memory intensive and mostly uses integer computations. The difference is too large to be a measurement error. In some cases the newer system was twice as slow.
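To illustrate the kind of workload described above, here is a minimal sketch of a dynamic-programming local alignment score (Smith-Waterman style). It is not the poster's code and not the FASTA implementation: the function name, the scoring values, and the linear gap penalty are assumptions chosen for brevity. It only shows why the inner loop is dominated by integer max/add operations and repeated passes over row-sized arrays.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty, assumed scores)."""
    prev = [0] * (len(b) + 1)   # previous row of the DP matrix
    best = 0
    for ca in a:
        curr = [0]              # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Integer-only recurrence: restart, substitute, or open a gap.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            if score > best:
                best = score
        prev = curr
    return best


print(smith_waterman_score("ACGT", "TTACGTTT"))
```

For a sequence of about 97,000 bases compared with itself, a loop of this shape visits on the order of 10^10 cells, so per-cell latency and cache behaviour dominate the run time.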
The problem can also be reproduced with a public tool/compiler. For example, the FASTA package (ftp://ftp.virginia.edu/pub/fasta/fasta34t25b3.shar.Z) compiled with gcc 3.3.5, flags "-O3 -march=pentium4" shows the following results:
[old] /usr/bin/time fasta/ssearch34 -b 50 -d 0 -H -Q -O 1 HSBA150A6 HSBA150A6
301.96user 0.12system 5:02.08elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k0inputs+0outputs (0major+11309minor)pagefaults 0swaps
[new] /usr/bin/time fasta/ssearch34 -b 50 -d 0 -H -Q -O 1 HSBA150A6 HSBA150A6
374.65user 0.07system 6:14.73elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k0inputs+0outputs (179major+11121minor)pagefaults 0swaps
Here HSBA150A6 is a nucleic sequence (97392 bases) from GenBank144.
As you can see, the newer system is roughly 24% slower in user time (374.65 s vs. 301.96 s), even though it has a faster CPU, more cache and faster memory.
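The slowdown can be worked out directly from the two `/usr/bin/time` summaries quoted above. The sketch below is only an illustration: the helper name is hypothetical, and the cycle estimate assumes the nominal clock speeds from the post (2.8 GHz and 3.0 GHz) with no frequency scaling.

```python
import re


def user_seconds(time_line):
    """Extract the user CPU seconds from a GNU time summary line."""
    found = re.search(r"([\d.]+)user", time_line)
    if found is None:
        raise ValueError("no 'user' field found")
    return float(found.group(1))


old_line = "301.96user 0.12system 5:02.08elapsed 100%CPU"
new_line = "374.65user 0.07system 6:14.73elapsed 99%CPU"

old_s = user_seconds(old_line)
new_s = user_seconds(new_line)

# Extra user time on the new machine, relative to the old one.
slowdown = new_s / old_s - 1.0

# Rough clock cycles spent, assuming the nominal frequencies from the post.
cycle_ratio = (new_s * 3.0) / (old_s * 2.8)

print(f"extra user time on new system: {slowdown:.1%}")
print(f"cycles used, new vs. old: {cycle_ratio:.2f}x")
```

Measured in cycles rather than seconds the gap is larger still, because the newer CPU spends more time at a higher clock to finish the same job.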
It must be said that we couldn't run the test on identical OSes, but we tried to run the older system with kernel 2.4.27 and it was still noticeably faster.
We noticed that the PE2850 has a BIOS setting for tuning the system for applications that access memory sequentially or randomly. It has no effect on our tests.
I would appreciate any ideas on why the seemingly better processor and memory in fact perform worse in practice.
Vladimir
1 Reply
Foster was a very early Xeon model. From your description, I'm guessing you're running a Nocona, maybe with a 32-bit OS. You don't say whether you are running a threaded application, or whether you have HT enabled on one or both machines. RH EL3 doesn't do a good job of scheduling on a Xeon with HT enabled. I haven't had a chance to see whether EL4 may be better.
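As a rough cross-check of that guess, the x86info lines quoted in the question can be decoded by family and model. The table below is written from memory of the NetBurst model numbers and should be treated as an assumption, not an authoritative CPUID reference; the table and function names are hypothetical.

```python
import re

# Assumed mapping of (family, model) to NetBurst core generation.
NETBURST_CORES = {
    (15, 0): "Willamette / Foster",
    (15, 1): "Willamette / Foster",
    (15, 2): "Northwood / Prestonia",
    (15, 3): "Prescott / Nocona",
    (15, 4): "Prescott / Nocona",
}


def decode_x86info(line):
    """Map an x86info 'Family: .. Model: ..' line to a core name, if known."""
    fields = {k.lower(): int(v) for k, v in re.findall(r"(\w+):\s*(\d+)", line)}
    return NETBURST_CORES.get((fields["family"], fields["model"]), "unknown")


new_cpu = decode_x86info("Family: 15 Model: 4 Stepping: 1 Type: 0 Brand: 0")
old_cpu = decode_x86info("Family: 15 Model: 2 Stepping: 7 Type: 0 Brand: 11")
print("new:", new_cpu)
print("old:", old_cpu)
```

Under this mapping the newer machine's Model 4 falls in the Prescott/Nocona generation rather than Foster, which fits the 1MB L2 cache mentioned in the question.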
If you do have a Nocona, your increased clock speed ought to offset the longer pipelines. If you are using gcc, you should use a version that supports the -march=nocona switch. You may require oprofile or VTune to get useful information about performance events.