Intel® Moderncode for Parallel Architectures

Strange performance variations on Xeon with hyperthreading (MMX/SSE2)

I have an optimized implementation of a string alignment algorithm. I use SSE2 instructions heavily, which gives me about a 7-fold speedup on average. There is a read-only buffer of about 32 KB that is shared among all threads. Each thread requires its own read/write buffer of at most 128 KB. Each thread also uses its own read-only buffer of about 64 KB.
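To put the buffer sizes above in perspective, here is a minimal sketch that estimates the combined working set for a given number of threads. The sizes are the approximate figures from the description; the function names and the `cache_kb` parameter are hypothetical, introduced only for illustration, and the estimate ignores code, stack, and anything else the threads touch.

```python
# Buffer sizes in KB, taken from the description above (approximate / upper bounds).
SHARED_RO_KB = 32        # read-only buffer shared by all threads
PER_THREAD_RW_KB = 128   # per-thread read/write buffer ("at most")
PER_THREAD_RO_KB = 64    # per-thread read-only buffer ("about")


def working_set_kb(threads: int) -> int:
    """Approximate combined working set in KB for the given thread count."""
    return SHARED_RO_KB + threads * (PER_THREAD_RW_KB + PER_THREAD_RO_KB)


def fits_in_cache(threads: int, cache_kb: int) -> bool:
    """True if the estimated working set fits in a cache of cache_kb KB.

    cache_kb is an assumption supplied by the caller; it is not measured here.
    """
    return working_set_kb(threads) <= cache_kb


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, "thread(s):", working_set_kb(n), "KB")
```

By this estimate one thread touches about 224 KB, two threads about 416 KB, and four threads about 800 KB. Whether that fits depends on the cache size of each machine and on how many threads share one cache, so `cache_kb` has to be filled in per box.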
Now, the numbers I get.
Higher is better (the number is linear with respect to time).
Single Xeon 2.8 GHz box with HT enabled:
single thread performance: 312
two thread performance: 604
Dual Xeon 2.8 GHz box with HT enabled:
single thread performance: 295
two thread performance: 500
three thread performance: 445
four thread performance: 390
Single P4 3.0 GHz box with 2 MB cache and HT:
single thread performance: 295
two thread performance: 500
four thread performance: 460
My 1.7 GHz Centrino laptop:
single thread performance: 439
two thread performance: 425
three thread performance: 425
four thread performance: 410
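To make the scaling easier to compare across machines, the scores can be normalized to each machine's single-thread result. This is only a sketch over the numbers reported above; the dictionary layout and the `speedups` helper are hypothetical names chosen for illustration.

```python
# Scores as reported above (higher is better), keyed by thread count.
RESULTS = {
    "Single Xeon 2.8, HT": {1: 312, 2: 604},
    "Dual Xeon 2.8, HT": {1: 295, 2: 500, 3: 445, 4: 390},
    "Single P4 3.0, 2 MB cache, HT": {1: 295, 2: 500, 4: 460},
    "Centrino 1.7 laptop": {1: 439, 2: 425, 3: 425, 4: 410},
}


def speedups(scores: dict) -> dict:
    """Return each score divided by the single-thread score, rounded to 2 places."""
    base = scores[1]
    return {n: round(score / base, 2) for n, score in sorted(scores.items())}


if __name__ == "__main__":
    for machine, scores in RESULTS.items():
        print(machine, speedups(scores))
```

Seen this way, the single Xeon scales by about 1.94x with two threads, while the dual Xeon reaches only about 1.69x with two threads and falls to about 1.32x with four.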
Is there a good explanation?
I am baffled. Does it have something to do with the cache?
Please help me understand it.

Message Edited by chum on 11-28-2005 11:20 PM

Message Edited by chum on 11-28-2005 11:23 PM
