Strange performance variations on Xeon with hyperthreading (MMX/SSE2)

chum — Tue, 29 Nov 2005 15:12:08 GMT

I have an optimized implementation of a string alignment algorithm. It uses SSE2 instructions heavily, which gives me a 7-fold speedup on average. There is a read-only buffer of about 32 KB that is shared among multiple threads. Each thread requires its own read/write buffer of at most 128 KB. Each thread also uses its own read-only buffer of about 64 KB.
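To make the memory footprint concrete, here is a rough sketch of the combined working set for a given number of threads, using the buffer sizes above. It assumes the worst case of 128 KB for every read/write buffer, and the names `working_set_kb`, `SHARED_RO_KB`, `PER_THREAD_RW_KB` and `PER_THREAD_RO_KB` are made up for this illustration; they are not taken from my actual code.

```python
# Buffer sizes from the description above, in KB.
SHARED_RO_KB = 32        # read-only buffer shared by all threads (approximate)
PER_THREAD_RW_KB = 128   # per-thread read/write buffer (upper bound, assumed worst case)
PER_THREAD_RO_KB = 64    # per-thread read-only buffer (approximate)


def working_set_kb(threads: int) -> int:
    """Worst-case combined footprint in KB for the given thread count."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return SHARED_RO_KB + threads * (PER_THREAD_RW_KB + PER_THREAD_RO_KB)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(f"{n} thread(s): {working_set_kb(n)} KB")
```

By this estimate one thread touches about 224 KB, two threads about 416 KB, and four threads about 800 KB. Whether that fits in cache depends on the cache size of each CPU.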

Now, the numbers I get:

Higher numbers are better (linear in time).

Single Xeon 2.8 box with HT enabled:

single thread performance: 312

two thread performance: 604

Dual Xeon 2.8 box with HT enabled:

single thread performance: 295

dual thread performance: 500

three threads: 445

four threads: 390

Single P4 3.0 box, 2 MB cache, with HT:

single thread performance: 295

two thread performance: 500

four thread performance: 460

My 1.7 Centrino laptop:

single thread: 439

two threads: 425

three threads: 425

four threads: 410
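To make the comparison easier, here is a small sketch that turns the scores above into scaling factors relative to the single-thread result on each machine. The machine labels and the helper name `scaling` are just for this illustration, and the ratios are rounded to two decimals.

```python
# Scores as reported above: {machine: {thread count: score}}.
SCORES = {
    "Single Xeon 2.8 (HT)": {1: 312, 2: 604},
    "Dual Xeon 2.8 (HT)": {1: 295, 2: 500, 3: 445, 4: 390},
    "Single P4 3.0 (HT, 2 MB cache)": {1: 295, 2: 500, 4: 460},
    "Centrino 1.7 laptop": {1: 439, 2: 425, 3: 425, 4: 410},
}


def scaling(scores: dict) -> dict:
    """Return each score divided by the single-thread score, rounded to 2 places."""
    base = scores[1]
    return {n: round(s / base, 2) for n, s in scores.items()}


if __name__ == "__main__":
    for machine, scores in SCORES.items():
        ratios = ", ".join(f"{n}T: {r:.2f}x" for n, r in scaling(scores).items())
        print(f"{machine}: {ratios}")
```

On these numbers the single Xeon scales to about 1.94x with two threads, while the dual Xeon and the P4 both reach about 1.69x with two threads and then drop as more threads are added. The laptop stays slightly below 1.0x throughout.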

Is there a good explanation?

I am baffled. Is it something to do with the cache?
Please help me understand it.

