Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Xeon X5680 slowdown using multithreading


My Xeon has 2 CPUs, each with 6 cores.
My application performs a CPU-intensive calculation on an image.
The application runs n threads for k iterations, each thread with its own image (child buffer) of the same size.
I noticed that the more threads I run, the longer each thread takes.
I start at 0.83 ms for a single thread running alone and end up at 1.3 ms per thread with 12 threads.
Pinning one thread per core with SetAffinityMask made no improvement.
Another problem arises with a high number of threads: there are many more, and bigger, fluctuations in the time per iteration.
The code itself is mostly SSE4 code and the images are 100X100X3, so there should not be any cache problem.

I would appreciate any idea...

4 Replies
New Contributor I

Hello gilgil,

I'll move this thread to the Threading on Intel Parallel Architectures forum. I'm sure someone there will be able to answer.

Best regards,

Aubrey W.
Intel Software Network Support

Honored Contributor III
Increased time per thread with increasing number of threads is normal, at least when multiple threads go through shared cache. The usual objective is to reduce elapsed time by using more cores.
You may be interested in checking whether the threads which share paths to Westmere cache (cores [0,1], [2,3]) may be less efficient than those for which you set affinity to a dedicated path.
Valued Contributor I
Slowdown can be related to exhaustion of memory bandwidth. What is the size of images in bytes?
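
As a rough check on the size question, the footprint can be worked out from the 100X100X3 dimensions given in the original post. The channel width is not stated there, so the 1-byte and 4-byte cases below are assumptions, as are the helper names `image_bytes` and `working_set_bytes`. This is only an arithmetic sketch; it ignores any separate output or scratch buffers the real code may use.

```cpp
#include <cstddef>

// Bytes occupied by one image. bytes_per_channel is an assumption:
// 1 for 8-bit pixels, 4 for float pixels.
constexpr std::size_t image_bytes(std::size_t width, std::size_t height,
                                  std::size_t channels,
                                  std::size_t bytes_per_channel) {
    return width * height * channels * bytes_per_channel;
}

// Combined footprint when every thread owns one private image.
constexpr std::size_t working_set_bytes(std::size_t n_threads,
                                        std::size_t bytes_per_image) {
    return n_threads * bytes_per_image;
}

// 100 x 100 x 3 with 8-bit channels: 30,000 bytes per image.
static_assert(image_bytes(100, 100, 3, 1) == 30000, "8-bit case");
// The same image stored as 32-bit floats: 120,000 bytes per image.
static_assert(image_bytes(100, 100, 3, 4) == 120000, "float case");
// Twelve threads with 8-bit images: 360,000 bytes in total.
static_assert(working_set_bytes(12, image_bytes(100, 100, 3, 1)) == 360000,
              "12 threads, 8-bit case");
```

Even in the float case, twelve private images come to about 1.44 MB, which is small next to the 12 MB of L3 cache on each X5680 socket, so if bandwidth is the limit it is more likely to come from traffic beyond these buffers.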
Honored Contributor III
You might want to chart performance per thread using each possible thread count (1, 2, 3, 4, ... 12)

If your program is performing a large number of writes, you may notice a plateau, or at least a drastic change in performance per thread (a downward stair step).
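
A minimal sketch of such a sweep, assuming portable C++11 threads rather than the original poster's Windows code. `process_image` is a hypothetical stand-in for the real SSE4 kernel, the 8-bit 100 x 100 x 3 buffer is an assumption, and `Sample`, `measure` and `sweep` are names made up for this example.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real kernel: reads and writes every byte.
inline std::uint32_t process_image(std::vector<std::uint8_t>& img) {
    std::uint32_t sum = 0;
    for (auto& px : img) {
        px = static_cast<std::uint8_t>(px * 3u + 1u);
        sum += px;
    }
    return sum;
}

struct Sample {
    unsigned threads;        // number of concurrent worker threads
    double mean_ms;          // mean time per iteration, averaged over threads
    std::uint32_t checksum;  // keeps the work observable to the optimizer
};

// Runs n_threads workers (n_threads >= 1, iterations >= 1), each on its
// own private image, and reports the mean time per iteration per thread.
inline Sample measure(unsigned n_threads, int iterations) {
    const std::size_t image_size = 100 * 100 * 3;  // assumes 8-bit channels
    std::vector<double> per_thread_ms(n_threads, 0.0);
    std::vector<std::uint32_t> sums(n_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            std::vector<std::uint8_t> img(image_size, 1);
            std::uint32_t acc = 0;
            const auto start = std::chrono::steady_clock::now();
            for (int i = 0; i < iterations; ++i) acc += process_image(img);
            const auto stop = std::chrono::steady_clock::now();
            per_thread_ms[t] =
                std::chrono::duration<double, std::milli>(stop - start).count() /
                iterations;
            sums[t] = acc;
        });
    }
    for (auto& w : workers) w.join();

    double total_ms = 0.0;
    std::uint32_t checksum = 0;
    for (unsigned t = 0; t < n_threads; ++t) {
        total_ms += per_thread_ms[t];
        checksum += sums[t];
    }
    return Sample{n_threads, total_ms / n_threads, checksum};
}

// One sample for each thread count 1, 2, ..., max_threads.
inline std::vector<Sample> sweep(unsigned max_threads, int iterations) {
    std::vector<Sample> out;
    for (unsigned n = 1; n <= max_threads; ++n)
        out.push_back(measure(n, iterations));
    return out;
}
```

Calling `sweep(12, 1000)` and plotting `mean_ms` against `threads` should show whether the per-thread time rises gradually or in steps. Thread start-up overlaps with the timed region here, so a larger iteration count gives a cleaner curve.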

Also, this processor has turbo boost, meaning the fewer busy cores per CPU, the faster they run.
As you use more cores (and potentially as they get hot), the boost is turned down or off.

Jim Dempsey