Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

hyperthreading

pmariuszp
Beginner
1,551 Views
I am starting a lot of threads on one machine, and I am wondering how I can run a thread on a processor that I select. When I enable hyperthreading I can see 8 processors instead of 2, and I want to run a task on, let's say, processor 2. Is it possible to do so?
0 Kudos
6 Replies
TimP
Honored Contributor III
1,551 Views
With hyper-threading on, a 2-CPU computer should show 4 logical processors, not 8. With an HT-aware BIOS and operating system, the scheduler will automatically prefer an idle physical CPU when starting a new thread. In order to assign a thread to a specific logical processor, you might use a processor affinity function call. To go further with this, you'll need to be specific about the OS and, for Linux, the kernel version.
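
A minimal sketch of such an affinity call, assuming a Win32 build uses SetThreadAffinityMask and a Linux/glibc build uses sched_setaffinity. The helper names single_cpu_mask and pin_current_thread are made up for illustration, and logical processors are assumed to be numbered from zero.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes cpu_set_t and sched_setaffinity on glibc */
#endif
#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif

/* Hypothetical helper: bit mask with only bit `cpu` set (cpu must be 0..63). */
static uint64_t single_cpu_mask(unsigned cpu)
{
    assert(cpu < 64);
    return (uint64_t)1 << cpu;
}

/* Hypothetical helper: pin the calling thread to logical processor `cpu`.
   Returns 0 on success, -1 on failure. */
static int pin_current_thread(unsigned cpu)
{
    if (cpu >= 64)
        return -1;
#ifdef _WIN32
    /* SetThreadAffinityMask returns the previous mask, or 0 on failure. */
    DWORD_PTR mask = (DWORD_PTR)single_cpu_mask(cpu);
    if (mask == 0)
        return -1;             /* bit does not fit in a 32-bit DWORD_PTR */
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0 ? 0 : -1;
#else
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* A pid of 0 refers to the calling thread. */
    return sched_setaffinity(0, sizeof(set), &set);
#endif
}
```

Calling pin_current_thread(2) at the start of a thread function would request logical processor 2. The call fails if that processor does not exist or is outside the set the process is allowed to use.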
0 Kudos
ClayB
New Contributor I
1,551 Views
I'm in agreement with Henry on this. Trust the OS to find the best processors on which to place threads.

Of course, there are situations that some schedulers aren't (quite) equipped to handle which may require some intervention by the user to set the thread affinity within her application for better performance. As an example, see the problem with the Linux O(1) scheduler described in Affinity über alles.
0 Kudos
bronx
Beginner
1,551 Views
> want to map threads to processors, Windows has the
> SetThreadAffinityMask function and the latest

under WIN32 SetThreadIdealProcessor is an interesting middle-ground between fully automatic scheduling and a "hard wired" affinity mask

on an SMP rig, when using a pool of threads along with a pool of heap allocators, it's a good idea to ensure that each thread will stick most of the time with a given CPU using SetThreadIdealProcessor, though without a hard constraint. Threads can migrate to the other CPU when required instead of being stalled by higher priority transient tasks
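
A rough sketch of that soft hint, assuming each worker in the pool has a zero-based index and should prefer logical processor index modulo CPU count. The helper names preferred_cpu and hint_ideal_cpu are invented for illustration. SetThreadIdealProcessor exists only on Windows, so other builds simply report failure here.

```c
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical policy: spread workers round-robin over the logical processors. */
static unsigned preferred_cpu(unsigned worker_index, unsigned cpu_count)
{
    assert(cpu_count > 0);
    return worker_index % cpu_count;
}

/* Ask the scheduler to prefer `cpu` for the calling thread without forbidding
   migration. Returns 0 on success, -1 on failure or on non-Windows builds. */
static int hint_ideal_cpu(unsigned cpu)
{
#ifdef _WIN32
    /* Returns the previous ideal processor, or (DWORD)-1 on failure. */
    DWORD previous = SetThreadIdealProcessor(GetCurrentThread(), (DWORD)cpu);
    return previous == (DWORD)-1 ? -1 : 0;
#else
    (void)cpu;                 /* no direct equivalent is assumed here */
    return -1;
#endif
}
```

A worker would call hint_ideal_cpu(preferred_cpu(index, cpu_count)) once at startup. Unlike an affinity mask, this is only a preference, so the scheduler is still free to run the thread elsewhere.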

after a bit of HT-oriented optimization very nice speedups can be expected, as our "Kribi" 3D engine has shown at the launch of P4 HT here:

HT scores
0 Kudos
pmariuszp
Beginner
1,551 Views
Yes, my mistake - I will have only 4 processors with hyperthreading.
I'll try SetThreadAffinityMask.
Another thing: if I set a processor for the thread, and then start a different process from this thread, will that process run on the CPU I've selected, or will one be selected randomly?
I've found that there is also SetProcessAffinityMask. I think I will have to check it out...
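
A small sketch of the process-level call, under the assumption (taken from the Win32 documentation as I understand it) that a child process inherits the process affinity mask of its parent and that a thread's mask must be a subset of its process's mask. The helper names thread_mask_is_valid and restrict_current_process are invented for illustration, and non-Windows builds are left as a stub.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* A thread mask is usable only if it is non-empty and lies inside the process mask. */
static int thread_mask_is_valid(uint64_t thread_mask, uint64_t process_mask)
{
    return thread_mask != 0 && (thread_mask & ~process_mask) == 0;
}

/* Restrict the whole current process to the logical processors in `mask`.
   Returns 0 on success, -1 on failure or on non-Windows builds. */
static int restrict_current_process(uint64_t mask)
{
    assert(mask != 0);         /* an empty mask would leave nowhere to run */
#ifdef _WIN32
    /* SetProcessAffinityMask returns nonzero on success. */
    return SetProcessAffinityMask(GetCurrentProcess(), (DWORD_PTR)mask) ? 0 : -1;
#else
    return -1;                 /* stub: only the Win32 path is sketched */
#endif
}
```

Under that assumption, restricting the parent process before launching the child looks like the more predictable way to control where the child runs than relying on the mask of the launching thread.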
0 Kudos
Aaron_C_Intel
Employee
1,551 Views

> on an SMP rig, when using a pool of threads along with
> a pool of heap allocators it's a good idea to ensure
> that each thread will stick most of the time with a
> given CPU using SetThreadIdealProcessor, though
> without a hard constraint. Threads can migrate to the
> other CPU when required instead of being stalled by
> higher priority transient tasks
>
Hmm, interesting. Did you see a noticeable performance difference when using SetThreadIdealProcessor? I would have thought the OS would try to do this in general, because I know it tracks the last processor a thread ran on.

Actually, a potentially useful thing to do with HT in particular would be to use the thread affinity mask to limit a thread to one physical processor. This would be another hybrid approach. You would like your thread to remain on the same physical processor for cache reasons, etc. But if it switches logical processors, that's okay.
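
A sketch of that hybrid mask, assuming the two logical processors of one physical package are already known. Which indices are siblings depends on the platform and BIOS, so the pair passed in is an assumption and a real program should query the topology first. The helper names physical_cpu_mask and bind_current_thread are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Mask covering both logical processors assumed to share one physical CPU. */
static uint64_t physical_cpu_mask(unsigned sibling_a, unsigned sibling_b)
{
    assert(sibling_a < 64 && sibling_b < 64);
    return ((uint64_t)1 << sibling_a) | ((uint64_t)1 << sibling_b);
}

/* Bind the calling thread to every logical processor set in `mask`.
   Returns 0 on success, -1 on failure or on non-Windows builds. */
static int bind_current_thread(uint64_t mask)
{
#ifdef _WIN32
    /* SetThreadAffinityMask returns the previous mask, or 0 on failure. */
    return SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)mask) != 0 ? 0 : -1;
#else
    (void)mask;                /* stub: only the Win32 path is sketched */
    return -1;
#endif
}
```

With siblings 0 and 1, physical_cpu_mask(0, 1) yields 0x3, so the thread may move between the two logical processors but stays on the one physical processor and its cache.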

Of course this only applies to multi-processor systems with HT.

Thoughts?
0 Kudos
bronx
Beginner
1,551 Views
> Hmm interesting, did you see a noticeable performance
> difference when using SetThreadIdealProcessor? I

from my project history it was originally introduced along with pools of memory allocators, while still launching a lot of threads with short lifespans. Each thread was attached to a preferred allocator, so it was very important to tie each thread sharing a given allocator to the same processor, to maximize cache usage and minimize the serialized access penalty. The speedups were quite noticeable (more than 10% IIRC) at the time with a dual-PII 400 under NT 4.0.


> would have thought the OS would try to do this in
> general, because I know they track the last processor
> it was on.

with the later addition of thread pools, using the set ideal proc. call is probably less important since only one thread will use each allocator during its whole (very long) lifespan. I still have the call in the code but no speedup figure to provide


> to remain on the same physical processor for cache
> reasons, etc. But if it switches logical processors

indeed it's an advantage of the affinity mask calls: thanks to the bitset mask they are more versatile than set ideal proc. with its single CPU number

btw, for code heavy on dynamic memory allocation, one important bottleneck is the synchronized access to each shared allocator; here it's best to have one allocator per logical CPU rather than one per CPU package
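
A toy sketch of the one-allocator-per-logical-CPU layout, not the actual engine code. It assumes four logical processors and a fixed-size bump arena for each. The names arena_t, arenas and cpu_alloc are invented, and locking is left out on the assumption that each arena is used by a single thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOGICAL_CPUS 4     /* assumed: 2 packages x 2 HT siblings */
#define ARENA_BYTES 4096       /* assumed arena size, for illustration only */

typedef struct {
    _Alignas(16) unsigned char storage[ARENA_BYTES];
    size_t used;               /* bytes handed out so far */
} arena_t;

/* One arena per logical CPU, so threads on different CPUs never contend. */
static arena_t arenas[MAX_LOGICAL_CPUS];

/* Bump-allocate `size` bytes from the arena of `logical_cpu`.
   Returns NULL when that arena is exhausted. Nothing is ever freed. */
static void *cpu_alloc(unsigned logical_cpu, size_t size)
{
    assert(logical_cpu < MAX_LOGICAL_CPUS);
    arena_t *a = &arenas[logical_cpu];
    size_t aligned = (size + 15u) & ~(size_t)15u;    /* round up to 16 bytes */
    if (aligned < size || aligned > ARENA_BYTES - a->used)
        return NULL;
    void *p = a->storage + a->used;
    a->used += aligned;
    return p;
}
```

Each thread would pass the logical processor it was pinned or hinted to. If several threads shared one arena, that arena would need its own lock, which is the serialized access cost mentioned above.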


> Of course this only applies to multi-processor
> systems with HT.

sure + forthcoming SMT/CMP combos
0 Kudos