Re: force mpi thread on physical cpu

lspes · ‎10-28-2005

I would like to know how to force a thread to run on a physical CPU on a dual-CPU system with hyperthreading.

Hyperthreading should be very useful, but I guess that if I launch a program with MPI on a dual-processor machine, I have two threads running on the CPUs. To use HT properly, I would like the two threads to run on separate physical CPUs, even if each thread can be balanced between the logical CPUs of its own physical CPU.

What I want to avoid is having the two threads run on two logical CPUs belonging to the same physical processor.

jim_dempsey · ‎10-28-2005

Search Google for

set thread affinity site:microsoft.com

One of the first links gets you to

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/setthreadaffinitymask.asp

There are other topics related to threads.

Remember that the function call takes the Windows thread handle, not the OpenMP team thread number.
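For illustration only, here is a minimal Python sketch of how the affinity mask passed to SetThreadAffinityMask is built and applied to the calling thread. The helper names `affinity_mask` and `pin_current_thread` are hypothetical, and which logical processor number belongs to which physical CPU depends on the system, so the indices you pass in are an assumption you have to verify on your own machine.

```python
import sys


def affinity_mask(logical_cpus):
    """Build the bit mask: bit i set means logical processor i is allowed."""
    mask = 0
    for cpu in logical_cpus:
        if cpu < 0:
            raise ValueError("logical CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


def pin_current_thread(logical_cpus):
    """Restrict the calling thread to the given logical processors (Windows only)."""
    if sys.platform != "win32":
        raise OSError("SetThreadAffinityMask is a Windows API")
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentThread.restype = wintypes.HANDLE
    kernel32.SetThreadAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
    kernel32.SetThreadAffinityMask.restype = ctypes.c_size_t

    # GetCurrentThread returns a pseudo-handle for the calling thread.
    handle = kernel32.GetCurrentThread()
    previous = kernel32.SetThreadAffinityMask(handle, affinity_mask(logical_cpus))
    if previous == 0:  # zero means the call failed
        raise ctypes.WinError(ctypes.get_last_error())
    return previous  # the thread's previous affinity mask
```

The mask is built separately from the API call so it can be checked on any platform; `affinity_mask([0, 2])` gives 5, i.e. bits 0 and 2 set.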

Jim Dempsey

TimP · ‎10-29-2005

The problem of writing a scheduler which optimizes HyperThreading was one of the first motivations for improved scheduling in the 2.5 and 2.6 linux kernels. When starting up a thread, an idle physical processor is preferred over a logical processor whose sibling is already busy. If there is no physical processor idle, a logical processor on the same physical processor where the thread ran last is preferred, in the hope of optimizing cache use.

Dual core introduced an intermediate level which schedulers must take into account, requiring corresponding information from the BIOS. When combined with HyperThreading, it makes all these demands on the scheduler even on a single package system.

The usual problem is to spread the work evenly across separate physical CPUs, and then across separate cores, before using multiple HyperThread logical processors on the same CPU. Red Hat EL3_U2, SuSE 9.3, and equivalent Linux distros introduce multi-core-aware scheduling which does that often enough to show a measurable advantage over older schedulers.
MS MPI may eventually incorporate some kind of dual-core-aware scheduling, since it doesn't appear to be coming any time soon in the Windows scheduler.
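The ordering described above (separate packages first, then separate cores, then HyperThread siblings) can be sketched as a small Python function. This is not how any kernel scheduler is implemented; it only shows the preference order. The function name `placement_order` and the input format (logical CPU id mapped to a package id and core id) are assumptions made for the example.

```python
from collections import defaultdict


def placement_order(topology):
    """Order logical CPUs: one per package first, then further cores, then HT siblings.

    topology maps logical CPU id -> (package id, core id).
    """
    # Group logical CPUs as package -> core -> [logical CPU ids].
    packages = defaultdict(lambda: defaultdict(list))
    for cpu in sorted(topology):
        package, core = topology[cpu]
        packages[package][core].append(cpu)

    keyed = []
    for package in sorted(packages):
        cores = packages[package]
        for core_rank, core in enumerate(sorted(cores)):
            for thread_rank, cpu in enumerate(cores[core]):
                # Sort key: HT sibling index first, then core, then package.
                keyed.append((thread_rank, core_rank, package, cpu))
    return [cpu for _, _, _, cpu in sorted(keyed)]


# Dual-package, single-core, HT box where logical CPUs 0/1 share package 0.
order = placement_order({0: (0, 0), 1: (0, 0), 2: (1, 0), 3: (1, 0)})
print(order)  # [0, 2, 1, 3]
```

For a two-process MPI job you would take the first two entries, which land on different packages. On Linux the package and core ids can be read from `/sys/devices/system/cpu/cpuN/topology/physical_package_id` and `core_id`, and a process can be pinned from Python with `os.sched_setaffinity`.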

Message Edited by tim18 on 10-28-2005 06:46 PM

jim_dempsey · ‎10-29-2005

From what I can interpret, HT suffers from a cache that was designed for a single (virtual) processor. If more effort were put into the cache design to eliminate aliasing of addresses, most of the adverse cache interaction would be eliminated (though there will undoubtedly be a second-most adverse cache interaction). Except on dual-core, multi-core, or multi-chip systems, the cache interaction is likely to remain (why put the effort into fixing an old design?).

This brings me to a question that someone might be able to answer: on a single core with HT, can caching be disabled for one of the virtual processors? That would let one thread run slower but not trash the other thread's cache.

Jim Dempsey

lspes · ‎10-31-2005

Thanks a lot for your answers.

So, if I understand correctly, to get the best use of hyperthreading I should use a Linux 2.5 or 2.6 kernel, in which the scheduler is optimized for HT.

Again, if I understand correctly: with a 2.6 kernel on a dual-processor SMP machine with HT, when I run an MPI program on the two processors, the calculation is spread over two threads. Each thread is launched on its own physical CPU and only switches between the logical CPUs belonging to that physical CPU, so if things go well we never have the two calculation threads running on two logical CPUs that belong to the same physical CPU.

I don't know if that is really clear, but what I want to avoid when using HT on a dual-CPU box is having the two threads of a calculation run on the same physical CPU while the second physical CPU is free and occupied only by system tasks.

TimP · ‎10-31-2005

That additional effort has already been put into the cache design. In the recent steppings, "64k aliasing" has been eliminated and replaced by 4M aliasing. That is, cache mapping conflicts occur only with cache lines whose addresses differ by multiples of 4M. This issue is peripheral to the question posed originally. There are multiple reasons why the scheduler must prefer to schedule first on separate packages (CPUs in Intel-speak), next on separate cores, and last to use both logical processors on a single core.
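As a worked illustration of the rule "addresses differ by multiples of 4M", here is a small Python check. The 64-byte cache line size and the name `may_alias` are assumptions for the example, not figures from this thread; only the 4M and 64k strides come from the discussion.

```python
ALIAS_STRIDE = 4 * 1024 * 1024  # 4M aliasing, as described in the post
LINE_SIZE = 64                  # assumed cache line size in bytes


def may_alias(addr_a, addr_b, stride=ALIAS_STRIDE, line_size=LINE_SIZE):
    """True if the two addresses sit in different cache lines whose
    line addresses differ by a multiple of the aliasing stride."""
    base_a = addr_a - addr_a % line_size  # start of the line holding addr_a
    base_b = addr_b - addr_b % line_size
    if base_a == base_b:
        return False  # same cache line, so nothing to conflict with
    return (base_a - base_b) % stride == 0


print(may_alias(0x1000, 0x1000 + 4 * 1024 * 1024))  # True: exactly 4M apart
print(may_alias(0x1000, 0x1000 + 64 * 1024))        # False: the old 64k stride
```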

jim_dempsey · ‎10-31-2005

But unfortunately, neither the Fortran ALLOCATE statement nor the C/C++ allocators permit a hint about preferred alignment restrictions. For example, if your application determines that it is on a single-core, HT-capable system with 4MB aliasing, there is no means to have the allocation specify a preference for memory from a particular 2MB-aligned portion of memory. The C++ programmer has the means to correct for this by replacing the new handler, but the Fortran programmer does not have this functionality.
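One common workaround, when the allocator takes no alignment hint, is to over-allocate and then pick the start address yourself. The Python sketch below only shows that arithmetic, using ctypes; it is not a replacement for ALLOCATE or operator new. The name `aligned_buffer` and the `phase` parameter (the wanted offset within each alignment window) are hypothetical.

```python
import ctypes

MIB = 1024 * 1024


def aligned_buffer(size, alignment, phase=0):
    """Return a ctypes char array of `size` bytes whose address satisfies
    address % alignment == phase, carved from a larger raw allocation."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    if not 0 <= phase < alignment:
        raise ValueError("phase must be in [0, alignment)")
    # Over-allocate so that a suitably placed start always fits.
    raw = ctypes.create_string_buffer(size + alignment - 1)
    offset = (phase - ctypes.addressof(raw)) % alignment
    # from_buffer shares memory with raw and keeps raw alive.
    return (ctypes.c_char * size).from_buffer(raw, offset)


# Ask for a block starting in the upper 2MB half of a 4MB window.
buf = aligned_buffer(4096, 4 * MIB, phase=2 * MIB)
assert ctypes.addressof(buf) % (4 * MIB) == 2 * MIB
```

With 4MB aliasing, giving one thread blocks with phase 0 and the other blocks with phase 2MB keeps their data in different halves of each 4MB window, as long as each block is smaller than 2MB.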

Jim Dempsey