Intel® Moderncode for Parallel Architectures

How to find which thread is running on which processor

I have a lot of confusion and several problems. Please help.

I want to find out which thread is running on which processor, and which master thread created each thread, in OpenMP with Visual Studio 2005.

How is it decided which thread will run on which processor?

If we increase the number of threads using num_threads(10) on a 2-core processor and use num_threads(2) on a 7-core processor, which would be faster, and why?

Any clue would help. It is urgent.
Black Belt
To my knowledge, VCOMP doesn't provide a means of affinitizing threads to processors. That would have to be done by adding Windows thread library function calls. Affinity support is among the advertised reasons for using the Intel OpenMP library, which includes its own implementation of all the VCOMP functions.
Normally, an efficiently parallelized application would not gain by setting more threads than the number of cores. I don't know where you'd get a 7-core processor. It's certainly possible that a multi-core platform, particularly in the absence of a suitable affinity setting, might not run 2 threads as well as a dual-core platform.
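
As a rough illustration of the oversubscription point, the sketch below caps a requested thread count at the number of hardware threads reported by the runtime. The helper name `clamp_thread_count` is hypothetical, and `std::thread::hardware_concurrency()` needs C++11, which VS2005 does not provide; in plain OpenMP code, `omp_get_num_procs()` can supply the same number.

```cpp
#include <thread>

// Hypothetical helper: cap a requested thread count at the number of
// hardware threads, since extra threads on a busy core rarely add speed.
unsigned clamp_thread_count(unsigned requested, unsigned available)
{
    if (available == 0)          // count unknown: trust the caller
        return requested;
    return requested < available ? requested : available;
}

// Convenience overload that asks the C++ runtime for the hardware count.
// hardware_concurrency() may return 0 when the value cannot be determined.
unsigned clamp_thread_count(unsigned requested)
{
    return clamp_thread_count(requested, std::thread::hardware_concurrency());
}
```

With this, a request such as `num_threads(clamp_thread_count(10))` would resolve to 2 on a machine that reports two hardware threads.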
It's been said that VS2010 includes more current OpenMP facilities than VS2005, and the contrary has also been stated.
Black Belt
>>I want to find out which threads are running on which processor

Unless threads are pinned by affinity, they will migrate amongst processors. There are system API calls (not OpenMP calls) to query and/or set affinity restrictions. These system calls vary from system to system. You can also use the processor-dependent CPUID instruction to determine the current APIC geometry. The system, though, may map this to a logical "cpu" in confusing ways.
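
To make the affinity calls concrete, here is a minimal sketch for Linux with glibc only; it is not an OpenMP facility. The helper name `pin_to_first_allowed_cpu` is hypothetical. It reads the calling thread's current affinity mask with `sched_getaffinity()` and narrows it to a single CPU with `sched_setaffinity()`. On Windows the rough counterpart would be `SetThreadAffinityMask()`.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // exposes sched_getcpu() and the CPU_* macros
#endif
#include <sched.h>

// Hypothetical helper (Linux/glibc only): pin the calling thread to the
// first CPU it is currently allowed to run on and return that CPU's index,
// or -1 on failure.
int pin_to_first_allowed_cpu()
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // A pid of 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (sched_setaffinity(0, sizeof(one), &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;   // empty mask: not expected in practice
}
```

Once a thread is pinned this way, `sched_getcpu()` should keep reporting the same CPU for it, so the thread-to-processor mapping stays stable for as long as the mask is left alone.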

>>which master thread created that thread in OpenMP

OpenMP uses the concept of thread team member numbers, where each team of n threads is numbered 0 to n-1. With nested levels enabled, any team member can instantiate a new team, with those team members again numbered from 0 (0 to m-1 for a nested team of m threads). Therefore, with nested levels, multiple threads will read the same value from omp_get_thread_num(). To correct for this, you can use thread-local storage to hold a cardinal thread number (which you must initialize). An alternative would be to pass the prior thread team member number into the next region. These numbers can then be appended to a nesting structure of your design (also in TLS), e.g. n.m.x.y...
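
The two ideas above could be sketched roughly as follows. Both helper names, `cardinal_thread_id` and `nested_path`, are hypothetical. The sketch assumes C++11 `thread_local`; with VS2005 one would use `__declspec(thread)` and initialize the value explicitly, since that storage class does not support dynamic initializers.

```cpp
#include <atomic>
#include <string>

// Process-wide counter; each thread takes the next value exactly once.
static std::atomic<unsigned> g_next_cardinal_id(0);

// Hypothetical helper: a unique ID per OS thread, kept in thread-local
// storage and assigned lazily on the thread's first call.
unsigned cardinal_thread_id()
{
    thread_local unsigned id = g_next_cardinal_id.fetch_add(1);
    return id;
}

// Hypothetical helper: extend a nesting path such as "2.0" with the team
// member number of the next nesting level, giving e.g. "2.0.3".
std::string nested_path(const std::string& parent_path, int team_member)
{
    if (parent_path.empty())
        return std::to_string(team_member);
    return parent_path + "." + std::to_string(team_member);
}
```

Inside a nested parallel region, one would pass the enclosing region's path in and call `nested_path(outer_path, omp_get_thread_num())`, so that two threads that both report member number 0 still end up with different paths.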

Valued Contributor I
Quoting Anupam Dev

I want to find out which threads are running on which processor

There is the GetCurrentProcessorNumber() system call on Windows systems.

And vgetcpu() on Linux systems.
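
For illustration, a thin wrapper over the two calls might look like the sketch below. The name `current_processor` is hypothetical. On Linux, user code normally reaches vgetcpu() through glibc's `sched_getcpu()`, which is what the sketch assumes; `GetCurrentProcessorNumber()` requires Windows Vista or later.

```cpp
#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // exposes sched_getcpu()
#endif
#include <sched.h>
#endif

// Hypothetical wrapper: index of the processor the calling thread is on
// right now, or -1 if the platform is not covered here. Unless the thread
// is pinned, the answer can be stale by the time it is used.
int current_processor()
{
#if defined(_WIN32)
    return static_cast<int>(GetCurrentProcessorNumber());
#elif defined(__linux__)
    return sched_getcpu();
#else
    return -1;
#endif
}
```

Printing `omp_get_thread_num()` next to `current_processor()` inside a parallel region gives a snapshot of which team member is on which processor at that instant.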

If/when they are not present, you can use the LSL instruction:

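[cpp]__declspec(naked) unsigned get_current_proc()
{
    __asm {
        mov ecx, 03Bh   ; segment selector to query
        lsl eax, ecx    ; eax = limit of that segment
        shr eax, 0Eh    ; shift right by 14 bits to get the processor number
        ret             ; naked function: must return explicitly
    }
}[/cpp]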
Or, if it does not work on your system, the SIDT instruction:

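[cpp]#include <intrin.h>  // declares the MSVC intrinsic __sidt()

// IDT base address of each processor; must be filled in at startup.
unsigned* idt_table;
unsigned idt_table_size;

// Per-thread cache of the last IDT base seen and the processor it maps to.
__declspec(thread) unsigned thread_cache_idt;
__declspec(thread) unsigned thread_cache_proc;

unsigned get_current_processor()
{
    // Layout stored by SIDT in 32-bit mode: 16-bit limit, 32-bit base.
#pragma pack(push, 1)
    struct idt_t
    {
        unsigned short size;
        unsigned base;
    };
#pragma pack(pop)

    idt_t idt;
    __sidt(&idt);
    if (idt.base != thread_cache_idt)
    {
        // Base changed: look up which processor owns this IDT.
        for (unsigned i = 0; i != idt_table_size; ++i)
        {
            if (idt_table[i] == idt.base)
            {
                thread_cache_idt = idt.base;
                thread_cache_proc = i;
                break;
            }
        }
    }
    return thread_cache_proc;
}[/cpp]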



I could not get either of the code snippets above to work on a Nehalem-EX machine (Intel64) running Red Hat EL 5.3 and the Intel compilers.

For the LSL instruction code, I just get a zero return value no matter where the threads are running. I can't get the SIDT instruction code to compile:

> icpc -openmp get_proc.c
get_proc.c(34): error: identifier "__sidt" is undefined

Is there a special header, compiler, or architecture I have to use for this SIDT instruction (which I assume is assembly language) to be recognized?

Valued Contributor I
Hi Grant,

I guess all that stuff is indeed highly OS-specific. I found the LSL and SIDT tricks while trying to find a replacement for GetCurrentProcessorNumber() on pre-Vista versions of Windows. The SIDT trick did work on Windows XP/Intel Q6600 for me. LSL works only since Vista, so it is a bit pointless there, because GetCurrentProcessorNumber() is already available.

Regarding __sidt(), it's a Microsoft Visual C++ intrinsic. As far as I can see, it's not available in Intel C++. You may try to use inline assembly to emit the SIDT instruction; however, there are no guarantees that it will work on Linux.
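
A GCC-style inline assembly version might look like the sketch below, which should also be accepted by the Intel compiler on Linux. The names `IdtRegister` and `read_idt_register` are hypothetical. Treat it as an experiment rather than a dependable API: on recent CPUs and kernels with User-Mode Instruction Prevention (UMIP) enabled, SIDT in user mode may fault or yield placeholder values.

```cpp
#include <cstdint>

// Layout written by SIDT: a 16-bit limit followed by the base address
// (32 bits in 32-bit mode, 64 bits in 64-bit mode).
#pragma pack(push, 1)
struct IdtRegister {
    std::uint16_t  limit;
    std::uintptr_t base;
};
#pragma pack(pop)

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
// Hypothetical stand-in for MSVC's __sidt() using GCC-style inline asm.
// May fault or return dummy values when UMIP is enabled.
inline IdtRegister read_idt_register()
{
    IdtRegister idtr = {0, 0};
    __asm__ volatile("sidt %0" : "=m"(idtr));
    return idtr;
}
#endif
```

The struct is packed because SIDT writes the limit and base back to back, with no padding between them.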

Valued Contributor I
It seems that SIDT should work on Linux; the vgetcpu() patch uses it:

[cpp]+/* Fast way to get current CPU and node.
+   This helps to do per node and per CPU caches in user space.
+   The result is not guaranteed without CPU affinity, but usually
+   works out because the scheduler tries to keep a thread on the same
+   CPU.
+   tcache must point to a two element sized long array.
+   All arguments can be NULL. */
+long __vsyscall(2)
+vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
+       unsigned int p;
+       unsigned long j = 0;
+       /* Fast cache - only recompute value once per jiffies and avoid
+          relatively costly lsl/sidt otherwise.
+          This works because the scheduler usually keeps the process
+          on the same CPU and this syscall doesn't guarantee its
+          results anyways.
+          We do this here because otherwise user space would do it on
+          its own in a likely inferior way (no access to jiffies).
+          If you don't like it pass NULL. */
+       if (tcache && tcache->blob[0] == (j = __jiffies)) {
+               p = tcache->blob[1];
+       }
+       else {
+                struct {
+                        char pad[6];   /* avoid unaligned stores */
+                        u16 size;
+                        u64 address;
+                } idt;
+                asm("sidt %0" : "=m" (idt.size));
+                p = idt.size - 0x1000;
+               /* Load per CPU data from GDT */
+               asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
+               if (tcache) {
+                       tcache->blob[0] = j;
+                       tcache->blob[1] = p;
+               }
+       }
+       if (cpu)
+               *cpu = p >> CONFIG_NODES_SHIFT;
+       if (node)
+               *node = p & ((1<
[/cpp]

HEY, thanks so much, this post helped me out a lot, just what i was looking for!