There is the GetCurrentProcessorNumber() API function on Windows,
and vgetcpu() on Linux (applications normally reach it through the getcpu() system call or glibc's sched_getcpu()).
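A minimal sketch of a wrapper over the two OS calls. It assumes glibc's `sched_getcpu()` (glibc 2.6 and later) as the usual application-level route to the Linux mechanism. `current_processor` is a hypothetical name.

```cpp
#if defined(_WIN32)
#include <windows.h>   // GetCurrentProcessorNumber()
#else
#ifndef _GNU_SOURCE
#define _GNU_SOURCE    // sched_getcpu() is a GNU extension (g++ predefines this)
#endif
#include <sched.h>     // sched_getcpu()
#endif

// Processor the calling thread was running on during the call, or -1 if
// sched_getcpu() fails. Without CPU affinity the answer can be stale as
// soon as it is returned.
inline int current_processor() {
#if defined(_WIN32)
    // Number within the thread's processor group; machines with more than
    // 64 logical processors need GetCurrentProcessorNumberEx() instead.
    return static_cast<int>(GetCurrentProcessorNumber());
#else
    return sched_getcpu();
#endif
}
```

Treat the result as a hint, for example to pick a per-CPU cache slot, never as something correctness depends on.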
If/when they are not available, you can use the LSL instruction:
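[cpp]// MSVC, 32-bit x86 only (x64 MSVC does not support inline __asm).
// Selector 0x3B and the shift by 14 depend on the Windows segment layout.
__declspec(naked) unsigned get_current_proc()
{
    __asm
    {
        mov ecx, 03Bh   ; segment selector whose limit encodes the processor
        lsl eax, ecx    ; EAX = segment limit of that selector
        shr eax, 0Eh    ; processor number is in bits 14 and up
        retn
    }
}
[/cpp]Or, if that does not work on your system, the SIDT instruction: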
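[cpp]#include <intrin.h>   // declares the MSVC __sidt intrinsic

// Filled at startup (not shown): idt_table[i] = IDT base seen on processor i.
unsigned* idt_table;
unsigned idt_table_size;

// Per-thread cache: last IDT base seen and the processor it maps to.
__declspec(thread) unsigned thread_cache_idt;
__declspec(thread) unsigned thread_cache_proc;

unsigned get_current_processor()
{
#pragma pack(push, 1)
    struct idt_t { unsigned short size; unsigned base; };  // 32-bit IDTR layout
#pragma pack(pop)
    idt_t idt;
    __sidt(&idt);
    if (idt.base != thread_cache_idt)
    {
        // First call, or the thread moved: look the base up in the table.
        for (unsigned i = 0; i != idt_table_size; ++i)
        {
            if (idt_table[i] == idt.base)
            {
                thread_cache_idt = idt.base;
                thread_cache_proc = i;
                break;
            }
        }
    }
    return thread_cache_proc;
}
[/cpp]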
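As far as I know, `__sidt` is an MSVC intrinsic declared in `<intrin.h>`, and GCC-style compilers on Linux generally do not provide it, so inline assembly is needed instead. The sketch below assumes x86-64 and GCC-style `asm`, and stores the 10-byte 64-bit IDTR form (the 32-bit `unsigned base` above would truncate it). The table lookup is split out, with the indexing fixed and the base passed in, so it can be tested without executing SIDT. `read_idt_base`, `processor_from_idt_base` and `idt_bases` are hypothetical names, and the table still has to be filled at startup by pinning a thread to each processor in turn (not shown). Also, as far as I know, mainline Linux loads the same IDT on every CPU and CPUs with UMIP block SIDT in user mode, so on Linux this lookup may not distinguish processors at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct idtr64 {              // what SIDT stores in 64-bit mode
    std::uint16_t limit;
    std::uint64_t base;
};
#pragma pack(pop)

#if defined(__x86_64__) && defined(__GNUC__)
// GCC/ICC-style replacement for MSVC's __sidt(): no header needed.
// May fault or return a dummy value where UMIP is enforced.
inline std::uint64_t read_idt_base() {
    idtr64 idtr{};
    __asm__ volatile("sidt %0" : "=m"(idtr));
    return idtr.base;
}
#endif

// Lookup with a per-thread cache, as in the MSVC version.
// idt_bases[i] is assumed to hold the IDT base recorded on processor i.
inline unsigned processor_from_idt_base(std::uint64_t base,
                                        const std::uint64_t* idt_bases,
                                        std::size_t count) {
    assert(count == 0 || idt_bases != nullptr);
    thread_local std::uint64_t cached_base = 0;
    thread_local unsigned cached_proc = 0;
    if (base != cached_base) {
        for (std::size_t i = 0; i != count; ++i) {
            if (idt_bases[i] == base) {      // compare element i, not the pointer
                cached_base = base;
                cached_proc = static_cast<unsigned>(i);
                break;
            }
        }
    }
    return cached_proc;  // unknown base: previous answer, as in the original
}
```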
I could not get either of the code snippets above to work on a Nehalem-EX machine (Intel64) running Red Hat EL 5.3 and the Intel compilers.
For the LSL instruction code, I just get a zero return value no matter where the threads are running. I can't get the SIDT instruction code to compile:
> icpc -openmp get_proc.c
get_proc.c(34): error: identifier "__sidt" is undefined
__sidt(&idt);
Is there a special header, compiler, or architecture I need for this SIDT instruction to be recognized (I assume it is an assembly-language instruction)?
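On the zero from the LSL version: selector 0x3B and the shift by 14 in the LSL snippet earlier in the thread come from the Windows segment layout, so there is no reason to expect them to work under Linux. A sketch under stated assumptions: on modern x86-64 Linux kernels the GDT entry at selector 0x7b (entry 15) is, as far as I know, the per-CPU segment that the vDSO `getcpu` reads with LSL, and its limit holds `(node << 12) | cpu`. That is kernel-internal, not a stable ABI, and older kernels such as the one in RHEL 5 may lay it out differently, so the sketch checks LSL's ZF flag and falls back to glibc's `sched_getcpu()`. `kCpuNodeSelector`, `read_cpunode_limit`, `current_cpu_fast` and the decode helpers are hypothetical names.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // sched_getcpu() is a GNU extension (g++ predefines this)
#endif
#include <cassert>
#include <sched.h>    // sched_getcpu()

// Assumption: x86-64 Linux per-CPU GDT selector (entry 15, RPL 3).
constexpr unsigned kCpuNodeSelector = 0x7b;

// Assumed encoding of the segment limit: (node << 12) | cpu.
constexpr unsigned cpu_from_limit(unsigned limit) { return limit & 0xfff; }
constexpr unsigned node_from_limit(unsigned limit) { return limit >> 12; }

// Reads the segment limit with LSL. Returns false if LSL rejected the
// selector (ZF clear) or if this is not x86-64 Linux with a GCC-style compiler.
inline bool read_cpunode_limit(unsigned* limit) {
    assert(limit != nullptr);
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
    unsigned value = 0;
    unsigned char ok = 0;
    __asm__ volatile("lsl %[sel], %[val]\n\tsetz %[ok]"
                     : [val] "+r"(value), [ok] "=q"(ok)
                     : [sel] "r"(kCpuNodeSelector)
                     : "cc");
    if (ok) *limit = value;
    return ok != 0;
#else
    return false;
#endif
}

// LSL first, sched_getcpu() otherwise; either answer may be stale.
inline int current_cpu_fast() {
    unsigned limit = 0;
    if (read_cpunode_limit(&limit))
        return static_cast<int>(cpu_from_limit(limit));
    return sched_getcpu();
}
```

The explicit ZF check matters because LSL leaves its destination unchanged when the selector is not usable, which is one way to end up with a constant result such as 0.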
[cpp]+/* Fast way to get current CPU and node.
+   This helps to do per node and per CPU caches in user space.
+   The result is not guaranteed without CPU affinity, but usually
+   works out because the scheduler tries to keep a thread on the same
+   CPU.
+
+   tcache must point to a two element sized long array.
+   All arguments can be NULL. */
+long __vsyscall(2)
+vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
+{
+    unsigned int p;
+    unsigned long j = 0;
+
+    /* Fast cache - only recompute value once per jiffies and avoid
+       relatively costly lsl/sidt otherwise.
+       This works because the scheduler usually keeps the process
+       on the same CPU and this syscall doesn't guarantee its
+       results anyways.
+       We do this here because otherwise user space would do it on
+       its own in a likely inferior way (no access to jiffies).
+       If you don't like it pass NULL. */
+    if (tcache && tcache->blob[0] == (j = __jiffies)) {
+        p = tcache->blob[1];
+    }
+    else {
+#ifdef VGETCPU_USE_SIDT
+        struct {
+            char pad[6]; /* avoid unaligned stores */
+            u16 size;
+            u64 address;
+        } idt;
+
+        asm("sidt %0" : "=m" (idt.size));
+        p = idt.size - 0x1000;
+#else
+        /* Load per CPU data from GDT */
+        asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
+#endif
+        if (tcache) {
+            tcache->blob[0] = j;
+            tcache->blob[1] = p;
+        }
+    }
+    if (cpu)
+        *cpu = p >> CONFIG_NODES_SHIFT;
+    if (node)
+        *node = p & ((1<
[/cpp]
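For comparison, the user-space side on a current kernel: the same functionality is exposed as the getcpu system call. A minimal sketch calling it through glibc's `syscall()` with `SYS_getcpu` from `<sys/syscall.h>`; per the getcpu(2) man page, the third (`tcache`) argument is unused on kernels after 2.6.23 and should be NULL. `getcpu_node` is a hypothetical helper name.

```cpp
#include <cerrno>          // errno, ENOSYS
#include <sys/syscall.h>   // SYS_getcpu
#include <unistd.h>        // syscall()

// Fills *cpu and *node (either may be NULL) via getcpu(2).
// Returns 0 on success, -1 with errno set on failure.
inline int getcpu_node(unsigned* cpu, unsigned* node) {
#ifdef SYS_getcpu
    // Third argument (tcache) is ignored by kernels newer than 2.6.23.
    return static_cast<int>(
        syscall(SYS_getcpu, cpu, node, static_cast<void*>(nullptr)));
#else
    (void)cpu;
    (void)node;
    errno = ENOSYS;  // no getcpu on this platform
    return -1;
#endif
}
```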