Hi styc,
The instructions in the MKL User's Guide seem to be incomplete. The code snippet there apparently uses the wrong thread identifier: instead of getpid(), one should use syscall(SYS_gettid). Another issue is that the OpenMP layer applies its affinity settings in terms of OpenMP threads, while those are dynamically mapped to OS threads. This can be worked around by setting the environment variable KMP_AFFINITY=disabled (see Thread Affinity Interface); this may have performance implications, though, I don't know.
In summary, could you try this function for binding the current thread to CPUs?
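To see why getpid() is not enough, here is a minimal sketch, assuming Linux with glibc and POSIX threads. Every thread in a process sees the same getpid() value, while syscall(SYS_gettid) returns a distinct kernel thread ID per thread. The sched_setaffinity(2) man page notes that passing the getpid() value sets the attribute for the main thread of the thread group only. The names struct ids, record_ids and compare_thread_ids are hypothetical, for illustration only.

```c
#define _GNU_SOURCE            /* must precede all includes: exposes syscall() */
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* IDs observed by one thread (hypothetical helper type). */
struct ids { pid_t pid; pid_t tid; };

/* Record the process ID and the kernel thread ID of the calling thread. */
static void *record_ids(void *arg)
{
    struct ids *out = arg;
    out->pid = getpid();                    /* identical in every thread */
    out->tid = (pid_t)syscall(SYS_gettid);  /* unique per thread */
    return NULL;
}

/* Fill main_ids for the calling thread and worker_ids for a freshly
   created thread. Returns 0 on success, nonzero on pthread failure. */
int compare_thread_ids(struct ids *main_ids, struct ids *worker_ids)
{
    pthread_t t;
    record_ids(main_ids);
    if (pthread_create(&t, NULL, record_ids, worker_ids) != 0)
        return -1;
    return pthread_join(t, NULL);
}
```

The worker reports the same PID as the caller but a different TID, so an affinity call keyed on getpid() never reaches the worker.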
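#define _GNU_SOURCE          /* must come before any #include */
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Handle up to 32 cpus: bit i of cpumask selects CPU i.
// Binds the calling thread (not the whole process) to the selected CPUs.
void bind_me_to(unsigned cpumask)
{
    cpu_set_t mask;
    pid_t tid = syscall(SYS_gettid);   // kernel ID of the calling thread
    int cpuid;
    CPU_ZERO(&mask);
    for (cpuid = 0; cpuid < 32; cpuid++)
    {
        if (cpumask & (1u << cpuid))
            CPU_SET(cpuid, &mask);
    }
    sched_setaffinity(tid, sizeof(mask), &mask);   // returns -1 on error
}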
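If more than 32 CPUs are needed, or you want to confirm that the binding took effect, a variant along these lines might help. This is a sketch assuming Linux with glibc. It takes an explicit CPU list instead of a 32-bit mask, checks the return value, and reads the mask back with sched_getaffinity(). Per the sched_setaffinity(2) man page, a pid of 0 means the calling thread, so the gettid call can be skipped. The names bind_me_to_cpus, first_allowed_cpu and bound_only_to are hypothetical.

```c
#define _GNU_SOURCE            /* must precede all includes: cpu_set_t, CPU_* */
#include <sched.h>

/* Bind the calling thread to cpus[0..n-1] (any index below CPU_SETSIZE).
   Returns 0 on success, -1 on failure (errno set by sched_setaffinity). */
int bind_me_to_cpus(const int *cpus, int n)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int i = 0; i < n; i++)
        if (cpus[i] >= 0 && cpus[i] < CPU_SETSIZE)
            CPU_SET(cpus[i], &mask);
    return sched_setaffinity(0, sizeof(mask), &mask);  /* 0 = calling thread */
}

/* Lowest CPU index in the calling thread's current mask, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t mask;
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &mask))
            return c;
    return -1;
}

/* 1 if the calling thread's mask contains exactly CPU c, else 0. */
int bound_only_to(int c)
{
    cpu_set_t mask;
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return 0;
    return CPU_COUNT(&mask) == 1 && CPU_ISSET(c, &mask);
}
```

An empty CPU list makes sched_setaffinity() fail with EINVAL, so checking the return value catches masks that select no usable CPU.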
If I understood you correctly, this function is meant to be called in the following setup (make sure the environment variables OMP_DYNAMIC=false and MKL_DYNAMIC=false are set, so that MKL can run threads inside nested parallel regions):
#pragma omp parallel default(shared) num_threads(2)
{
    int omp_tid = omp_get_thread_num();
    omp_set_nested(1); // nested parallel regions should be enabled
    if (omp_tid == 0)
    {
        bind_me_to(0x0f); // four threads on one socket
        omp_set_num_threads(4);
        do_dgemm();
    }
    if (omp_tid == 1)
    {
        bind_me_to(0xf0); // four threads on another socket
        omp_set_num_threads(4);
        do_fft();
    }
}
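The masks 0x0f and 0xf0 assume that CPUs 0-3 and 4-7 sit on different sockets. Linux CPU numbering does not always follow that layout. A sketch for deriving per-socket masks instead, assuming Linux exposes the sysfs file /sys/devices/system/cpu/cpuN/topology/physical_package_id, could look like this. The helpers cpu_package_id and socket_cpumask are hypothetical. The result uses the same 32-bit format as bind_me_to's argument.

```c
#include <stdio.h>

/* Physical package (socket) ID of a CPU from sysfs, or -1 if unavailable. */
int cpu_package_id(int cpu)
{
    char path[96];
    int id = -1;
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/topology/physical_package_id", cpu);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;                 /* CPU absent or sysfs not mounted */
    if (fscanf(f, "%d", &id) != 1)
        id = -1;
    fclose(f);
    return id;
}

/* Bit i set if CPU i (0..31) belongs to package pkg; 0 for a negative pkg. */
unsigned socket_cpumask(int pkg)
{
    unsigned mask = 0;
    if (pkg < 0)
        return 0;
    for (int cpu = 0; cpu < 32; cpu++)
        if (cpu_package_id(cpu) == pkg)
            mask |= 1u << cpu;
    return mask;
}
```

With these helpers, the two outer threads could call bind_me_to(socket_cpumask(0)) and bind_me_to(socket_cpumask(1)) instead of the hard-coded constants. Sockets that also expose Hyper-Threading siblings will contribute more than four bits each.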
I hope this will help
Thanks
Dima
I'm wondering why I can't find documentation on KMP_AFFINITY=physical, which appears to be the favored setting for Hyper-Threading.
I agree with your implication that failing to support affinity mask in a similar way on Intel and AMD platforms would be a serious deficiency.
Hello,
The MKL User's Guide has a section with examples of setting the affinity mask by means of the operating system. The section should be named something like "Managing Performance and Memory>Tips and Techniques to Improve Performance>Managing Multi-Core Performance". Keep in mind that the affinity mask is a per-thread attribute (on Linux, at least), so it should be set after the top-level OpenMP threads have been created.
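A small sketch, assuming Linux with glibc and POSIX threads, shows what "per-thread attribute" means in practice. A worker thread pins itself to a single CPU, and the calling thread's own mask stays unchanged. The names my_cpu_count, pin_self and demo_per_thread_mask are hypothetical.

```c
#define _GNU_SOURCE            /* must precede all includes: cpu_set_t, CPU_* */
#include <pthread.h>
#include <sched.h>

/* Number of CPUs in the calling thread's affinity mask, or -1 on error. */
static int my_cpu_count(void)
{
    cpu_set_t mask;
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}

/* Worker: pin itself (pid 0 = calling thread) to the first CPU it may use,
   then store how many CPUs its own mask contains (-1 on failure). */
static void *pin_self(void *arg)
{
    int *count_after = arg;
    cpu_set_t mask;
    *count_after = -1;
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return NULL;
    for (int c = 0; c < CPU_SETSIZE; c++) {
        if (CPU_ISSET(c, &mask)) {
            CPU_ZERO(&mask);
            CPU_SET(c, &mask);
            if (sched_setaffinity(0, sizeof(mask), &mask) == 0)
                *count_after = my_cpu_count();
            break;
        }
    }
    return NULL;
}

/* Report CPU counts: caller before, worker after pinning, caller after join. */
int demo_per_thread_mask(int *main_before, int *worker_after, int *main_after)
{
    pthread_t t;
    *main_before = my_cpu_count();
    if (pthread_create(&t, NULL, pin_self, worker_after) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    *main_after = my_cpu_count();
    return 0;
}
```

The worker ends up with a one-CPU mask while the caller keeps its original mask. This is why the binding has to be done from inside each OpenMP thread rather than once for the process.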
Hope this helps
Thanks
Dima