Hi styc,
The instructions in the MKL User's Guide seem to be incomplete. The code snippet there is apparently missing correct thread identification: instead of getpid() one should use syscall(SYS_gettid). Another issue is that the OpenMP layer works in terms of OpenMP threads, while these are dynamically mapped to OS threads. This can be worked around by setting the environment variable KMP_AFFINITY=disabled (see Thread Affinity Interface); this may have performance implications though, I don't know.
In summary, could you try this function for binding the current thread to CPUs?
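#define _GNU_SOURCE   // for cpu_set_t, the CPU_* macros and sched_setaffinity()
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

// Handle up to 32 cpus
void bind_me_to(unsigned cpumask)
{
    cpu_set_t mask;
    pid_t tid = syscall(SYS_gettid);  // kernel thread id, not the process id
    int cpuid;
    CPU_ZERO(&mask);
    for (cpuid = 0; cpuid < 32; cpuid++)
    {
        if (cpumask & (1u << cpuid))  // bit i of cpumask selects cpu i
            CPU_SET(cpuid, &mask);
    }
    sched_setaffinity(tid, sizeof(mask), &mask);
}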
This function is assumed to be called in the following setup, if I understood you correctly (make sure the environment variables OMP_DYNAMIC=false and MKL_DYNAMIC=false are set, so that MKL can use threads in nested parallel regions):
#pragma omp parallel default(shared) num_threads(2)
{
    int omp_tid = omp_get_thread_num();
    omp_set_nested(1); // nested parallel regions should be enabled
    if (omp_tid == 0)
    {
        bind_me_to(0x0f); // four threads on one socket
        omp_set_num_threads(4);
        do_dgemm();
    }
    if (omp_tid == 1)
    {
        bind_me_to(0xf0); // four threads on another socket
        omp_set_num_threads(4);
        do_fft();
    }
}
I hope this will help
Thanks
Dima
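One way to check that a binding like the one above actually took effect is to read the mask back with sched_getaffinity(). The following is only a sketch under that assumption: the helper names mask_to_cpuset, bind_me_to_checked and current_affinity_mask are made up for illustration and do not come from the MKL User's Guide, and like the original function it only looks at CPUs 0..31.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: translate a 32-bit mask (bit i = CPU i) into a cpu_set_t. */
void mask_to_cpuset(unsigned cpumask, cpu_set_t *set)
{
    assert(set != NULL);
    CPU_ZERO(set);
    for (int cpu = 0; cpu < 32; cpu++)
        if (cpumask & (1u << cpu))
            CPU_SET(cpu, set);
}

/* Hypothetical variant of bind_me_to() that reports failure:
   returns 0 on success, -1 on error (errno is set by the kernel). */
int bind_me_to_checked(unsigned cpumask)
{
    cpu_set_t set;
    pid_t tid = (pid_t)syscall(SYS_gettid);   /* thread id, not getpid() */
    mask_to_cpuset(cpumask, &set);
    return sched_setaffinity(tid, sizeof(set), &set);
}

/* Read the calling thread's affinity back as a 32-bit mask.
   Returns 0 if the query fails or no CPU below 32 is allowed. */
unsigned current_affinity_mask(void)
{
    cpu_set_t set;
    unsigned result = 0;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)  /* 0 = calling thread */
        return 0;
    for (int cpu = 0; cpu < 32; cpu++)
        if (CPU_ISSET(cpu, &set))
            result |= 1u << cpu;
    return result;
}
```

A caller could compare current_affinity_mask() against the requested mask right after bind_me_to_checked() returns 0. A mismatch, or a non-zero return value, usually means that some of the requested CPUs are not available to the process.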
I'm wondering why I don't find documentation on KMP_AFFINITY=physical, which appears to be the favored setting for HyperThreading.
I agree with your implication that failing to support affinity mask in a similar way on Intel and AMD platforms would be a serious deficiency.
Hello,
The MKL User's Guide has a section with examples of setting the affinity mask by means of the operating system. The section should be named something like "Managing Performance and Memory > Tips and Techniques to Improve Performance > Managing Multi-Core Performance". Keep in mind that the affinity mask is a per-thread attribute (on Linux, at least), so it should be set after the top-level OpenMP threads have been started.
Hope this helps
Thanks
Dima
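To illustrate the per-thread point, here is a minimal sketch in which each top-level OpenMP thread binds itself from inside the parallel region and then reports the mask the kernel has recorded for it. The names calling_thread_mask and bind_top_level_threads are hypothetical, only CPUs 0..31 are considered, and the serial fallback is there only so that the sketch still builds when OpenMP is not enabled.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback: without OpenMP the pragma is ignored and one thread runs. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Affinity of the calling thread as a 32-bit mask (CPUs 0..31 only);
   returns 0 if the query fails. */
unsigned calling_thread_mask(void)
{
    cpu_set_t set;
    unsigned result = 0;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    for (int cpu = 0; cpu < 32; cpu++)
        if (CPU_ISSET(cpu, &set))
            result |= 1u << cpu;
    return result;
}

/* Each top-level thread binds itself to masks[id] and stores the mask it
   reads back in observed[id]. Returns the number of threads whose
   sched_setaffinity() call failed. */
int bind_top_level_threads(const unsigned *masks, unsigned *observed, int nthreads)
{
    int failures = 0;
    assert(masks != NULL && observed != NULL && nthreads > 0);

    #pragma omp parallel num_threads(nthreads) reduction(+:failures)
    {
        int id = omp_get_thread_num();
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int cpu = 0; cpu < 32; cpu++)
            if (masks[id] & (1u << cpu))
                CPU_SET(cpu, &set);
        /* A pid of 0 means "the calling thread", so the binding is applied
           to this OpenMP thread only, not to the whole process. */
        if (sched_setaffinity(0, sizeof(set), &set) != 0)
            failures++;
        observed[id] = calling_thread_mask();
    }
    return failures;
}
```

With GCC this would be built with -fopenmp. Calling it with masks such as {0x0f, 0xf0} and nthreads = 2 corresponds to the two-socket layout discussed in this thread, provided those CPUs exist and are available to the process.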