Not sure if this is the right forum.
I'm a spare-time Valgrind developer, and I have also encountered this issue in my day job.
When using Intel OpenMP (from pstudioxe2017) I get a memcheck error:
==14500== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==14500== at 0x21147E29: syscall (in /usr/lib64/libc-2.17.so)
==14500== by 0x206EB028: __kmp_affinity_determine_capable (in /path/to/lib/Linux_x86_64/libiomp5.so)
This happens during a call to omp_get_num_procs.
Using gdb, I see that the disassembly looks like this. The code calls 'syscall()' directly rather than the glibc 'sched_setaffinity()' wrapper.
0x206eb019 <__kmp_affinity_determine_capable+73> xor %esi,%esi
0x206eb01b <__kmp_affinity_determine_capable+75> mov $0xcb,%edi
0x206eb020 <__kmp_affinity_determine_capable+80> xor %ecx,%ecx
0x206eb022 <__kmp_affinity_determine_capable+82> xor %eax,%eax
0x206eb024 <__kmp_affinity_determine_capable+84> call 0x20651ae0 <syscall@plt>
The arguments, in order, are:
%edi is 0xcb (203), the syscall number for sched_setaffinity on x86_64.
%esi is the PID, which is zero (the calling thread).
%edx is the length of the mask in bytes, which I see is 640. Normally it's supposed to be sizeof(cpu_set_t), which is 128.
%rcx is the pointer to the mask; %ecx is set to 0, so the pointer is NULL.
The glibc manpage doesn't mention the use of a NULL mask pointer, so I can't tell if this is some undocumented use of sched_setaffinity that memcheck isn't handling or whether it is a bug in Intel OpenMP.
Looking at the kernel source, I think that the excess mask length just gets clamped to the kernel's own mask size:
https://elixir.bootlin.com/linux/v4.4/source/kernel/sched/core.c#L4488
else if (len > cpumask_size())
len = cpumask_size();
I'll do some more debugging to see if the syscall is failing.
EDIT: the return is -1, so it looks like an Intel OpenMP bug to me.
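The call described in this post can be reproduced outside the OpenMP runtime. The following is a minimal sketch, not the runtime's actual code: it issues the raw sched_setaffinity syscall with PID 0, a caller-supplied length, and a NULL mask, and reports the return value and errno. The function name probe_null_mask is hypothetical and exists only for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Issue the raw sched_setaffinity syscall with a NULL mask pointer,
 * bypassing the glibc wrapper in the same way the disassembly does.
 * Returns the syscall() result and stores the resulting errno in *err_out.
 * (probe_null_mask is a hypothetical name used only in this sketch.)
 */
long probe_null_mask(size_t len, int *err_out)
{
    assert(len > 0 && err_out != NULL);
    errno = 0;
    /* PID 0 means "the calling thread"; NULL is the mask under test. */
    long rc = syscall(SYS_sched_setaffinity, 0, len, NULL);
    *err_out = errno;
    return rc;
}
```

Calling probe_null_mask(640, &err) mirrors the arguments seen in gdb. Given the kernel excerpt above, where the length is clamped before the mask is copied from user space, the call would be expected to fail with -1; passing err to strerror() shows which error the kernel reported. Running the same call under memcheck should produce the same "points to unaddressable byte(s)" report.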
Hi,
Thanks for posting in the Intel forums.
Could you please try the supported version of the Intel oneAPI toolkit and let us know if you face a similar issue?
For more details regarding the supported version please refer to the below link
Could you please provide us with the below details?
1. OS
2. output of lscpu command
3. Sample reproducer and steps to reproduce the issue
Thanks & Regards
Shivani
I can't easily try other versions. The most recent that we have installed is 2020. Asking for a more recent version to be installed is likely to take months. I can reproduce the problem with both 2018 and 2020.4.
OS - RHEL 7.9
lscpu - Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
GCC 11.2 built from source
(I don't think that any of the above change much)
Reproducer (iomp_sched.c):
#include <omp.h>

int main(void)
{
    (void)omp_get_num_procs(); /* triggers the OpenMP runtime initialization */
}
Commands:
gcc -fopenmp -c -g iomp_sched.c
gcc -o iomp_sched iomp_sched.o -L/path/to/pstudioxe2018/lib/intel64 -liomp5 -Wl,-rpath,/path/to/pstudioxe2018/lib/intel64
valgrind ./iomp_sched
==29136== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==29136== at 0x4EFFE29: syscall (in /usr/lib64/libc-2.17.so)
==29136== by 0x4B19197: __kmp_affinity_determine_capable (z_Linux_util.cpp:185)
==29136== by 0x4AF27E8: __kmp_env_initialize(char const*) (kmp_settings.cpp:5773)
==29136== by 0x4ADAF9A: __kmp_do_serial_initialize (kmp_runtime.cpp:6964)
==29136== by 0x4ADAF9A: __kmp_do_middle_initialize (kmp_runtime.cpp:7110)
==29136== by 0x4ADAF9A: __kmp_middle_initialize (kmp_runtime.cpp:7219)
==29136== by 0x4ABC60D: omp_get_num_procs@@VERSION (kmp_ftn_entry.h:615)
==29136== by 0x40113E: main (iomp_sched.c:5)
The source code for this function appears to be publicly available, for example here:
https://github.com/llvm-mirror/openmp/blob/master/runtime/src/z_Linux_util.cpp
From that source, it looks like the runtime is deliberately making an invalid call to the sched_setaffinity syscall.
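A deliberately invalid call of this kind is usually a capability probe. The sketch below shows one way such a probe could work, assuming the runtime behaves roughly like this; it is not copied from z_Linux_util.cpp, and the names probe_buf, kernel_mask_size, and null_mask_gives_efault are hypothetical. The 4096-byte buffer size is also an assumption of the sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Scratch buffer for the kernel's CPU mask; 4096 bytes is an assumed upper bound. */
unsigned long probe_buf[4096 / sizeof(unsigned long)];

/*
 * Step 1: ask the kernel how many bytes its internal CPU mask occupies.
 * The raw sched_getaffinity syscall returns the number of bytes written,
 * unlike the glibc wrapper, which returns 0 on success.
 */
long kernel_mask_size(void)
{
    return syscall(SYS_sched_getaffinity, 0, sizeof(probe_buf), probe_buf);
}

/*
 * Step 2: pass a NULL mask of that size to the raw sched_setaffinity syscall.
 * Returns 1 if the call failed with EFAULT, meaning the kernel accepted the
 * syscall and the length and only failed when reading the mask; 0 otherwise.
 */
int null_mask_gives_efault(long mask_size)
{
    assert(mask_size > 0);
    errno = 0;
    long rc = syscall(SYS_sched_setaffinity, 0, (size_t)mask_size, NULL);
    return rc == -1 && errno == EFAULT;
}
```

If a probe like this is what the runtime does, then the -1 return is the expected outcome rather than a failure, and the length passed to sched_setaffinity is whatever the kernel reported in step 1 rather than sizeof(cpu_set_t). Memcheck would still flag the NULL mask, because it checks that the buffer described by the syscall arguments is addressable before the kernel sees the call.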
I don't think that this has been resolved.
Hi,
Could you please try the Intel compilers with the supported Intel oneAPI versions and let us know if you face a similar issue?
For more details regarding the supported version please refer to the below link
We don't support the Parallel Studio versions that you are currently using, but if needed you could become a licensed oneAPI customer and get help from priority support.
Please refer to the below link for priority support
https://www.intel.com/content/www/us/en/developer/get-help/priority-support.html
Thanks & Regards
Shivani
I don't get any errors with the 2023.1.0 base kit (on an old Xeon, Fedora 37).
Hi,
As your issue is resolved with the latest version of Intel oneAPI, we are going ahead and closing this thread. This thread will no longer be monitored by Intel. If you need further assistance please post a new question.
Thanks & Regards
Shivani