Re: Valgrind memcheck error in libiomp

paul_f · 05-30-2023

Not sure if this is the right forum.

I'm a spare time Valgrind developer, and have also encountered this issue in my day job.

When using Intel OpenMP (from pstudioxe2017) I get a memcheck error:

==14500== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==14500==    at 0x21147E29: syscall (in /usr/lib64/libc-2.17.so)
==14500==    by 0x206EB028: __kmp_affinity_determine_capable (in /path/to/lib/Linux_x86_64/libiomp5.so)

This happens during a call to omp_get_num_procs.

Using gdb, I see that the disassembly looks like this. It calls 'syscall()' directly rather than the glibc 'sched_setaffinity()' wrapper.

0x206eb019 <__kmp_affinity_determine_capable+73>        xor    %esi,%esi
0x206eb01b <__kmp_affinity_determine_capable+75>        mov    $0xcb,%edi
0x206eb020 <__kmp_affinity_determine_capable+80>        xor    %ecx,%ecx
0x206eb022 <__kmp_affinity_determine_capable+82>        xor    %eax,%eax
0x206eb024 <__kmp_affinity_determine_capable+84>        call   0x20651ae0 <syscall@plt>

The arguments, in order, are

%edi is 0xcb (203), the syscall number.

%esi is the PID, zero

%edx is the length of the mask in bytes, which I see is 640. Normally it's supposed to be sizeof(cpu_set_t) which is 128.

%rcx is the pointer to the mask, and %ecx is set to 0, so the pointer is NULL.

The glibc manpage doesn't mention the use of a NULL mask pointer, so I can't tell if this is some undocumented use of sched_setaffinity that memcheck isn't handling or whether it is a bug in Intel OpenMP.

Looking at the kernel source, I think that the excess mask length just gets ignored:

https://elixir.bootlin.com/linux/v4.4/source/kernel/sched/core.c#L4488

	else if (len > cpumask_size())
		len = cpumask_size();

I'll do some more debugging to see if the syscall is failing.

EDIT: the return is -1, so it looks like an Intel OpenMP bug to me.

ShivaniK_Intel · 05-31-2023

Hi,

Thanks for posting in the Intel forums.

Could you please try the supported version of the Intel oneAPI toolkit and let us know if you face a similar issue?

For more details regarding the supported version please refer to the below link

https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-parallel-studio-xe-supported-and-unsupported-product-versions.html

Could you please provide us with the below details?

1. OS

2. output of lscpu command

3. Sample reproducer and steps to reproduce the issue

Thanks & Regards

Shivani

paul_f · 05-31-2023

I can't easily try other versions. The most recent that we have installed is 2020. Asking for a more recent version to be installed is likely to take months. I can reproduce the problem with 2018 and 2020.4.

OS - RHEL 7.9

lscpu - Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz

GCC 11.2 built from source

(I don't think that any of the above change much)

Reproducer iomp_sched.c

#include <omp.h>

int main(void)
{
   (void)omp_get_num_procs();
}

Commands:

gcc -fopenmp -c -g iomp_sched.c

gcc -o iomp_sched iomp_sched.o -L/path/to/pstudioxe2018/lib/intel64 -liomp5 -Wl,-rpath,/path/to/pstudioxe2018/lib/intel64

valgrind ./iomp_sched

==29136== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==29136==    at 0x4EFFE29: syscall (in /usr/lib64/libc-2.17.so)
==29136==    by 0x4B19197: __kmp_affinity_determine_capable (z_Linux_util.cpp:185)
==29136==    by 0x4AF27E8: __kmp_env_initialize(char const*) (kmp_settings.cpp:5773)
==29136==    by 0x4ADAF9A: __kmp_do_serial_initialize (kmp_runtime.cpp:6964)
==29136==    by 0x4ADAF9A: __kmp_do_middle_initialize (kmp_runtime.cpp:7110)
==29136==    by 0x4ADAF9A: __kmp_middle_initialize (kmp_runtime.cpp:7219)
==29136==    by 0x4ABC60D: omp_get_num_procs@@VERSION (kmp_ftn_entry.h:615)
==29136==    by 0x40113E: main (iomp_sched.c:5)

It looks like the source code for this function is available, for example here:

https://github.com/llvm-mirror/openmp/blob/master/runtime/src/z_Linux_util.cpp

It looks like it's making a deliberately invalid call to the sched_setaffinity syscall.

paul_f · 06-07-2023

I don't think that this has been resolved.

ShivaniK_Intel · 06-08-2023

Hi,

Could you please try the Intel compilers with the supported Intel oneAPI versions and let us know if you face a similar issue?

For more details regarding the supported version please refer to the below link

https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-parallel-studio-xe-supported-and-unsupported-product-versions.html

We don't support the Parallel Studio versions that you are currently using, but if you need help you could become a licensed oneAPI customer and get it through priority support.

Please refer to the below link for priority support

https://www.intel.com/content/www/us/en/developer/get-help/priority-support.html

Thanks & Regards

Shivani

paul_f · 06-10-2023

I don't get any errors with the 2023.1.0 base kit (on an old Xeon, Fedora 37).

ShivaniK_Intel · 06-13-2023

Hi,

As your issue is resolved with the latest version of Intel oneAPI, we are going ahead and closing this thread. This thread will no longer be monitored by Intel. If you need further assistance please post a new question.

Thanks & Regards

Shivani
