Hello,
We are currently migrating an HPC application from HP-UX to the Intel C/C++/Fortran platform on Linux RHEL4 (x86_64).
The migration went well as far as compiling and running the code on a single CPU (thread) is concerned. When we start the application on more than one CPU (2-8 threads, one thread per CPU), the program yields wrong results,
whereas the same code runs on multiple CPUs on HP-UX without problems.
The program uses pthreads (POSIX threads and libraries) for synchronisation and parallelisation. The control and multithreading code is written in C and the maths in Fortran.
In our view, either some of the libraries we use are not thread-safe, or the synchronization is not working as expected. We have built some test code to verify the synchronization and are fairly sure it is not the problem. That leaves the question of whether any of the libraries in use are not thread-safe.
More information is below.
Any ideas where and how we can proceed?
Did we miss something?
Regards and thanks in advance.
Information:
Application:
# ldd bin/x86_64/hpcprogram_linux-01
libifcoremt.so.5 => /opt/intel/fce/10.1.015/lib/libifcoremt.so.5 (0x0000002a95557000)
libifport.so.5 => /opt/intel/fce/10.1.015/lib/libifport.so.5 (0x0000002a9578b000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000000325a700000)
libm.so.6 => /lib64/tls/libm.so.6 (0x00000039bae00000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000321af00000)
libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003219000000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000003218500000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003d23e00000)
libimf.so => /opt/intel/cce/10.1.015/lib/libimf.so (0x0000002a958d3000)
libintlc.so.5 => /opt/intel/cce/10.1.015/lib/libintlc.so.5 (0x0000002a95c35000)
/lib64/ld-linux-x86-64.so.2 (0x0000003218300000)
Compiler options:
FOPT2 = -O0
CC = icc
CXX = icpc
CPP = icc
F90 = ifort
LINKER = icc
CFLAGS = -DAPM_HOST=CLYDE -Dmach_$(HOSTTYPE)
CXXFLAGS = -Dmach_$(HOSTTYPE)
CPPFLAGS = -E -Dmach_$(HOSTTYPE)
F90FLAGS = -cpp -module $(OBJ) -Dmach_$(HOSTTYPE) -assume nounderscore -threads -reentrancy threaded -fpic
LDFLAGS = /opt/intel/fce/10.1.015/lib/for_main.o -pthread -L/opt/intel/fce/10.1.015/lib -lifcoremt -lifport -m64 -lstdc++
System:
# rpm -qa | grep glibc
glibc-devel-2.3.4-2.36
glibc-2.3.4-2.36
compat-glibc-2.3.2-95.30
glibc-2.3.4-2.36
glibc-kernheaders-2.4-9.1.100.EL
glibc-devel-2.3.4-2.36
compat-glibc-headers-2.3.2-95.30
compat-glibc-2.3.2-95.30
glibc-common-2.3.4-2.36
glibc-headers-2.3.4-2.36
# uname -a
Linux node 2.6.9-55.0.2.ELsmp #1 SMP Tue Jun 12 17:58:20 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel Xeon CPU 2.66GHz
stepping : 8
cpu MHz : 2669.000
cache size : 1024 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est cid cx16 xtpr
bogomips : 5345.97
clflush size : 64
cache_alignment : 128
address sizes : 40 bits physical, 48 bits virtual
power management:
.. 7 more CPUs to come ...
3 Replies
I don't expect any problems with the libraries; all libraries supplied with these Intel compilers should be thread-safe. If you have C++ code, I would think LINKER should be icpc, but perhaps you don't require any C++ run-time.
Intel Thread Checker may be of some assistance, if all your threading is in icc. As far as I have been able to determine, it doesn't support checking source code where a language other than the one of main() performs threading, but you appear not to be in that situation.
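For reference, a synchronization self-test of the kind mentioned in the original post might look like the following minimal C sketch. It is not the poster's test code; the names `shared_state`, `worker` and `run_locked_counter` are hypothetical. Several pthreads increment a shared counter under a mutex, and the caller compares the total with `nthreads * iterations`. A passing run only shows that the locking itself behaves; it says nothing about races inside the Fortran routines.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical test harness; none of these names come from the application. */
typedef struct {
    pthread_mutex_t lock;   /* protects counter */
    long counter;           /* shared value updated by all threads */
    long iterations;        /* increments performed by each thread */
} shared_state;

/* Thread body: increment the shared counter under the mutex. */
static void *worker(void *arg)
{
    shared_state *s = (shared_state *)arg;
    for (long i = 0; i < s->iterations; ++i) {
        pthread_mutex_lock(&s->lock);
        s->counter += 1;
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter value,
 * or -1 if the mutex or a thread could not be created. */
long run_locked_counter(int nthreads, long iterations)
{
    enum { MAX_THREADS = 64 };
    pthread_t tid[MAX_THREADS];
    shared_state s;
    int started = 0;
    int failed = 0;

    assert(nthreads >= 1 && nthreads <= MAX_THREADS);

    s.counter = 0;
    s.iterations = iterations;
    if (pthread_mutex_init(&s.lock, NULL) != 0)
        return -1;

    for (int i = 0; i < nthreads; ++i) {
        if (pthread_create(&tid[i], NULL, worker, &s) != 0) {
            failed = 1;
            break;
        }
        ++started;
    }
    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; ++i)
        pthread_join(tid[i], NULL);

    pthread_mutex_destroy(&s.lock);
    return failed ? -1 : s.counter;
}
```

Built with `-pthread`, a call such as `run_locked_counter(8, 100000)` should return 800000 when the locking works as intended.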
We have more or less tracked down the problem, but the strangeness remains.
We had statements where two matrices (w1, v1) were multiplied by each other and the product was then multiplied by a scalar (x1), as below:
y1 = 1.d0 / x1 * matmul(w1,v1)
This code yields errors only in multithreaded environments. When we change it to
y1 = matmul(w1,v1)
y1 = 1.d0 / x1 * y1
Everything runs just fine. Is there any compiler option we're missing or did we hit a bug?
Thanks.
It looks like the intent was to force optimization, relying on the historic limitation of HP-UX Fortran to left-to-right expression evaluation. If safety were a goal, it would have been written:
y1 = matmul(w1,v1)/ x1
Otherwise, parentheses should have been used to specify the intended order of evaluation:
y1 = (1.d0 / x1) * matmul(w1,v1)
and then the option -assume protect_parens should be set (available since ifort 9.1).
It looks like you achieved the same result.
You haven't provided enough information to guess in what way this sloppiness affected threaded execution adversely.
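To illustrate why the two spellings are not interchangeable numerically, here is a minimal C sketch, assuming IEEE double arithmetic without extended-precision intermediates. It is a plain-C stand-in for the Fortran expressions, using element-wise scaling rather than matmul, and the function name `count_scaling_differences` is hypothetical. It shows only the rounding difference between multiplying by a reciprocal and dividing; it does not reproduce the threading failure.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the elements of m[0..n-1] for which (1.0 / x) * m[i] and m[i] / x
 * give different doubles. The reciprocal form rounds twice (once for 1.0 / x,
 * once for the product); the division form rounds once. */
size_t count_scaling_differences(const double *m, size_t n, double x)
{
    assert(m != NULL && x != 0.0);

    const double inv = 1.0 / x;   /* analogue of 1.d0 / x1 */
    size_t differences = 0;

    for (size_t i = 0; i < n; ++i) {
        const double via_reciprocal = inv * m[i];  /* (1.d0 / x1) * m */
        const double via_division = m[i] / x;      /* m / x1 */
        if (via_reciprocal != via_division)
            ++differences;
    }
    return differences;
}
```

With m = x = 49.0, for instance, the reciprocal form comes out one unit in the last place below 1.0 while the division form gives exactly 1.0, so the evaluation order the compiler picks can change the last bits of the result.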