Segmentation fault in MKL ZCOPY

Dear All,

I am compiling Fortran code with ifort 2017.4 and getting a strange segmentation fault. The same code works fine on an older machine running Ubuntu 16.04 with compiler version 2017.1. The code is too large to post here, but it is available on GitHub (it's ttpy). The machine I am using now runs CentOS 7.4, and I have tried several compiler versions (2017.1, 2017.4, 2017.6 and 2018.1); all of them give a segfault.

I am linking against the LP64 MKL interface.

I know it is hard to tell from here, but I'd really appreciate any help. Below is the message I get:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
test_ksl_cme       00000000004E2744  Unknown               Unknown  Unknown
libpthread-2.17.s  00002B90AA2925E0  Unknown               Unknown  Unknown
libiomp5.so        00002B90A9C51B57  omp_in_parallel       Unknown  Unknown
libmkl_intel_thre  00002B90A679588B  mkl_serv_domain_g     Unknown  Unknown
libmkl_intel_thre  00002B90A67A8C49  mkl_blas_zcopy        Unknown  Unknown
libmkl_intel_lp64  00002B90A5D0C968  ZCOPY                 Unknown  Unknown
test_ksl_cme       0000000000435B75  Unknown               Unknown  Unknown
test_ksl_cme       00000000004C267C  Unknown               Unknown  Unknown
test_ksl_cme       0000000000411E6E  Unknown               Unknown  Unknown
test_ksl_cme       00000000004058C2  Unknown               Unknown  Unknown
test_ksl_cme       0000000000403F9E  Unknown               Unknown  Unknown
libc-2.17.so       00002B90AA4C0C05  __libc_start_main     Unknown  Unknown
test_ksl_cme       0000000000403EA9  Unknown               Unknown  Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
test_ksl_cme       00000000004E2A71  Unknown               Unknown  Unknown
libpthread-2.17.s  00002B90AA2925E0  Unknown               Unknown  Unknown
libiomp5.so        00002B90A9C66DA4  Unknown               Unknown  Unknown
ld-2.17.so         00002B90A599AB58  Unknown               Unknown  Unknown
libc-2.17.so       00002B90AA4D7A69  Unknown               Unknown  Unknown
libc-2.17.so       00002B90AA4D7AB5  Unknown               Unknown  Unknown
test_ksl_cme       00000000004DEAD9  Unknown               Unknown  Unknown
test_ksl_cme       00000000004E2744  Unknown               Unknown  Unknown
libpthread-2.17.s  00002B90AA2925E0  Unknown               Unknown  Unknown
libiomp5.so        00002B90A9C51B57  omp_in_parallel       Unknown  Unknown
libmkl_intel_thre  00002B90A679588B  mkl_serv_domain_g     Unknown  Unknown
libmkl_intel_thre  00002B90A67A8C49  mkl_blas_zcopy        Unknown  Unknown
libmkl_intel_lp64  00002B90A5D0C968  ZCOPY                 Unknown  Unknown
test_ksl_cme       0000000000435B75  Unknown               Unknown  Unknown
test_ksl_cme       00000000004C267C  Unknown               Unknown  Unknown
test_ksl_cme       0000000000411E6E  Unknown               Unknown  Unknown
test_ksl_cme       00000000004058C2  Unknown               Unknown  Unknown
test_ksl_cme       0000000000403F9E  Unknown               Unknown  Unknown
libc-2.17.so       00002B90AA4C0C05  __libc_start_main     Unknown  Unknown
test_ksl_cme       0000000000403EA9  Unknown               Unknown  Unknown

Thanks

Hi,

Thanks for the reply. Actually, ZCOPY seems to fail randomly and in different parts of the code. I have checked the stack size limit on my machine and it is fine (unlimited). I am trying to reproduce the problem, but without success so far.

As for why ZCOPY is used at all, I think the main reason is to exploit multithreading; apart from that, I don't know why the author of the code chose the ZCOPY routine. I'll try removing the offending ZCOPY calls and see what happens.

Raffaele

By the way,

I just wanted to add that your assignment is the reverse of what ZCOPY does. It should be

crU(1:nc) = zresult_core(1:nc)

It is unlikely that someone will download, build and run a large package (involving Fortran+Python+?) just to check a call to a BLAS routine. My suggestion is that, because the file test_ksl_cme.f90 at https://github.com/oseledets/tt-fort/blob/65a62e3a4d7b10ffd00e55628ba1216d1dae3fd9/test_ksl_cme.f90 contains just one call to ZCOPY, i.e.,

  call zcopy(sum(ru(1:d)*n(1:d)*ru(2:d+1)), zresult_core, 1, crU, 1)
you could print the value of the first argument (the number of elements being copied) and compare it with the sizes of zresult_core and crU.
Note that in modern Fortran there is no need to call a subroutine to copy a vector. You could simply write:
integer :: nc
   ...
nc = sum(ru(1:d)*n(1:d)*ru(2:d+1))
print *, 'nc = ', nc, ubound(zresult_core), ubound(crU)
crU(1:nc) = zresult_core(1:nc)          ! I had earlier, incorrectly, zresult_core(1:nc) = crU(1:nc)
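A self-contained sketch of that check, in case it helps. The ranks ru, mode sizes n and array lengths below are made-up values for illustration only (they are not taken from test_ksl_cme.f90), and the program does not link MKL at all: it computes the same length expression used in the ZCOPY call, refuses to copy if that length exceeds either array, and otherwise does the plain array assignment in place of ZCOPY. A clean result here only rules out an out-of-bounds length as the cause; it says nothing about the threading layer.

```fortran
program check_copy_length
  implicit none
  ! Hypothetical sizes for illustration only; in the real code d, n and ru
  ! come from the tensor-train structure being processed.
  integer, parameter :: d = 3
  integer :: n(d) = [2, 2, 2]
  integer :: ru(d+1) = [1, 2, 2, 1]
  complex(kind(0d0)), allocatable :: zresult_core(:), crU(:)
  integer :: nc, i

  ! Same length expression as the first argument of the ZCOPY call
  nc = sum(ru(1:d)*n(1:d)*ru(2:d+1))

  ! Hypothetical allocation sizes; compare against nc below
  allocate(zresult_core(16), crU(16))
  zresult_core = [(cmplx(i, -i, kind(0d0)), i = 1, size(zresult_core))]
  crU = (0d0, 0d0)

  print *, 'nc = ', nc, ' size(zresult_core) = ', size(zresult_core), &
           ' size(crU) = ', size(crU)

  ! If nc exceeds either array, ZCOPY would read or write past its end;
  ! report that instead of copying.
  if (nc > size(zresult_core) .or. nc > size(crU)) then
     print *, 'ERROR: copy length exceeds array size'
     stop 1
  end if

  ! Plain array assignment, equivalent to call zcopy(nc, zresult_core, 1, crU, 1)
  crU(1:nc) = zresult_core(1:nc)
  print *, 'copy OK: ', all(crU(1:nc) == zresult_core(1:nc))
end program check_copy_length
```

Compiling the real code with runtime bounds checking enabled is another way to get the same information without adding print statements by hand.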