I have a segfault I would appreciate some help with. A nearly minimal code that reproduces it is attached.
The background is that I am developing a code that handles big matrices, which should be distributed over CPUs along one index (labeled z in the example). I want to determine the distribution at run time, based on the number of processes returned by an MPI routine. I have set it up with a module "global", used by all other modules, that holds some auxiliary variables related to the partitioning. In the main program I then obtain the number of processes and set up these variables (in the example code only ny and nz, integers that appear in loop bounds, and kz, an allocatable array). Note that I have removed all MPI-related code from the example, setting nprocs and myrank by a simple assignment.
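For readers without the attachment, the setup described above might look roughly like the sketch below. It is not the attached code: only the names global, nprocs, myrank, ny, nz, kz and n come from this thread, while the value of n, the type of kz, the values stored in it and the program name are assumptions made for illustration.

```fortran
module global
  implicit none
  integer, parameter :: n = 8          ! assumed grid size, for illustration only
  integer :: nprocs, myrank            ! set at run time
  integer :: ny, nz                    ! local extents used in loop bounds
  real(kind=8), allocatable :: kz(:)   ! assumed type; allocated once nz is known
end module global

program partition_sketch
  use global
  implicit none
  integer :: k, ierr

  nprocs = 1      ! stand-in for the value an MPI routine would return
  myrank = 0      ! stand-in for the rank of this process

  if (mod(n, nprocs) /= 0) stop 'n must be divisible by nprocs'
  ny = n
  nz = n / nprocs ! number of z-planes owned by this process

  allocate(kz(0:nz-1), stat=ierr)
  if (ierr /= 0) stop 'allocation of kz failed'

  ! Illustrative values only: the global z index of each local plane.
  do k = 0, nz-1
    kz(k) = real(myrank*nz + k, kind=8)
  end do

  print *, 'rank', myrank, 'owns z-planes', myrank*nz, 'to', myrank*nz + nz - 1
  deallocate(kz)
end program partition_sketch
```

Allocating a module-level allocatable array in the main program before any routine uses it is standard Fortran, so a construction of this kind should be portable across compilers.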
When I compile the attached code on our small cluster, running Linux version 2.6.18-164.11.1.el5 (Red Hat 4.1.2-46) and ifort version 11.1, I find that
* with optimization -O1 and -O2 the code runs and terminates cleanly;
* with optimization -O3 I get:
> ifort -O3 -traceback -o test.x DNS_int.f90
> ./test.x
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
test.x 0000000000403062 hit3d_mp_rhs3_ 44 hit3d.f90
test.x 0000000000402EC0 hit3d_mp_rhs_ 22 hit3d.f90
test.x 0000000000402C53 MAIN__ 33 DNS_int.f90
test.x 0000000000402ACC Unknown Unknown Unknown
libc.so.6 0000003A9B01D994 Unknown Unknown Unknown
test.x 00000000004029D9 Unknown Unknown Unknown
It would seem that the root cause is the way that kz is handled. If I declare it just like kx and ky, rather than dynamically, the segfault disappears. That would not be a solution, though, as I need to allocate it dynamically.
My questions:
1) Is the construction I use correct? If not, please suggest a correct way to do this (to allocate kz based on a value of nprocs determined during runtime).
2) If it is correct, then is this a compiler bug? Is there a work-around that keeps my code portable and the executable near-optimal?
Two more observations that may be relevant:
Adding certain compiler flags makes the segfault go away. For instance, combining -O3 with any one of -check pointers, -check bounds, -check uninit or -no-vec makes the segfault disappear.
When I compile the code on my laptop, running Linux version 3.11.0-26-generic (Ubuntu 13.10) with ifort 12.1.0, there is no segfault at any optimization level.
Any help with this would be greatly appreciated.
I haven't studied the code, but possibly this is a bug in the 11.1 compiler. The 15.0 compiler has no issue:
$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.090 Build 20140723
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
$ ifort -O3 -traceback -o test.x DNS_int.f90
$ ./test.x
$
Patrick
It would be great if a specialist could confirm that this is an ifort bug. I just want to make sure it is not a mistake in my code.
Also, since I cannot change the version of ifort on the cluster, a safe work-around for version 11.1 would be very helpful.
Thanks for the check.
What exact version of ifort 11.1 are you using (i.e., the output of ifort -V)? I tried the last 11.1 version, and your example worked normally. I'd be happy to determine whether this is really an ifort bug, but I need to be able to reproduce the SEGV.
Patrick
Not that this matters with nprocs=1 but...
complex(kind=8), dimension(0:n/2,0:n-1,0:n-1) :: A,B,FA,FB
...
subroutine RHS(A,B,RA,RB)
complex(kind=8), intent(in), dimension(0:n/2,0:n-1,0:nz-1) :: A,B
complex(kind=8), intent(out), dimension(0:n/2,0:n-1,0:nz-1) :: RA,RB
Jim Dempsey
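The declarations quoted above give the caller's arrays an extent of n in the last dimension, while the dummy arguments use nz, so the two agree only when nz equals n. One way to keep them consistent for any nprocs is to size the caller's arrays with nz as well, as in the minimal sketch below. The module name rhs_mod, the program name, the reduced argument list (A and RA only), the value of n and the body of RHS are assumptions for illustration, not the attached code.

```fortran
module global
  implicit none
  integer, parameter :: n = 4   ! assumed grid size, for illustration only
  integer :: nz                 ! local z extent, set at run time
end module global

module rhs_mod
  use global
  implicit none
contains
  subroutine RHS(A, RA)
    ! Dummy arguments sized with nz, as in the declarations quoted above.
    complex(kind=8), intent(in),  dimension(0:n/2,0:n-1,0:nz-1) :: A
    complex(kind=8), intent(out), dimension(0:n/2,0:n-1,0:nz-1) :: RA
    RA = 2*A                    ! placeholder computation
  end subroutine RHS
end module rhs_mod

program shape_sketch
  use global
  use rhs_mod
  implicit none
  complex(kind=8), allocatable :: A(:,:,:), FA(:,:,:)

  nz = n / 2                    ! as if nprocs were 2

  ! The caller's arrays use nz too, so actual and dummy extents match.
  allocate(A(0:n/2,0:n-1,0:nz-1), FA(0:n/2,0:n-1,0:nz-1))
  A = (1.0d0, 0.0d0)
  call RHS(A, FA)
  print *, shape(FA)            ! extents 3, 4 and 2 for n = 4, nz = 2
end program shape_sketch
```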
I don't spot any coding errors. I think this is just an -O3 optimization bug in 11.1, since it works at -O2 with that version, or at -O3 with any other major compiler version I tested (11.1.080, 12.1.7.367, 13.1.3.192, 14.0.4.211, 15.0.0.090).
Patrick
The output of ifort -V:
Intel(R) Fortran Intel(R) 64 Compiler Professional for applications running on Intel(R) 64, Version 11.1 Build 20091130 Package ID: l_cprof_p_11.1.064
Copyright (C) 1985-2009 Intel Corporation. All rights reserved.
And /proc/version reads:
Linux version 2.6.18-164.11.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Wed Jan 20 07:32:21 EST 2010
As for Jim Dempsey's comment: in the actual program these arrays do not occur in the main program, but I cut out several layers to narrow down the possible causes. The segfault persists if I use n instead of nz to set the dimensions in subroutines RHS and RHS3. I suppose that means that kz is the root cause, not the array bounds. Thanks!
It's an -O3 unroll/jam defect in ifort-11.1.064. You can work around it with -unroll0:
[U533981]$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler Professional for applications running on Intel(R) 64, Version 11.1 Build 20091130 Package ID: l_cprof_p_11.1.064
Copyright (C) 1985-2009 Intel Corporation. All rights reserved.
[U533981]$ ifort -O3 -traceback -o test.x DNS_int.f90 -unroll0
[U533981]$ ./test.x
[U533981]$
Patrick
Thank you very much for sorting this out! I can move on with the project now, and I do not think that disabling unrolling will significantly affect the run time.
Thanks for the feedback, I'll consider this case closed then. I'll note in closing that -unroll0 only needs to be applied to hit3d.f90. You had included the file in DNS_int.f90. I commented out the include, and compiled hit3d separately to debug the issue. Of course, the unroll issue arises in the code generated for RA(kx_,ky_,kz_)=RA(kx_,ky_,kz_)-kz(kz_)*kx(kx_)*UU(kx_,ky_,kz_). As long as that statement is not a hotspot for your real application, the performance hit from applying -unroll0 probably won't be noticed.
Patrick
