The following code segfaults, and I'm unable to identify why:
    PROGRAM segfault_transpose
      IMPLICIT NONE
      INTEGER, PARAMETER :: runs = 2
      INTEGER, PARAMETER :: matrix_size = 1024
      INTEGER :: j
      REAL, DIMENSION(matrix_size, matrix_size) :: alpha
      DO j = 1, runs
        alpha = TRANSPOSE(alpha)
      END DO
    END PROGRAM segfault_transpose
My compile line is:
ifort -O3 -xHost -real-size 64 segfault_transpose.f90
The issue occurs for runs >= 2 and matrix_size >= 1024, together with 64-bit reals. It also only happens when I assign the result of the transpose back to the same matrix.
I am using ifort version 18.0.3
The segfault message is as follows:
    forrtl: severe (174): SIGSEGV, segmentation fault occurred
    Image              PC                Routine            Line     Source
    a.out              000000000040473D  Unknown            Unknown  Unknown
    libpthread-2.28.s  00007F77A49893C0  Unknown            Unknown  Unknown
    a.out              000000000040380C  Unknown            Unknown  Unknown
    a.out              00000000004037DE  Unknown            Unknown  Unknown
    libc-2.28.so       00007F77A47D7223  __libc_start_main  Unknown  Unknown
    a.out              00000000004036EE  Unknown            Unknown  Unknown
A backtrace from gdb does not help; it says the issue is at line 1 (!?):
    #0 0x0000000000403803 in segfault_transpose () at segfault_transpose.f90:1
    #1 0x00000000004037de in main ()
    #2 0x00007ffff7c56223 in __libc_start_main () from /usr/lib/libc.so.6
    #3 0x00000000004036ee in _start ()
valgrind's output includes the following, but I don't know what it means:
    ==12271== Invalid write of size 8
    ==12271==    at 0x403803: MAIN__ (segfault_transpose.f90:1)
    ==12271==  Address 0x1ffe7ff348 is on thread 1's stack
Can someone help me? I'm at a loss as to what exactly is happening.
P.S. gfortran seems to be fine.
Sorry, I should've searched the forum before posting; looks like I just have to move the automatic arrays to the heap, as described here:
https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/401108
But then this leads me to another question: Do I always shunt the arrays off to heap, or do I instead increase the size of the stack? Or something in-between by providing an argument to -heap-arrays[] ?
If forum search works properly, you should find Steve Lionel's advice about not using the threshold option for heap-arrays. It may apply only when the allocation size is known at compile time, leaving variable size allocations on stack.
The simple solution is to use heap always until you have the opportunity to find out whether shifting back to stack is the best way to solve a performance issue.
Beginners tend to go overboard with changes in stack size, and it may take some effort to find the best value. Linux defaults tend to be more usable than Windows ones.
Okay; I fixed it by making large arrays ALLOCATABLE, so that they go to the heap.
But note that TRANSPOSE will likely create a temporary copy, which goes on the stack (unless -heap-arrays is specified.)
@Steve I am curious to know why the compiler needs to create a temporary copy? Is there a way to avoid it? I suppose then that would also be slower than performing the transpose by do-loops.
The problem is that, since you have alpha on both sides of the assignment, the language requires that the TRANSPOSE be completely evaluated before any assignment is done; thus requiring a temp. If you have different variables then the compiler produces a nice, vectorized sequence without a stack (or other) temp. Of course, you now have your own temp...
One thing you can do is use allocatables and MOVE_ALLOC to prevent an extra copy, like so:
    PROGRAM segfault_transpose
      IMPLICIT NONE
      INTEGER, PARAMETER :: runs = 2
      INTEGER, PARAMETER :: matrix_size = 1024
      INTEGER :: j
      REAL, ALLOCATABLE, DIMENSION(:,:) :: alpha, beta
      ALLOCATE (alpha(matrix_size,matrix_size), beta(matrix_size,matrix_size))
      call random_number(alpha)
      DO j = 1, runs
        beta = TRANSPOSE(alpha)
      END DO
      CALL MOVE_ALLOC (FROM=beta,TO=ALPHA)
      ! deallocates alpha, moves allocation from beta to alpha
      ! marks beta as deallocated
      PRINT *, alpha(1:10,1:2)
    END PROGRAM segfault_transpose
(I added code to prevent the compiler from optimizing the whole thing away.)
When I was first playing with this, I thought that the compiler was optimizing away the deallocation of alpha in the MOVE_ALLOC. What I hadn't noticed at first is that it moved that code out of the main code path and jumped to it only if needed, then jumped back, thus improving instruction cache behavior (if alpha didn't need deallocating).
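On the question of avoiding the temporary with DO loops: for a square matrix, one option is to swap each element above the diagonal with its mirror below it, which needs only a scalar temporary. The following is a minimal sketch under that assumption; the program name, the small size `n`, and the `reference` copy (kept only to check the result) are illustrative and not from the posts above. The strided accesses are not cache friendly, so this may well be slower than transposing into a second array.

```fortran
PROGRAM inplace_transpose_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 4          ! illustrative size, not from the thread
  REAL, ALLOCATABLE :: alpha(:,:), reference(:,:)
  REAL :: tmp
  INTEGER :: i, j

  ALLOCATE (alpha(n,n), reference(n,n))
  CALL RANDOM_NUMBER(alpha)
  reference = alpha                    ! copy kept only to verify the result

  ! Swap each element above the diagonal with its mirror below it.
  ! Only a scalar temporary is needed; the matrix must be square.
  DO j = 2, n
    DO i = 1, j - 1
      tmp        = alpha(i,j)
      alpha(i,j) = alpha(j,i)
      alpha(j,i) = tmp
    END DO
  END DO

  ! Verify element by element, without calling TRANSPOSE.
  DO j = 1, n
    DO i = 1, n
      IF (alpha(i,j) /= reference(j,i)) ERROR STOP 'in-place transpose mismatch'
    END DO
  END DO
  PRINT *, 'in-place transpose OK'
END PROGRAM inplace_transpose_sketch
```

In real code the `reference` array and the verification loop would be dropped, since they reintroduce the second copy this approach is meant to avoid.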
Sure - that works too. I don't know your application and thought you might be concerned with an extra copy sitting in memory.
I'd like to avoid the extra memory allocation if possible, because RAM is a little valuable, especially when I scale up my problem to large system sizes.
But I don't think I can, because all of my 'work' is inside that DO loop. In each iteration, I get a new `alpha`, and do stuff like TRANSPOSEing it. And if I have to allocate space for a temporary array, I might as well keep it.
Unless... I do the alloc-deallocs inside the loop:
    DO j = 1, runs
      ALLOCATE(alpha(matrix_size,matrix_size))
      CALL RANDOM_NUMBER(alpha)
      ALLOCATE(beta(matrix_size,matrix_size))
      beta = TRANSPOSE(alpha)
      DEALLOCATE(alpha)
      ! use beta for stuff
      DEALLOCATE(beta)
    END DO
But there is still a small window where both of them are allocated, and that will be the 'memory-limiting' region. So then I should just have both `alpha` and `beta` allocated outside the loop to avoid the overhead from the allocations.
I agree - allocate them outside the loop.
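A minimal sketch of that arrangement, reusing the names from the earlier program: both arrays are allocated once on the heap before the loop, and each iteration transposes `alpha` into `beta`. The `RANDOM_NUMBER` call is only a placeholder for whatever produces a new `alpha` in the real code, and the spot check inside the loop is an illustrative addition.

```fortran
PROGRAM transpose_two_buffers
  IMPLICIT NONE
  INTEGER, PARAMETER :: runs = 2
  INTEGER, PARAMETER :: matrix_size = 1024
  INTEGER :: j
  REAL, ALLOCATABLE, DIMENSION(:,:) :: alpha, beta

  ! Allocate both buffers once, before the loop, so no allocation
  ! or deallocation happens per iteration.
  ALLOCATE (alpha(matrix_size,matrix_size), beta(matrix_size,matrix_size))

  DO j = 1, runs
    CALL RANDOM_NUMBER(alpha)     ! placeholder for computing a new alpha
    beta = TRANSPOSE(alpha)       ! source and destination are distinct arrays
    ! ... use beta here ...
    ! Illustrative spot check of one off-diagonal element.
    IF (beta(1,2) /= alpha(2,1)) ERROR STOP 'transpose mismatch'
  END DO

  PRINT *, beta(1:2,1:2)          ! keep the result live
  DEALLOCATE (alpha, beta)
END PROGRAM transpose_two_buffers
```

Peak memory is the same as in the allocate-inside-the-loop variant, since both arrays exist at the moment of the transpose either way.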
Vishnu,
Your simplified code shows that alpha isn't used after the transpose. Could you perhaps simply swap the indexing order, from alpha(I,J) to alpha(J,I)?
Note that the original alpha could also be produced with the indexes the other way around.
Jim Dempsey
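As an illustration of the swapped-index idea, here is a minimal sketch that computes the product of the transpose of `alpha` with a vector without ever forming the transpose, by reading `alpha(j,i)` where the transposed matrix would be read as `(i,j)`. The vectors `x`, `y`, and `y_ref`, the small size `n`, and the tolerance are assumptions made for the example; whether this fits depends on how the matrix is consumed in the real code.

```fortran
PROGRAM swapped_index_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 3        ! illustrative size
  REAL :: alpha(n,n), x(n), y(n), y_ref(n)
  INTEGER :: i, j

  CALL RANDOM_NUMBER(alpha)
  CALL RANDOM_NUMBER(x)

  ! y = TRANSPOSE(alpha) * x, computed without building the transpose:
  ! row i of the transpose is column i of alpha, so the inner loop
  ! walks down a column, which is contiguous in Fortran.
  DO i = 1, n
    y(i) = 0.0
    DO j = 1, n
      y(i) = y(i) + alpha(j,i) * x(j)
    END DO
  END DO

  ! Reference result using an explicit transpose (fine at this tiny size).
  y_ref = MATMUL(TRANSPOSE(alpha), x)
  IF (MAXVAL(ABS(y - y_ref)) > 1.0E-5) ERROR STOP 'swapped-index result mismatch'
  PRINT *, 'max difference:', MAXVAL(ABS(y - y_ref))
END PROGRAM swapped_index_sketch
```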
Vishnu wrote: The above is oversimplified. In my actual code, I do use it afterwards, including in a MATMUL and an SYEVR. I don't access it by index.
@Vishnu.
Can you show a minimal working example of just the matrix calculations in your actual code (TRANSPOSE, MATMUL, (LAPACK?) SYEVR, etc.) that works up to a certain problem size and then runs into a segmentation fault? You can exclude all of your domain-specific (or proprietary) details and focus on the matrix operations. That will help other readers make suggestions too; otherwise readers spend effort offering input only to find that it does not apply to your case.