EDIT: I have tested the reproducer on a Linux box and it also throws a segmentation fault; I have taken this issue to Premier Support. Could somebody confirm the error, or enlighten me as to what I am doing wrong?
Hi, I am running into a strange problem; maybe it's just some silly option I need to turn on.
I have derived types with allocatable components. If I try to allocate a large (but still very reasonable) amount of memory, I get a segmentation fault, or even a crash without warning. If I allocate the same amount of memory for a normal array, I have no problems whatsoever. The critical value seems to be around 2^19 complex(8) elements, which is smaller than a 1000x1000 double-precision complex matrix.
Here is a reproducer; it is perhaps bloated because I didn't try hard to strip it down to the bare bones, but it should do. I also do not have other machines to reproduce it on: the problem shows up on a 2009 MacBook Pro with 4 GB of RAM and the latest version of Composer XE.
Here is the output from the console:
[bash]
$ ifort driver.f90
$ ./a.out
      523818
 Success with       523818
 Just a number to confuse the optimizer (-3.00000000000000,0.000000000000000E+000)
 Normal allocation is OK with       524288
Segmentation fault
[/bash]
And here is the code:
[fortran]
module Tensor_Class

  implicit none

  complex(8),parameter :: II=(0.0d0,1.0d0)

  type,private :: Tensor
    private
    logical :: Initialized=.false.
  contains
    procedure,public :: getDimensions => getDimensions_Of_Tensor
  end type Tensor

  type,public,extends(Tensor) :: Vector
    private
    complex(8),allocatable :: data(:)
  contains
  end type Vector

!###############################
!##### Operators and methods
!###############################
  interface new_Tensor
    module procedure new_Vector
  end interface

  interface assignment (=)
    module procedure new_Tensor_fromAssignment
  end interface

contains

!######################################################################################
!##### Creation operators
!######################################################################################
  function new_Vector (dim1) result (this)
    integer,intent(in) :: dim1
    type(Vector) :: this
    real(8) :: randomnumbersR(dim1), randomnumbersI(dim1)

    if(this%Initialized) deallocate(this%data)
    allocate(this%data(dim1))
    call random_number(randomnumbersR)
    call random_number(randomnumbersI)
    this%data=randomnumbersR+II*randomnumbersI
    this%Initialized=.true.
  end function new_Vector

!##################################################################
  subroutine new_Tensor_fromAssignment(lhs,rhs)
    class(Vector),intent(out) :: lhs
    type(Vector),intent(in) :: rhs

    if(lhs%Initialized) deallocate(lhs%data)
    allocate(lhs%data(size(rhs%data,1)))
    lhs%data=rhs%data
    lhs%Initialized=.true.
  end subroutine new_Tensor_fromAssignment

!##################################################################
!################# Polymorphic function ################
!##################################################################
  function getDimensions_Of_Tensor(this) result(Dims)
    class(Tensor),intent(IN) :: this
    !type(Tensor1) :: Dims
    integer,allocatable :: Dims(:)

    select type (Typed_this => this)
    class is (Vector)
      allocate(Dims(1))
      Dims=shape(Typed_this%data)
    class is (Tensor)
      print *,'Dimensions not defined'
      return
    end select
  end function getDimensions_Of_Tensor

end module Tensor_Class

module Tensor_Class_fun
  use Tensor_Class
  implicit none

contains

  subroutine type_creation(dims)
    type(Vector) :: aVector
    integer :: dims,resultdims(1)

    call random_seed()
    aVector=new_Tensor(dims)
    resultdims=aVector%GetDimensions()
    print *,resultdims
  end subroutine type_creation

end module Tensor_Class_fun

program main
  use Tensor_Class_fun
  implicit none
  integer :: dims
  complex(8),allocatable :: localVector(:)

  dims=2**19-470
  call type_creation(dims)
  print *,'Success with ',dims

  dims=2**19
  allocate(localVector(dims))
  localVector=1.0d0
  localVector(1000)=localVector(dims)-5.0d0*localVector(1000)
  print *,'Just a number to confuse the optimizer',localVector(1000)+localVector(2999)
  print *,'Normal allocation is OK with ',dims
  deallocate(localVector)  ! Free mem to keep it clean

  call type_creation(dims)
  print *,'Success?'
end program main
[/fortran]
3 Replies
I got a super fast and friendly response from the Premier Support team (thanks, Roland), who found that the problem seems to be the size of the stack.
On Linux, you can get rid of this problem by issuing 'ulimit -s unlimited', but on the Mac there is a hard 64 MB limit in the kernel that you cannot overcome. The only workaround is to compile using the -heap-arrays option, which makes the compiler use the heap instead of the stack for storing temporary arrays. With this option, the problem goes away (see http://software.intel.com/en-us/articles/intel-fortran-compiler-increased-stack-usage-of-80-or-higher-compilers-causes-segmentation-fault/ )
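For the record, the two workarounds look like this on the command line (assuming bash and the same driver.f90 from the original post):

[bash]
# Linux: raise the shell's stack limit before running the program
ulimit -s unlimited
ifort driver.f90 && ./a.out

# Mac OS X: the kernel caps the stack, so move array temporaries to the heap
ifort -heap-arrays driver.f90 && ./a.out
[/bash]

Note that 'ulimit -s' only affects processes started from that shell, so it has to be issued in every session (or placed in a startup script or job script).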
I asked the Support team, but I also wanted to ask here: is there too high a performance penalty for using the heap instead of the stack? Or, more importantly for my project, is it bad that I am using so much stack space?
Reinspecting my code, I see that I create the new object through an assignment, which probably creates a temporary (thus using the stack). Am I right? The main reason to do this was code cleanliness, but I am totally willing to get rid of it for performance reasons... so my questions are:
If I created new objects via a subroutine, would I get rid of the stack usage? And if so, would that make the program faster? I know I could just do it and test, but it is a few thousand lines of code that I would have to change, so I would greatly appreciate advice from somebody more experienced.
On the other hand, if using GBs of stack is OK under Linux, I can live with the code as is, since the production program will end up on a Linux cluster, while I write and test on the Mac using -heap-arrays.
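To make the subroutine question concrete, here is a hypothetical sketch of what the creation routine might look like (the name create_Vector is my own, not from the module above). Because the object is an intent(out) argument rather than a function result assigned through the overloaded '=', no array temporary is needed, and making the scratch arrays allocatable instead of automatic keeps them off the stack as well:

[fortran]
! Hypothetical sketch: subroutine-based creation instead of
! overloaded assignment. Data is allocated directly into the
! caller's object, so no stack temporary is involved.
subroutine create_Vector(this, dim1)
  type(Vector), intent(in out) :: this
  integer, intent(in) :: dim1
  ! allocatable scratch arrays live on the heap, unlike the
  ! automatic arrays randomnumbersR(dim1) in new_Vector
  real(8), allocatable :: randomnumbersR(:), randomnumbersI(:)

  allocate(randomnumbersR(dim1), randomnumbersI(dim1))
  if (this%Initialized) deallocate(this%data)
  allocate(this%data(dim1))
  call random_number(randomnumbersR)
  call random_number(randomnumbersI)
  this%data = randomnumbersR + II*randomnumbersI
  this%Initialized = .true.
end subroutine create_Vector
[/fortran]

The call site would then change from 'aVector = new_Tensor(dims)' to 'call create_Vector(aVector, dims)'.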
Quoting fercook
Is there too high a performance penalty for using the heap instead of the stack? Or, more importantly for my project, is it bad that I am using so much stack space?
Reinspecting my code, I see that I create the new object through an assignment, which probably creates a temporary (thus using the stack). Am I right? The main reason to do this was code cleanliness, but I am totally willing to get rid of it for performance reasons... so my questions are:
If I created new objects via a subroutine, would I get rid of the stack usage? And if so, would that make the program faster? I know I could just do it and test, but it is a few thousand lines of code that I would have to change, so I would greatly appreciate advice from somebody more experienced.
On the other hand, if using GBs of stack is OK under Linux, I can live with the code as is, since the production program will end up on a Linux cluster, while I write and test on the Mac using -heap-arrays.
Heap allocation is expected to cost more time than stack allocation, but the cost should not be significant if it happens outside inner loops. You also want to avoid unnecessarily forcing users of your program to deal with stack settings.
Thread-compatible options, such as the RECURSIVE procedure declaration, or compile options with an equivalent effect (-openmp, -auto, ...), switch the implied allocation of local arrays on procedure entry from heap to stack. This is required for thread safety. So creating objects on subroutine entry implies heap allocation only if the subroutine will not undergo threading, and thus there may be bigger performance implications. Since you are targeting cluster computing, you should consider threading as well.
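One possible compromise, if heap allocation in hot paths turns out to matter: -heap-arrays accepts an optional size threshold in kilobytes, so only large temporaries go to the heap while small ones keep the fast stack path. A sketch:

[bash]
# Array temporaries and automatic arrays larger than 10 KB go to
# the heap; smaller ones stay on the stack.
ifort -heap-arrays 10 driver.f90
[/bash]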
Hi,
On Apple OS X, you can increase the stack size of an executable at link time using the -stack_size argument to the linker. For example:
ifort -Wl,-stack_size -Wl,0x10000000 driver.f90
would set the stack size to 256 MB (0x10000000 bytes). If you are compiling a 64-bit executable, you can go much larger:
ifort -Wl,-stack_size -Wl,0x40000000 driver.f90
would set the stack size to 1 GB (0x40000000 bytes). No need to use slower heap arrays.
Martin Otte