- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
EDIT: I have tested the reproducer in a Linux box and it also throws a segmentation fault, I have taken this issue to Premier Support. Could somebody confirm the error, or enlighten me with what I am doing wrong?
Hi, I am running into a strange problem, maybe its just some silly option I must turn on.
I have derived types with allocatable components. If I try to allocate a large (but still very reasonable) amount of memory, I get a segmentation fault, or even a crash without warning. If I allocate the same amount of memory for a normal array I have no problems whatsoever. The critical value seems to be around 2^19 complex(8) elements, that is smaller than a 1000x1000 double precision complex matrix.
Here is a reproducer, perhaps its bloated because I didnt try hard to reduce it to bones, but it should do. I also do not have other machines to reproduce it on: the problem shows up on a 2009 MacBook Pro with 4GB of RAM, and the latest version of Composer XE.
Here is the output from the console:
[bash]$ ifort driver.f90
$ ./a.out
523818
Success with 523818
Just a number to confuse the optimizer
(-3.00000000000000,0.000000000000000E+000)
Normal allocation is OK with 524288
Segmentation fault
[/bash] And here is the code
[fortran]module Tensor_Class
implicit none
complex(8),parameter :: II=(0.0d0,1.0d0)
type,private :: Tensor
private
integer :: Initialized=.false.
contains
procedure,public :: getDimensions => getDimensions_Of_Tensor
end type Tensor
type,public,extends(Tensor) :: Vector
private
complex(8),allocatable :: data(:)
contains
end type Vector
!###############################
!##### Operators and methods
!###############################
interface new_Tensor
module procedure new_Vector
end interface
interface assignment (=)
module procedure new_Tensor_fromAssignment
end interface
contains
!######################################################################################
!##### Creation operators
!######################################################################################
function new_Vector (dim1) result (this)
integer,intent(in) :: dim1
type(Vector) :: this
real(8) :: randomnumbersR(dim1) ,randomnumbersI(dim1)
if(this%Initialized) deallocate(this%data)
allocate(this%data(dim1))
Call random_number(randomnumbersR)
call random_number(randomnumbersI)
This%data=randomnumbersR+II*randomnumbersI
this%Initialized=.true.
end function new_Vector
!##################################################################
subroutine new_Tensor_fromAssignment(lhs,rhs)
class(Vector),intent(out) :: lhs
type(Vector),intent(in) :: rhs
if(lhs%Initialized) deallocate(lhs%data)
allocate(lhs%data(size(rhs%data,1)))
lhs%data=rhs%data
lhs%Initialized=.true.
end subroutine new_Tensor_fromAssignment
!##################################################################
!################# Polymorphic function ################
!##################################################################
function getDimensions_Of_Tensor(this) result(Dims)
class(Tensor),intent(IN) :: this
! type(Tensor1) :: Dims
integer,allocatable :: Dims(:)
select type (Typed_this => this)
class is (Vector)
allocate(Dims(1))
Dims=shape(Typed_this%data)
class is (Tensor)
print *,'Dimensions not defined'
return
end select
end function getDimensions_Of_Tensor
end module Tensor_Class
module Tensor_Class_fun
use Tensor_Class
implicit none
contains
subroutine type_creation(dims)
type(Vector) :: aVector
integer :: dims,resultdims(1)
CALL random_seed()
aVector=new_Tensor(dims)
resultdims=aVector%GetDimensions()
print *,resultdims
end subroutine type_creation
end module Tensor_Class_fun
program main
use Tensor_Class_fun
implicit none
integer :: dims
complex(8),allocatable:: localVector(:)
dims=2**19-470
call type_creation(dims)
Print *,'Success with ',dims
dims=2**19
allocate(localVector(dims))
localVector=1.0d0
localVector(1000)=localVector(dims)-5.0d0*localVector(1000)
print *,'Just a number to confuse the optimizer',localVector(1000)+localvector(2999)
Print *,'Normal allocation is OK with ',dims
deallocate(localVector) !Free mem to keep it clean
call type_creation(dims)
Print *,'Success?'
end program main
[/fortran] Link Copied
3 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I got super fast-and-friendly response from the Premier Support team (thanks Roland), who found that the problem seems to be in the size of the stack.
In Linux, you get rid of this problem by issuing 'ulimit -s unlimited', but on the Mac there is a hard 64MB limit in the kernel that you cannot overcome. The only workaround is to compile using the -heap-arrays option, which makes the compiler use the heap instead of the stack for storing temporary arrays. With this option, the problem goes away (seehttp://software.intel.com/en-us/articles/intel-fortran-compiler-increased-stack-usage-of-80-or-higher-compilers-causes-segmentation-fault/ )
I asked the Support team, but I also wanted to ask here: Is there too high a performance penalty for using the heap instead of the stack? Or, more important for my project, is it bad that I am using too much stack space?
Reinspecting my code, I see that I create the new object through an assignment, which probably creates a temporary (thus using the stack). Am I right? The main reason to do this was code cleanliness, but I am totally willing to get rig of it for performance reasons...so my questions are:
If I created new objects via a subroutine, would I get rid of the stack usage? And if so, would that make the program faster? I know I could just do it and test, but it is some Klines of code that I have to change, so I would appreciate a lot advice from somebody more experienced on this.
On the other hand, if using GBs of stack is ok under Linux, I can live with the code as is, since the production program will end up in a Linux cluster, while I write and test on the Mac using -heap-arrays.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting fercook
Is there too high a performance penalty for using the heap instead of the stack? Or, more important for my project, is it bad that I am using too much stack space?
Reinspecting my code, I see that I create the new object through an assignment, which probably creates a temporary (thus using the stack). Am I right? The main reason to do this was code cleanliness, but I am totally willing to get rig of it for performance reasons...so my questions are:
If I created new objects via a subroutine, would I get rid of the stack usage? And if so, would that make the program faster? I know I could just do it and test, but it is some Klines of code that I have to change, so I would appreciate a lot advice from somebody more experienced on this.
On the other hand, if using GBs of stack is ok under Linux, I can live with the code as is, since the production program will end up in a Linux cluster, while I write and test on the Mac using -heap-arrays.
Heap allocation is expected to cost more time than stack, but should not have significant cost if done outside inner loops. You want to avoid forcing users of your program unnecessarily to deal with stack settings.
Threading compatible options, such as RECURSIVE procedure declaration, or compile options with equivalent effect (-openmp, -auto, ....) switch implied allocation of local arrays upon procedure entry from heap to stack. This is required for thread safety. So you can use subroutine entry creation of objects to imply heap only if the subroutine will not undergo threading, thus there may be bigger performance implications. To use, cluster computing leads to considering threading as well.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
On Apple OSX, you can increase the stack size of an executable at compile-time using the -stack_size argument to the linker. For example:
ifort-Wl,-stack_size -Wl,0x10000000 driver.f90
would set the stack size to 1GB. If you are compiling a 64-bit executable, you can go much larger:
ifort-Wl,-stack_size -Wl,0x40000000 driver.f90
would set the stack size to 4 GB. No need to use slower heap arrays,
Martin Otte
Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page