Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software Development Tools (Compilers, Debuggers, Profilers & Analyzers)
- Intel® Fortran Compiler
- Overcoming Segmentaion fault error with efficient memory management on a cluster

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Highlighted

Yaswanth

Novice

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-03-2020
03:09 PM

293 Views

Hello,

I am trying to solve a linear system of size (263169x263169) FORTRAN code using Intel compiler on a cluster. I do not get any error while allocating the memory to such a large one dimensional array (69257922561=263169x263169). I am running into segmentation fault runtime errors while executing the program. I have run the same code on a smaller system and it does not show any errors, so I think it is due to actual memory issues. I am not sure why I would run into this error when I do not get any warning while allocating the memory. I can run my code on up to 16 nodes on cluster each of which has a 256 GB RAM. So the combined RAM should be sufficient for me to solve the problem. I was suggested that distributing the memory onto different nodes might help resolve this error. I am not sure how one would do that.

To illustrate my problem more clearly, I have attached here a baby code in which I allocate the required memory for an array and a do loop in which I access each element of that array. As I mentioned, I do not get any compiling errors but I would end up with segmentation fault run time error.

```
program test
implicit none
integer :: AllocateStatus
integer :: n = 256, p
real(8), dimension(:,:), allocatable :: Cgm
real(8), dimension(:), allocatable :: Cg
real(16) :: ns
ns = (2.0_16*n+1.0_16)**2.0_16
print *, int(ns**2,8)
allocate (Cg(int(ns**2)), STAT = AllocateStatus)
if (AllocateStatus/=0) then
print *, "*** Not enough memory ***"
stop
end if
Cg = 0.0_8
do p = 1,int(ns**2)
!print *,p
Cg(p) = 1.0_8
end do
print *, Cg(p)
end program test
```

Thanks

Accepted Solutions

Highlighted

jimdempseyatthecove

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-06-2020
06:57 AM

156 Views

Note, using your 16 x 256GB cluster would have sufficient memory, but you cannot treat it as a single allocation by one system (rank) in an MPI process. Example

Each rank (one of 16) could manage 16,448 x 263,169 elements of the array (rectangular, eg rows or cols)

Or

Each rank could manage 65,792 x 65,792 square tiles of the array.

Or

some other partitioning that is suitable for your programming needs.

It has not been stated at to what your (model) problem needs to compute, if it is relatively simple, you might find COARRAYS the better choice.

Jim Dempsey

13 Replies

Highlighted

Alberto_R_Intel1

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-04-2020
11:28 AM

271 Views

Yaswanth, Thank you for posting in the Intel® Communities Support.

Just to let you know, I will move your thread to the proper department which is "Intel® Fortran Compiler" for them to further assist you with this topic.

Regards,

Albert R.

Intel Customer Support Technician

A Contingent Worker at Intel

Highlighted

Yaswanth

Novice

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-04-2020
11:36 AM

261 Views

Thank you, Alberto.

Highlighted

jimdempseyatthecove

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-05-2020
06:44 AM

227 Views

int(ns**2) returns an integer(4) value.

Your allocate statement would require int(n**2,8).

Also, there used to be an issue (may have been fixed), where any allocation that would exceed 2GB would work provided that the type of the allocation size variable(s) were expressed as integer(8). IOW their may have (at one time) been two different types of allocation descriptors, one for using 32-bit indexing, the other for 64-bit indexing (while the base may have been 64-bits, the limits of each dimension may have been 32 or 64 bits based on the descriptor type).

It would be a simple enough experiment for you to perform.

Also note:

integer :: p ! p is 32-bit integer(4)

...

do p=1,int(ns**2) ! iteration count is truncated as well

Jim Dempsey

Jim Dempsey

Highlighted

Steve_Lionel

Black Belt Retired Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-05-2020
09:02 AM

219 Views

I assume this is on Linux. Linux does something called "lazy allocation" where, on allocate, the virtual address space is created but isn't actually added to your "working set" (my term, coming from VMS) until you touch it. (Some discussion here.)

The amount of RAM available is not relevant to whether or not the allocation succeeds or even the amount of virtual memory available - it affects mainly performance.

The compiler can't diagnose this, and if the ALLOCATE's call to the operating system's allocator succeeds, you won't get a failure from the ALLOCATE. But you can indeed get a segfault if you touch "too much" memory.

Jim's comment on the DO loop upper bound is valid, though I don't expect this would be the proximate cause of the segfault.

Highlighted

Yaswanth

Novice

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-05-2020
02:34 PM

203 Views

Thank you, Jim.

The allocate command showed memory error when I wrote 'int(n**2,8)' instead of int(n**2). That was a mistake. Does this mean that it is not possible to create an array of size 263169x263169?

My aim is to solve a linear system of this size, in which the matrix is full. Could you please suggest some possible workarounds if it is indeed not possible to create such a big array. I tried to rewrite my code so that I do not create the entire array and instead access each element whenever required within the code. But this takes forever to just go over one iteration loop which is involved in solving the linear system.

Highlighted

Yaswanth

Novice

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-05-2020
02:37 PM

201 Views

Thank you, Steve.

Could you also let me know if there are better ways to solve linear systems of large size in FORTRAN if it is not possible to create such an array in the first place.

Highlighted

Steve_Lionel

Black Belt Retired Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-05-2020
03:55 PM

192 Views

If the allocate failed, then your system is not allowing you to create an array that big.

I can't help with the linear systems question, but I know there are others here who may be able to.

Highlighted

Arjen_Markus

Valued Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-05-2020
11:31 PM

176 Views

I cannot claim to be an expert on solving linear systems of this size with a full matrix - or avoiding the memory issues that you describe (at least not without feeling thoroughly ashamed ;)). However, you say that you work on a cluster, so it should be possible to split up the NxN matrix into MxN or NxM matrices, where M is a suitable fraction of N and then apply partial Gaussian elimination or some more suitable and less basic iterative procedure on these parts. Communication between the programs that handle the parts could be done via coarrays or via MPI.

Of course whether that leads to an acceptable performance is another matter.

Highlighted

jimdempseyatthecove

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-06-2020
06:47 AM

158 Views

263169x263169 =

69,257,922,561 elements x 8 bytes/element =

554,063,380,488 number of bytes require (for this one allocation)

This is much larger than the capacity of the system you listed in your first post.

Furthermore, should you manipulate this array, using implicit looping array expressions, the statement may be encoded by the compiler to use a temporary (on stack!!!) of same size. IOW .gt. 1TB of RAM assuming you have configured for that much stack.

Additionally, most system default settings require a page file to exceed that of the sum of the processes running on the system.

While you could handle this using cluster style programming, or COARRAYS, your problem may potentially be solvable, old school style, via file I/O processing, or via partitioning the data and processing partition(s) by partition(s).

Jim Dempsey

Highlighted

jimdempseyatthecove

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-06-2020
06:57 AM

157 Views

Note, using your 16 x 256GB cluster would have sufficient memory, but you cannot treat it as a single allocation by one system (rank) in an MPI process. Example

Each rank (one of 16) could manage 16,448 x 263,169 elements of the array (rectangular, eg rows or cols)

Or

Each rank could manage 65,792 x 65,792 square tiles of the array.

Or

some other partitioning that is suitable for your programming needs.

It has not been stated at to what your (model) problem needs to compute, if it is relatively simple, you might find COARRAYS the better choice.

Jim Dempsey

Highlighted

Ulrich_M_

New Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-06-2020
02:46 PM

144 Views

It is simply not feasible to solve a dense linear system of this size in useful time, irrespective of memory constraints or programming language. An efficient solver takes about one minute to solve a nxn system with n=10000 on my system. The complexity of solving linear systems is theoretically at least of the order of n^2, and in practice it scales with n^3. Even with infinite memory it would hence take ~17,500 minutes or ~12 days to solve your system. And this holds only without any additional overhead from the huge memory requirement.

You will have to change the problem or make other approximations.

Highlighted
Thank you very much, Jim. I will try to modify my problem or make approximations to reduce the complexity. In either case, I will try to look at COARRAYS and partitioning the memory required.

Yaswanth

Novice

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-06-2020
04:12 PM

135 Views

Highlighted

John_Campbell

New Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-08-2020
09:47 PM

107 Views

Allocating an array of this size would possibly fail if it exceeds the virtual memory address size, but may not. In Windows, it could fail if it exceeds the paging and memory capacity. Typically allocations like these do not fail until the matrix coefficients are being defined.

Irrespective, considering a rank 2 matrix of size 263,169 as a dense matrix is probably not a practical approach.

You would need to solve this by utilising the sparse properties of the matrix. These sparse properties are usually associated with the field of research.

I work in the field of Finite Element Analysis, where the banded or profile of non-zero matrix values is utilised to significantly reduce the computation time by both eliminating storage of zero values and eliminating calculations using zero values to produce a solution in an acceptable time. My recent problems have extended to N=450,000, but require sparsity properties to be practically solved. Equation re-ordering is also a significant issue.

MKL does provide sparse matrix solvers, however the practicality of defining a large set of equations that is not singular could be a challenge.

You need to research how these types of large problems are practically managed to obtain a solution in a reasonable time using available computing resources.

Assembling a large set of linear equations usually implies a direct solver approach. Rather than using a direct solver, itterative solvers can be used, where the full matrix is never assembled. The most appropriate solution approach for your field of research probably has a long history of development.

For more complete information about compiler optimizations, see our Optimization Notice.