In the following simple code I get SIGSEGV on the "w=u" line. It does not happen for small arrays, but starts with arrays of 2**20 elements and larger.
----------------
program test_eq
  implicit none
  integer, parameter :: m=1024*1024, n=100
  integer :: i, j
  double precision, dimension(:,:), pointer :: u, w, mat
  allocate(u(m,n), w(m,n), mat(n,n))
  write(*,fmt='(a)') 'fill in U'
  forall(i=1:m, j=1:n) u(i,j) = 1.d0/(dlog(dble(i))+j)
  write(*,fmt='(a)') 'fill in U done'
  w = u
  write(*,*) 'w=u ok'
  mat = matmul(transpose(u), w)
  write(*,*) mat(1,1)
end program
-----------------
$ ifort --version
ifort (IFORT) 11.1 20090827
$ uname -a
Linux 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:58:03 UTC 2009 x86_64 GNU/Linux
1 Solution
Quoting - drraug
In the following simple code I get SIGSEGV on the "w=u" line. It does not happen for small arrays, but starts with arrays of 2**20 elements and larger.
Try ulimit -s unlimited; that should help.
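For reference, a minimal sketch of applying the suggested fix in the shell that launches the program (the default soft limit varies by distribution, 8192 KB being common):

```shell
# Remove the shell's stack-size limit, then verify the new setting.
# ifort places large array temporaries (such as the one created for
# the w=u copy) on the stack, so a 1024*1024 x 100 double precision
# temporary easily exceeds the default soft limit.
ulimit -s unlimited
ulimit -s   # should now report: unlimited
```

The change only affects the current shell and processes launched from it, so the program must be run from the same shell session.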
9 Replies
Quoting - Ronald W. Green (Intel)
And take a look at -heap-arrays compiler option, as long as you are not also using -openmp (don't use these 2 together).
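As a sketch of how that option is used (filename illustrative; the optional argument is a size threshold in kilobytes):

```shell
# Allocate all compiler-generated temporary arrays on the heap
# instead of the stack:
ifort -heap-arrays test_eq.f90 -o test_eq

# Or move only temporaries larger than 10 KB to the heap, keeping
# small, cheap temporaries on the stack:
ifort -heap-arrays 10 test_eq.f90 -o test_eq
```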
Quoting - drraug
Thank you, Ronald, but I really do need OpenMP; that is the problem.
I agree it is desirable to be able to use transpose() and matmul() in the form shown, and that the number of automatic temporaries should be minimized. For example, the compiler should recognize that the assigned array mat() is available to assemble the result, rather than building it in a temporary and copying it, if that is still how it is done. I believe certain cases of transpose() as an argument to matmul() are recognized and optimized without another temporary. Given the demonstration that even the simple w=u seemed to create a temporary, it seems unwise to count on it. If an OpenMP-parallel equivalent of matmul is required, or one large enough to risk stack overflow, the MKL library is superior.
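As a sketch of the BLAS route suggested above, the mat=matmul(transpose(u),w) line of the test program can be replaced by a single DGEMM call (link against MKL or any BLAS; the all-ones fill here is only to make the result checkable):

```fortran
program matmul_blas
  implicit none
  integer, parameter :: m = 1024*1024, n = 100
  double precision, allocatable :: u(:,:), w(:,:), mat(:,:)
  allocate(u(m,n), w(m,n), mat(n,n))
  u = 1.d0
  w = u
  ! mat := transpose(u) * w, computed by DGEMM as
  !   C := alpha*op(A)*op(B) + beta*C
  ! The 'T' flag makes DGEMM read u as transposed in place, so no
  ! transposed copy and no large temporary are created, and MKL's
  ! DGEMM is threaded for sizes like these.
  call dgemm('T', 'N', n, n, m, 1.d0, u, m, w, m, 0.d0, mat, n)
  write(*,*) mat(1,1)   ! = m for this all-ones fill
end program
```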
Dear tim18!
Thank you very much for your comment. It seems you are the right person to address my further questions to, and perhaps this can even be done in this thread, since most of them relate to your comment.
(1) Does Intel plan to support OpenMP parallelization of FORALL, MAXLOC, and MATMUL in the near future? For now I see no parallelization and no speedup for these parts of my code.
(2) What alternatives could we use to parallelize these operations on shared memory?
(2a) For MATMUL the obvious alternative is the BLAS functions implemented in MKL. (By the way, does the current version of MKL use parts of the GotoBLAS code?) So there is actually no problem with that.
(2b) A rather strange alternative to FORALL could be an OMP DO / END DO block, which seems to be supported in the current version of Intel Fortran. But is it really a good idea?
(2c) It is still unclear to me how we can speed up operations like MAXLOC(abs(A)). MAXLOC gives no speedup on a multi-core system when used with OMP WORKSHARE directives. The BLAS alternative, the IDAMAX function, also does not seem to use many threads/cores. Also, DGER or a similar FORALL construct for a rank-one matrix update is still not parallelized by Intel+OMP. Can you advise something here?
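Regarding (2b), a minimal sketch of the fill-in loop from the original test program rewritten as an ordinary DO nest under OMP PARALLEL DO, which is the conventional, well-supported way to thread such an initialization (declarations of u, i, j, m, n as in the test program):

```fortran
! Fill u(i,j) = 1/(log(i)+j) with an OpenMP-parallel DO nest
! instead of FORALL. Each column j is independent of the others,
! so the outer loop parallelizes cleanly and the inner loop
! remains contiguous and vectorizable.
!$omp parallel do private(i)
do j = 1, n
   do i = 1, m
      u(i,j) = 1.d0 / (log(dble(i)) + j)
   end do
end do
!$omp end parallel do
```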
Quoting - tim18
In case it may be relevant, the forall in the example is likely to create a temporary array and be less efficient than f77 style. The current compilers attempt to vectorize a single extent single assignment with forall, if preceded by IVDEP directive. Currently, ifort has no OpenMP parallelization of forall().
....
If an OpenMP parallel equivalent of matmul is required, or one of a large enough size to incur danger of stack overflow, MKL library is superior.
This seems to have drifted far from the original question.
OpenMP 2.5 support seems not to be a high priority, and FORALL and MAXLOC aren't ideally suited for parallel.
In my examples at http://sites.google.com/site/tprincesite/levine-callahan-dongarra-vectors you will see some of the OpenMP alternatives, including the rank 2 maxloc implemented with omp critical.
f2008 DOACROSS has been advocated as superior to FORALL, but some initial proposals include translating it to FORALL, so nothing is gained, particularly as it is not often considered an important innovation.
The case where I see FORALL as superior syntax to DO..ENDDO is where a MASK is in use, but it still depends on the compiler to recognize where multiple assignments may be fused into a single loop, a situation where the intent is clear with DO...ENDDO. Have you studied the details of what FORALL requires, and do you not find them somewhat strange?
The most often advocated solution for matmul on matrices large enough to benefit from OpenMP, substituting a BLAS call (e.g. from the MKL library) behind the scenes, has not achieved much favor. In several of the situations where gfortran, for example, can do this, the problem of extra temporary arrays has not been solved.
Vectorizable rank-one operations generally have to be quite large (several thousand elements) to benefit from threading. Some of the MKL BLAS rank-one operations may now detect cases large enough for threading.
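A sketch of the rank-2 MAXLOC(abs(A)) pattern threaded by hand, roughly in the spirit of the omp critical approach in the linked examples (array a(m,n) and all names here are illustrative):

```fortran
! Each thread scans its share of columns for the largest |a(i,j)|,
! then merges its local best into the global result inside a
! critical section. The critical section runs once per thread,
! so its cost is negligible for large m*n.
vmax = -1.d0
!$omp parallel private(i, j, lmax, li, lj)
lmax = -1.d0
!$omp do
do j = 1, n
   do i = 1, m
      if (abs(a(i,j)) > lmax) then
         lmax = abs(a(i,j)); li = i; lj = j
      end if
   end do
end do
!$omp end do
!$omp critical
if (lmax > vmax) then
   vmax = lmax; loc(1) = li; loc(2) = lj
end if
!$omp end critical
!$omp end parallel
```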
Dear tim18!
Thank you very much for your kind reply!
The f2008 version of DOACROSS is spelled DO CONCURRENT. In ifort it generally performs better than FORALL, but not as well as F77-style DO, for multiple-assignment constructs where all are applicable.
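For illustration, the fill-in FORALL from the original test program written as DO CONCURRENT (f2008 syntax, accepted by later ifort releases; declarations as in the test program):

```fortran
! f2008 replacement for
!   forall(i=1:m, j=1:n) u(i,j) = 1.d0/(dlog(dble(i))+j)
! The construct asserts that iterations are independent, giving the
! compiler the same freedom as FORALL with simpler semantics.
do concurrent (i = 1:m, j = 1:n)
   u(i,j) = 1.d0 / (log(dble(i)) + j)
end do
```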