I'm getting "unaligned access" messages to the console on an Itanium2 system (ifort 9.0 or ifort 9.1.033) from a particular subroutine, but only when compiled with -parallel or -openmp (for hand-coded directives). I get the messages whether running 1 or more threads. (I can get very good speed up with two threads on a different system with comparable memory bandwidth, so I think the code itself is not the issue.) I wonder if there is a performance impact from this message?
The "unaligned access" messages do NOT appear with a normal (unthreaded) compile, which runs the particular subroutine about 10 times faster. (i.e., the -openmp compile is 10-times slower!) The whole code is compiled with "-align all".
The compile line for the particular subroutine is
ifort -openmp -ftz -align all -O3 -I./include -I/usr/local/netcdf/include -c advect.f90
(The unthreaded build uses the same line without -openmp.)
Any ideas?
-- Ted
3 Replies
There could be a performance impact. I'd ask that you send a buildable and runnable example to Intel Premier Support.
Definitely a performance impact. I also tested the code on a new MacPro with the ifort compiler, and it runs fine in parallel, so I guess this is an ia64-specific issue.
I am preparing a simplified code base to submit to Intel Support through our IT person (they'll only let us have one contact, so everything has to go through one person).
I figured out part of the problem has to do with indices. For example, this loop gave unaligned access messages:
!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(i,j,k,vv,im1,dir)
DO k = kmn,kmx
  DO j = jmn,jmx
    DO i = imn,imx
      vv = 0.5*(u(i,j,k) + u(i-is,j-js,k-ks))
      im1 = max(i-1,1)
      dir = sign(1.0,vv)
      IF( i .ge. 4 .and. i .le. nx-3+is ) THEN
        fx(i,j,k) = vv * ( f50 * (s(i,  j,k) + s(i-1,j,k)) &
                         - f51 * (s(i+1,j,k) + s(i-2,j,k)) &
                         + f52 * (s(i+2,j,k) + s(i-3,j,k)) &
                         - f52 * (s(i+2,j,k) - s(i-3,j,k) &
                         - 5.0 * (s(i+1,j,k) - s(i-2,j,k)) &
                         + 10. * (s(i,  j,k) - s(i-1,j,k)))*dir )
      ELSEIF( i .eq. 3 .or. i .eq. nx-2+is ) THEN
        fx(i,j,k) = vv * ( f30 * (s(i,j,k) + s(i-1,j,k)) &
                         - f31 * (s(i+1,j,k) + s(i-2,j,k)) &
                         + f31 * (s(i+1,j,k) - s(i-2,j,k) &
                         - 3.0 * (s(i,j,k)-s(i-1,j,k)))*dir )
      ELSE
        fx(i,j,k) = vv * 0.5 * (s(i,j,k) + s(im1,j,k))
      ENDIF
    ENDDO
  ENDDO
ENDDO
ENDIF
But this version does not:
i3 = nx-3+is
i2 = nx-2+is
i1 = nx-1+is
!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(i,j,k,vv,im1,dir)
DO k = kmn,kmx
  DO j = jmn,jmx
    DO i = imn,imx
      vv = 0.5*(u(i,j,k) + u(i-is,j-js,k-ks))
      im1 = max(i-1,1)
      dir = sign(1.0,vv)
      IF( i .ge. 4 .and. i .le. i3 ) THEN
        fx(i,j,k) = vv * ( f50 * (s(i,  j,k) + s(i-1,j,k)) &
                         - f51 * (s(i+1,j,k) + s(i-2,j,k)) &
                         + f52 * (s(i+2,j,k) + s(i-3,j,k)) &
                         - f52 * (s(i+2,j,k) - s(i-3,j,k) &
                         - 5.0 * (s(i+1,j,k) - s(i-2,j,k)) &
                         + 10. * (s(i,  j,k) - s(i-1,j,k)))*dir )
      ELSEIF( i .eq. 3 .or. i .eq. i2 ) THEN
        fx(i,j,k) = vv * ( f30 * (s(i,j,k) + s(i-1,j,k)) &
                         - f31 * (s(i+1,j,k) + s(i-2,j,k)) &
                         + f31 * (s(i+1,j,k) - s(i-2,j,k) &
                         - 3.0 * (s(i,j,k)-s(i-1,j,k)))*dir )
      ELSEIF( i .eq. 2 .or. i .eq. i1 ) THEN
        fx(i,j,k) = vv * 0.5 * (s(i,j,k) + s(im1,j,k))
      ELSE
        fx(i,j,k) = 0.0
      ENDIF
    ENDDO
  ENDDO
ENDDO
ENDIF
There are some other loops that apparently have additional problems, so I'm still planning to submit code to Intel Premier Support, but I thought I'd take another crack at it first.
I had similar problems with loop limits that used Max or Min functions, like "DO i=1,Min(nx-1+is,ix+1)", which ifort on Itanium couldn't seem to handle. But if I set a temporary value, imx = Min(nx-1+is,ix+1), and then use DO i=1,imx, it works. Bizarre!
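For anyone who wants to try the same workaround, here is a minimal sketch of the pattern described above: the loop limit and the branch bound are computed into temporaries before the parallel region instead of being written inline. The module and routine names (bounds_demo, fill_inline, fill_hoisted) and the placeholder values assigned to f are made up for illustration and are not from the original advect.f90. Fortran evaluates DO loop limits once on entry to the loop, so the two routines should give identical results. The sketch only shows the source-level rewrite and does not by itself reproduce the unaligned access messages.

```fortran
! Hypothetical illustration only; names and values are not from the original code.
module bounds_demo
  implicit none
contains

  ! Loop limit and branch bound written inline, as in the form that
  ! reportedly triggered the messages on Itanium with -openmp.
  subroutine fill_inline(f, nx, is, ix)
    integer, intent(in)  :: nx, is, ix
    real,    intent(out) :: f(:)
    integer :: i
    f = 0.0
    !$omp parallel do default(shared) private(i)
    do i = 1, min(nx-1+is, ix+1)
      if (i >= 4 .and. i <= nx-3+is) then
        f(i) = 2.0        ! stands in for the interior stencil
      else
        f(i) = 1.0        ! stands in for the boundary treatment
      end if
    end do
    !$omp end parallel do
  end subroutine fill_inline

  ! Same loop with the invariant expressions evaluated once, up front.
  subroutine fill_hoisted(f, nx, is, ix)
    integer, intent(in)  :: nx, is, ix
    real,    intent(out) :: f(:)
    integer :: i, imx, i3
    imx = min(nx-1+is, ix+1)   ! temporary for the loop limit
    i3  = nx-3+is              ! temporary for the branch bound
    f = 0.0
    !$omp parallel do default(shared) private(i)
    do i = 1, imx
      if (i >= 4 .and. i <= i3) then
        f(i) = 2.0
      else
        f(i) = 1.0
      end if
    end do
    !$omp end parallel do
  end subroutine fill_hoisted

end module bounds_demo
```

To compare the two forms, call both routines on arrays of at least min(nx-1+is, ix+1) elements and check that the results match. Building once with and once without the OpenMP flag should show whether the messages follow the inline expressions.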
