Intel® Fortran Compiler

ia64 "unaligned access" messages with SMP

burgel
Beginner
I'm getting "unaligned access" messages on the console on an Itanium 2 system (ifort 9.0 and ifort 9.1.033) from one particular subroutine, but only when it is compiled with -parallel or -openmp (the latter for hand-coded directives). The messages appear whether I run one thread or several. (I get very good speedup with two threads on a different system with comparable memory bandwidth, so I don't think the code itself is the issue.) Is there a performance impact associated with these messages?

The "unaligned access" messages do NOT appear with a normal (unthreaded) compile, and that build runs the subroutine about 10 times faster (i.e., the -openmp build is 10 times slower!). The whole code is compiled with "-align all".

The compile line for the particular subroutine is

ifort -openmp -ftz -align all -O3 -I./include -I/usr/local/netcdf/include -c advect.f90

(The unthreaded build uses the same line without -openmp.)

Any ideas?

-- Ted
3 Replies
Steven_L_Intel1
Employee
There could be a performance impact. I'd ask that you send a buildable and runnable example to Intel Premier Support.
burgel
Beginner
There is definitely a performance impact. I also tested the code on a new Mac Pro with the ifort compiler, and it runs fine in parallel there, so I guess this is an ia64-specific issue.

I am preparing a simplified code base to submit to Intel Support through our IT person (they'll only let us have one contact, so everything has to go through one person).
burgel
Beginner
I figured out that part of the problem has to do with index expressions inside the loop. For example, this loop gave unaligned access messages:

!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(i,j,k,vv,im1,dir)
DO k = kmn,kmx
  DO j = jmn,jmx
    DO i = imn,imx

      vv = 0.5*(u(i,j,k) + u(i-is,j-js,k-ks))
      im1 = max(i-1,1)
      dir = sign(1.0,vv)

      IF( i .ge. 4 .and. i .le. nx-3+is ) THEN

        fx(i,j,k) = vv * ( f50 * (s(i,  j,k) + s(i-1,j,k)) &
                         - f51 * (s(i+1,j,k) + s(i-2,j,k)) &
                         + f52 * (s(i+2,j,k) + s(i-3,j,k)) &
                         - f52 * (s(i+2,j,k) - s(i-3,j,k) &
                           - 5.0 * (s(i+1,j,k) - s(i-2,j,k)) &
                           + 10. * (s(i,  j,k) - s(i-1,j,k)))*dir )

      ELSEIF( i .eq. 3 .or. i .eq. nx-2+is ) THEN

        fx(i,j,k) = vv * ( f30 * (s(i,j,k) + s(i-1,j,k)) &
                         - f31 * (s(i+1,j,k) + s(i-2,j,k)) &
                         + f31 * (s(i+1,j,k) - s(i-2,j,k) &
                           - 3.0 * (s(i,j,k)-s(i-1,j,k)))*dir )

      ELSE

        fx(i,j,k) = vv * 0.5 * (s(i,j,k) + s(im1,j,k))

      ENDIF

    ENDDO
  ENDDO
ENDDO

ENDIF

But this version, with the limits precomputed into i3, i2, and i1 (and an extra branch), does not:

i3 = nx-3+is
i2 = nx-2+is
i1 = nx-1+is
!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(i,j,k,vv,im1,dir)
DO k = kmn,kmx
  DO j = jmn,jmx
    DO i = imn,imx

      vv = 0.5*(u(i,j,k) + u(i-is,j-js,k-ks))
      im1 = max(i-1,1)
      dir = sign(1.0,vv)

      IF( i .ge. 4 .and. i .le. i3 ) THEN

        fx(i,j,k) = vv * ( f50 * (s(i,  j,k) + s(i-1,j,k)) &
                         - f51 * (s(i+1,j,k) + s(i-2,j,k)) &
                         + f52 * (s(i+2,j,k) + s(i-3,j,k)) &
                         - f52 * (s(i+2,j,k) - s(i-3,j,k) &
                           - 5.0 * (s(i+1,j,k) - s(i-2,j,k)) &
                           + 10. * (s(i,  j,k) - s(i-1,j,k)))*dir )

      ELSEIF( i .eq. 3 .or. i .eq. i2 ) THEN

        fx(i,j,k) = vv * ( f30 * (s(i,j,k) + s(i-1,j,k)) &
                         - f31 * (s(i+1,j,k) + s(i-2,j,k)) &
                         + f31 * (s(i+1,j,k) - s(i-2,j,k) &
                           - 3.0 * (s(i,j,k)-s(i-1,j,k)))*dir )

      ELSEIF( i .eq. 2 .or. i .eq. i1 ) THEN

        fx(i,j,k) = vv * 0.5 * (s(i,j,k) + s(im1,j,k))

      ELSE

        fx(i,j,k) = 0.0

      ENDIF

    ENDDO
  ENDDO
ENDDO

ENDIF

There are some other loops that apparently have additional problems, so I'm still planning to submit code to Intel Premier Support, but I thought I'd take another crack at it first.

I had similar problems with loop limits that contained Max or Min functions, like "DO i=1,Min(nx-1+is,ix+1)", which ifort on Itanium couldn't seem to handle. But if I set a temporary value, imx = Min(nx-1+is,ix+1), and then use DO i=1,imx, it works. Bizarre!
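
For anyone wanting to try the loop-limit workaround in isolation, here is a minimal sketch. The values of nx, is, and ix and the arrays a and b are hypothetical stand-ins, not taken from advect.f90. Fortran evaluates DO limits once on loop entry, so the two forms should be equivalent as far as the language is concerned; the sketch only checks that they give the same result and does not by itself reproduce the Itanium messages.

```fortran
program hoist_loop_bound
  implicit none
  ! Hypothetical values; in the real code nx, is, and ix come from the model setup.
  integer, parameter :: nx = 16, is = 1, ix = 10
  integer :: i, imx
  real :: a(nx+1), b(nx+1)

  a = 0.0
  b = 0.0

  ! Form reported to cause trouble on Itanium: Min() directly in the loop limit.
  !$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(i)
  do i = 1, min(nx-1+is, ix+1)
     a(i) = real(i)
  end do
  !$OMP END PARALLEL DO

  ! Workaround: evaluate the limit once into a scalar temporary before the loop.
  imx = min(nx-1+is, ix+1)
  !$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(i)
  do i = 1, imx
     b(i) = real(i)
  end do
  !$OMP END PARALLEL DO

  ! Both loops should have written exactly the same elements.
  if (any(a /= b)) then
     print *, 'results differ'
  else
     print *, 'results match, imx =', imx
  end if
end program hoist_loop_bound
```

Built with the same -openmp flag as above, the directives are active; without it they are treated as comments and the program runs serially.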