
jimdempseyatthecove

Black Belt


02-10-2016
07:52 AM


Missed optimization opportunity ANY(array .eq. 0.0)

It is common practice to test a result array for a condition, for example:

```fortran
vector_mod = mod(vector_num, vector_i)
!dir$ if(.true.)
if (ANY(vector_mod .eq. 0)) return
!dir$ else
do j = 1, vector_length
   if (vector_mod(j) .eq. 0) return
end do
!dir$ endif
```

Here the ANY intrinsic (or the equivalent short loop) performs a relational comparison between an array and a scalar.

With IVF V16.0 Update 1 on Windows, the compiler generates scalar code for both !dir$ branches.

On the MIC, the __mmask16 _mm512_mask_cmpeq_epi32_mask intrinsic and related instructions are available and could make quick work of this test using vector compares.
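As a rough, portable C illustration of the mask-compare idea (not Intel's actual code generation), the test can be organized the way a 16-lane compare-to-mask would do it: compare a block of 16 elements into a bit mask, then test the mask once per block. The 16-lane width and the name `any_eq_zero_i32` are assumptions for illustration; whether a given compiler turns the inner loop into a real mask compare is not guaranteed.

```c
#include <stdbool.h>
#include <stddef.h>

#define LANES 16 /* assumption: one 512-bit vector of 32-bit ints */

/* Returns true if any element of a[0..n) equals zero. */
bool any_eq_zero_i32(const int *a, size_t n)
{
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        unsigned mask = 0;                        /* plays the role of __mmask16 */
        for (int k = 0; k < LANES; ++k)           /* branch-free compare of one block */
            mask |= (unsigned)(a[i + k] == 0) << k;
        if (mask)                                 /* one scalar test per 16 elements */
            return true;
    }
    for (; i < n; ++i)                            /* scalar remainder */
        if (a[i] == 0)
            return true;
    return false;
}
```

The inner loop has no branch, so it is a vectorization candidate; the early exit is only checked once per block.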

As for usefulness, it is not unusual for code to contain:

a) Defensive code to detect a potential divide by 0.0: ANY(array .eq. 0.0)

b) Convergence tests: ANY(array .lt. bingo) or ANY(abs(array) .lt. bingo)
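A convergence test like ANY(abs(array) .lt. bingo) can be written as a branch-free OR-reduction, sketched here in C. The name `any_abs_lt` and the assumption that `tol` is positive are illustrative; the point is that the loop body is only two compares and an OR, with no library call and no early exit.

```c
#include <stdbool.h>
#include <stddef.h>

/* ANY(abs(x) .lt. tol), assuming tol > 0. */
bool any_abs_lt(const float *x, size_t n, float tol)
{
    int hit = 0;
    for (size_t i = 0; i < n; ++i)
        hit |= (x[i] < tol) & (x[i] > -tol); /* |x| < tol; NaN compares false */
    return hit != 0;
}
```

The trade-off is that the whole array is always scanned; when a hit is likely to occur early in a long array, a blocked early exit may pay off instead.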

As an additional request:

c) ANY(isNaN(realArray)), vectorized and in line rather than calling for_is_nan_s_ (or its double-precision counterpart).

You might also extend this to an isNormal intrinsic (in line and vectorized).
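What an in-line isNormal could look like, as a C sketch for IEEE single precision: a value is normal when its 8-bit exponent field is neither 0 (zero or subnormal) nor all ones (infinity or NaN). The names `is_normal_f32` and `any_not_normal` are hypothetical; note that by this definition 0.0 is not "normal", which may or may not be what a given defensive test wants.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static inline bool is_normal_f32(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);     /* well-defined type pun */
    uint32_t e = (bits >> 23) & 0xFFu;  /* exponent field */
    return e != 0u && e != 0xFFu;
}

/* ANY(.not. isNormal(x)): integer ops only, no calls in the loop body. */
bool any_not_normal(const float *x, size_t n)
{
    bool bad = false;
    for (size_t i = 0; i < n; ++i)
        bad |= !is_normal_f32(x[i]);
    return bad;
}
```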

When I write simulation code that contains convergence routines, or that may produce 3D vectors of length 0.0, I must insert defensive code to test for unusual (exceptional) conditions, and these tests typically neither vectorize nor get inlined. Are there others on this forum who share this annoyance with the lack of vectorization here, and can you estimate how much it affects your performance?

Jim Dempsey


4 Replies

jimdempseyatthecove

Black Belt


02-10-2016
08:07 AM


For example, take isNaN:

The standard single precision (32-bit) NaN is s111 1111 1xxx xxxx xxxx xxxx xxxx xxxx, where s is the sign (usually ignored in applications) and the x bits (the mantissa) are not all zero; an all-zero mantissa encodes infinity. Therefore, a NaN test that also catches infinities would be:

Bitwise OR the value with a mask, then compare the result for equality with all ones (the vcmpd):

```
   s111 1111 1xxx xxxx xxxx xxxx xxxx xxxx   value
OR 1000 0000 0111 1111 1111 1111 1111 1111   mask
EQ 1111 1111 1111 1111 1111 1111 1111 1111   all ones
```

The above is fully vectorizable.
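The OR-and-compare test above translates directly to C, sketched below with hypothetical function names. The OR forces the sign and mantissa bits to 1, so the result equals all ones exactly when the exponent field is all ones, i.e. the value is a NaN or an infinity. Only integer operations are involved, so the array loop contains no library call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static inline bool nan_or_inf_bits(uint32_t bits)
{
    return (bits | 0x807FFFFFu) == 0xFFFFFFFFu; /* exponent all ones? */
}

static inline bool is_nan_or_inf(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);             /* well-defined type pun */
    return nan_or_inf_bits(bits);
}

/* ANY(isNaN(realArray)), including infinities. */
bool any_nan_or_inf(const float *x, size_t n)
{
    bool hit = false;
    for (size_t i = 0; i < n; ++i)
        hit |= is_nan_or_inf(x[i]);
    return hit;
}
```

If infinities should be excluded, `(bits & 0x7FFFFFFFu) > 0x7F800000u` is true for NaNs only.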

Jim Dempsey

Masrul

Beginner


04-29-2016
04:42 PM


Jim,

I did not follow everything you said, but I have a question: if a computationally expensive loop contains a conditional statement (I know it may prevent some optimizations), can such code still benefit from vectorization (or other optimizations) on KNC?

```fortran
! Pseudo-code
logical, allocatable :: check(:)
real, allocatable :: x(:), y(:), z(:)
integer, parameter :: natom
allocate(check(natom))
allocate(x(natom), y(natom), z(natom))
do i = 1, natom-1
  do j = i+1, natom
    if (check) then
      dx = x(i) - x(j)
      dy = y(i) - y(j)
      dz = z(i) - z(j)
      ! some computation will be following.............
    else
      continue
    end if
  end do
end do
```

--Masrul

TimP

Black Belt


04-29-2016
07:26 PM


any(array==0) can give the same result as minval(abs(array))==0, and the latter has had SIMD instructions suitable for Fortran since SSE and SSE2. A sometimes important consideration is convincing the compiler not to generate a temporary local array; the C and C++ compilers still can't agree on how to do that. Years ago I filed a feature request for vectorizing some simple any() cases, and it has been implemented.
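The minval(abs(array))==0 rewrite can be sketched in C as a select-based min-reduction. The function name is hypothetical and the array is assumed to be single precision; |x| is computed with a select rather than a library call, and NaN elements never lower the minimum, so they are ignored just as any(x==0) would ignore them.

```c
#include <float.h>
#include <stdbool.h>
#include <stddef.h>

/* any(x == 0) expressed as minval(abs(x)) == 0. */
bool any_eq_zero_via_min(const float *x, size_t n)
{
    float m = FLT_MAX;
    for (size_t i = 0; i < n; ++i) {
        float ax = x[i] < 0.0f ? -x[i] : x[i]; /* |x[i]| without libm */
        m = ax < m ? ax : m;                   /* select-based min */
    }
    return n > 0 && m == 0.0f;                 /* -0.0f == 0.0f is true */
}
```

Because there is no early exit, this maps onto SIMD min instructions, at the cost of always scanning the whole array.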

ifort generally has good flexibility about alternate vectorizable forms for the same operation, as Jim seems to want. It's reasonable to hope that whichever form appears most readable (probably not mine!) will optimize. In my experience with gfortran, more often than not, I could find only one form which could optimize fully, and I've even found that form to change in a few cases between gfortran versions. I continue to file ifort PR cases where ifort requires specific syntax to optimize when gfortran is more friendly about accepting alternates.

With conditionals, the "protects exception" issue (the compiler declining to vectorize because a condition may be guarding an operation that could raise an exception) should be avoided as much as possible by not performing arithmetic inside a conditional; Fortran MERGE becomes an important tool here. Even when VECTOR ALWAYS or omp simd is available to assure the compiler that you don't care about handling those exceptions, the generated code tends to be inefficient when you (inadvertently?) request speculative operations.
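The "compute unconditionally, then select" pattern (the C analogue of Fortran MERGE) applied to a pair loop like Masrul's might look like the sketch below. It assumes the condition is a per-atom flag `check[j]`, which is a guess since the original `if(check)` does not index the array; `masked_r2_sum` is a hypothetical kernel that sums squared distances. Vectorizing the float sum may additionally require permission to reassociate (for example an omp simd reduction).

```c
#include <stddef.h>

/* For one i, accumulate r^2 over j > i where check[j] is set. The
   arithmetic always runs; only the result is masked. */
float masked_r2_sum(const float *x, const float *y, const float *z,
                    const unsigned char *check, size_t i, size_t n)
{
    float sum = 0.0f;
    for (size_t j = i + 1; j < n; ++j) {
        float dx = x[i] - x[j];
        float dy = y[i] - y[j];
        float dz = z[i] - z[j];
        float r2 = dx * dx + dy * dy + dz * dz;
        sum += check[j] ? r2 : 0.0f; /* select, not a branch around arithmetic */
    }
    return sum;
}
```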

Masrul's example deviates from valid syntax enough, and leaves enough unspecified, that I'm not certain what is intended. ifort performs many vector optimizations on block IF ... END IF or WHERE() constructs, even though they may depend on evaluating both branches or on masked moves and stores. Either way, on the (over-simplified) face of it, the best expected gain is roughly 2x for 4 lanes, i.e. only half of the gain expected from vectorizing unconditional operations. With some practice, I've been able to get Intel Parallel Advisor to display reasonable per-loop vector speedup numbers.

I'm currently working on an application whose important loops contain conditionals on the value of the loop counter; these vectorize better when the loops are split to remove the conditional. Still, I end up with a few ugly constructs like

```fortran
#if __AVX__ || __MIC__
!$omp simd private(........)
#endif
```

because the private list helps the compiler eliminate some potential dependencies across loop iterations, but omp simd doesn't let the compiler decide whether there are enough lanes to benefit from vectorization.
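Splitting a loop to remove a conditional on the loop counter can be sketched in C as follows. The split point `k`, the two loop bodies, and both function names are hypothetical; the point is that after splitting at `k`, each loop body is branch-free and computes the same results as the original.

```c
#include <stddef.h>

/* Before: the branch depends only on the loop counter i. */
void blend_branchy(float *a, const float *b, size_t n, size_t k)
{
    for (size_t i = 0; i < n; ++i) {
        if (i < k)
            a[i] = 2.0f * b[i];
        else
            a[i] = b[i] + 1.0f;
    }
}

/* After: split at k so neither loop body contains a branch. */
void blend_split(float *a, const float *b, size_t n, size_t k)
{
    size_t m = k < n ? k : n; /* clamp the split point to the array length */
    for (size_t i = 0; i < m; ++i)
        a[i] = 2.0f * b[i];
    for (size_t i = m; i < n; ++i)
        a[i] = b[i] + 1.0f;
}
```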

It may be interesting that omp simd uses private in a way complementary to omp parallel, where private is needed for correctness.
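A minimal C sketch of `private` on `omp simd`, assuming a compiler with OpenMP 4.0 SIMD support enabled (for example `-fopenmp` or `-fopenmp-simd` with GCC or Clang); without such a flag the pragma is ignored and the loop runs as ordinary scalar code. Because the temporary `t` is declared outside the loop, its reuse could look like a value carried between iterations; `private(t)` gives each SIMD lane its own copy. The function name is hypothetical.

```c
#include <stddef.h>

void scale_shift(float *out, const float *in, size_t n, float a, float b)
{
    float t; /* declared outside the loop on purpose */
    #pragma omp simd private(t)
    for (size_t i = 0; i < n; ++i) {
        t = a * in[i];   /* per-lane temporary */
        out[i] = t + b;
    }
}
```

Declaring `t` inside the loop body would make it private implicitly; the clause matters when, as in the Fortran case above, the temporaries are declared at procedure scope.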

jimdempseyatthecove

Black Belt


04-30-2016
07:28 AM


Masrul,

In some situations where I produce unit vectors and points A and B may be collocated (IOW a divide by 0 is possible), I make a programming decision whether to produce a unit vector of {0.0, 0.0, 0.0} or a randomly oriented unit vector. This can be accomplished by (pseudo code):

```fortran
lengthSquared = dx**2 + dy**2 + dz**2
if (lengthSquared .eq. 0.0) lengthSquared = 1.0   ! IOW produce {0.0/1.0, 0.0/1.0, 0.0/1.0}
sqrtLengthSquared = sqrt(lengthSquared)
ux = dx / sqrtLengthSquared
uy = dy / sqrtLengthSquared
uz = dz / sqrtLengthSquared
```

The above produces a null unit vector using a conditional move.

Note, I omitted array indexing for simplicity of the pseudo code.

The random unit vector substitution is a little more involved, but it can also be done using conditional moves.

Jim Dempsey

