I found that ifort (and gfortran) create a temporary for the following array assignment:
a(1:n-inc:inc)= a(inc+1:n:inc)+b(1:n-inc:inc)
presumably because inc could be less than zero. The result is stored in a stride-1 temporary and then copied to the destination, and all of the resulting loops are reported as vectorized.
If I write
do i= 1,n-inc,inc
a(i)= a(i+inc)+b(i)
enddo
ifort decides not to vectorize when compiling with /QxAVX2. Apparently that's a good decision: adding a !dir$ simd to force vectorization with simulated gather-scatter makes it slower, even in the case inc==1 (though not as slow as the array assignment with its temporary).
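To reproduce the comparison, here is a minimal sketch that runs both forms on the same data and checks that they agree for a few positive strides. The program name, array size, and fill values are illustrative assumptions and not taken from the original test case. It checks only correctness; timing and vectorization reports have to come from the compiler.

```fortran
program stride_demo
  ! Hypothetical test harness: names, sizes and data are illustrative only.
  implicit none
  integer, parameter :: n = 1000
  integer :: inc, i
  real :: a1(n), a2(n), b(n)

  do inc = 1, 4
    ! Fill with small integer-valued reals so the sums are exact.
    do i = 1, n
      a1(i) = real(2*i)
      b(i)  = real(i)
    end do
    a2 = a1

    ! Array-section form: the right-hand side is fully evaluated before
    ! the assignment, so a compiler may materialize it in a temporary.
    a1(1:n-inc:inc) = a1(inc+1:n:inc) + b(1:n-inc:inc)

    ! Explicit DO loop: for inc > 0 each a2(i+inc) is read before it is
    ! overwritten by a later iteration, so the result matches.
    do i = 1, n-inc, inc
      a2(i) = a2(i+inc) + b(i)
    end do

    if (any(a1 /= a2)) then
      print *, 'mismatch for inc =', inc
      error stop 1
    end if
  end do

  print *, 'ok'
end program stride_demo
```

Building this with the compiler's optimization report enabled should show whether a temporary is created for the array-section statement and how each loop is treated.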
Intel's vecanalysis script reports heavy-overhead vectorization.
Just one more data point in the continuing question about marginal vectorization.
Hi, Tim. That's interesting - is there some action needed on our part, or was this just an observation you wanted to get out there?
I wanted to give an example of the importance of digging up the detailed compiler reports and reading between the lines to see when vectorization isn't the whole answer. It was new to me to see the same temporary array problem in gfortran as in ifort; I used to find them by noting when gfortran performed better.
By the way, the Fortran 77 version (the DO loop) with !dir$ simd is the one which is needed for performance on MIC. Having to put #ifdef __MIC__ around simd and novector directives detracts somewhat from the value of ifort in being able to run the same source code efficiently on host and coprocessor.
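The conditional-directive pattern described here might look roughly like the sketch below. The subroutine name and argument list are hypothetical, and pairing novector with the host branch is an assumption about how the two directives would be split. The file has to go through the preprocessor (for example by using a .F90 suffix), and compilers that do not recognize !dir$ lines treat them as comments.

```fortran
subroutine strided_update(a, b, n, inc)
  ! Hypothetical wrapper around the loop from the thread.
  implicit none
  integer, intent(in)    :: n, inc
  real,    intent(inout) :: a(n)
  real,    intent(in)    :: b(n)
  integer :: i

  ! __MIC__ is defined when compiling for the coprocessor:
  ! force vectorization there, suppress it on the host.
#ifdef __MIC__
!dir$ simd
#else
!dir$ novector
#endif
  do i = 1, n-inc, inc
    a(i) = a(i+inc) + b(i)
  end do
end subroutine strided_update
```

The cost is the one noted above: the same source carries target-specific branches instead of letting the compiler pick the better strategy on each side.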
