Attached is a small program implementing the Newton-Raphson iteration for solving y = x * exp(x). ifort 13 does not vectorize the program unless the MIC architecture is targeted. Compared against the Fortran, the equivalent C code written using the elemental function extension shows a 1.8x speedup when measured on Nehalem. Arguably, icc 13 is not optimizing hard enough, either: a version based on intrinsic functions shows a 2.1x speedup over the Fortran code. Greater gains can be expected on Sandy/Ivy Bridge.
TimP (Intel) wrote:
I see that the Fortran elemental doesn't have the same effect on optimization here as writing the parallel intrinsics in icc. If I set more aggressive options, I get the message "not inner loop", indicating that the compiler hasn't learned outer loop vectorization for this situation. In effect, in your C code intrinsics, you have explicitly pushed enough work inside the while loop to take advantage of simd.

To clarify, I misread the generated assembly code for the MIC architecture. It is not vectorized, either. I did not explicitly make the loop body heavier in the intrinsic code than in the scalar code. The algorithm is exactly the same. This is a case where vectorization is almost always beneficial. Masking adds some small overhead, but you save a lot from vectorized division alone.