Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

auto vectorization

manoel
Beginner
1. IFort vectorizes only inner loops.
2. IFort does not vectorize loops with a low trip count (2 iterations).
Can I conclude that small (2x2) matrix operations (sum, multiply) will not be auto-vectorized, although there are SSE instructions that perform similar operations?
Intel_C_Intel
Employee

(1) This is generally true, although transformations such as loop interchange and loop collapsing can involve outer loops too. Conversely, loop materialization enables vectorization of straight-line code.
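As a rough illustration of collapsing for the 2x2 case in the question, here is a minimal sketch. It is not compiler output; the program and variable names are made up for the example. Because a 2x2 array is stored as four contiguous elements, the nested form with a 2-iteration inner loop can also be written as a single whole-array operation covering all four elements, which gives the vectorizer a longer run to work with.

```fortran
program collapse_sketch
  implicit none
  ! Hypothetical names; a 2x2 double-precision array is 4 contiguous elements.
  real(8) :: a(2,2), b(2,2), c_nested(2,2), c_flat(2,2)
  integer :: i, j

  a = reshape((/ 1.0d0, 2.0d0, 3.0d0, 4.0d0 /), (/ 2, 2 /))
  b = 10.0d0

  ! Nested form: the inner loop runs only 2 iterations.
  do j = 1, 2
    do i = 1, 2
      c_nested(i,j) = a(i,j) + b(i,j)
    end do
  end do

  ! Collapsed form: one operation over all 4 elements.
  c_flat = a + b

  ! Both forms should produce the same matrix.
  print *, 'results match: ', all(c_nested == c_flat)
end program collapse_sketch
```

Whether either form is actually vectorized depends on the compiler version and options, so it is worth checking the vectorization report rather than assuming.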

(2) For double-precision data, loops with just two iterations can still be vectorized, although by default the compiler may reject such short-running loops for efficiency reasons, such as misaligned data or too much setup/cleanup overhead. Use the vectorization diagnostics (-vec_report2) to see why loops are not vectorized, and experiment with pragmas and other compiler hints to see if efficient vectorization is possible. Perhaps you can post an example?
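A minimal sketch of such an experiment, assuming a 2x2 double-precision multiply like the one in the question. The names mat2_mul and mat2_demo are made up for the example. The !DEC$ VECTOR ALWAYS directive asks the compiler to skip its efficiency heuristics for the loop that follows; it does not override correctness checks, and it does not guarantee a speedup.

```fortran
! Hypothetical example: 2x2 matrix multiply with a 2-iteration inner loop.
subroutine mat2_mul(a, b, c)
  implicit none
  real(8), intent(in)  :: a(2,2), b(2,2)
  real(8), intent(out) :: c(2,2)
  integer :: i, j, k

  c = 0.0d0
  do j = 1, 2
    do k = 1, 2
      ! Inner loop is unit-stride in c and a; b(k,j) is loop-invariant.
      ! The directive requests vectorization even if it looks unprofitable.
!DEC$ VECTOR ALWAYS
      do i = 1, 2
        c(i,j) = c(i,j) + a(i,k) * b(k,j)
      end do
    end do
  end do
end subroutine mat2_mul

program mat2_demo
  implicit none
  real(8) :: a(2,2), b(2,2), c(2,2)

  a = reshape((/ 1.0d0, 2.0d0, 3.0d0, 4.0d0 /), (/ 2, 2 /))
  b = reshape((/ 5.0d0, 6.0d0, 7.0d0, 8.0d0 /), (/ 2, 2 /))

  call mat2_mul(a, b, c)

  ! Compare against the intrinsic as a sanity check.
  print *, 'matches matmul: ', all(c == matmul(a, b))
end program mat2_demo
```

Compiling with something like `ifort -O2 -vec_report2 mat2.f90` (the file name is an assumption, and the exact spelling of the report flag varies between compiler versions) should print, for each loop, whether it was vectorized and, if not, why. Comparing the report and the timings with and without the directive shows whether vectorizing such a short loop pays off.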

Hope this helps.

Aart Bik
http://www.aartbik.com/
