Intel® C++ Compiler
Support and discussions for creating C++ code that runs on platforms based on Intel® processors.

Optimization using intrinsics

Prashanthns
Beginner
Hi all,

I am trying to optimize a loop containing MAC (multiply-accumulate) operations. I tried using intrinsics, but the cycle count increased.
From the assembly file generated by the compiler, I can see that the intrinsics are not converted to the corresponding packed SSE instructions; the compiler emits scalar operations instead (e.g. MULSS instead of MULPS). So I tried inline assembly with __asm{}, but even this did not give any gain.
Does that mean the compiler is already optimizing better than I am, or is it because of overhead from switching between C and intrinsics/inline asm?
2 Replies
TimP
Black Belt
We'll probably need a specific working example. In some cases, intrinsics code comes out closer to the programmer's intent when interprocedural optimization is disabled, e.g. -Qip- (Windows) or -fno-inline-functions (Linux). It seems unlikely that a mulps intrinsic itself would expand without a mulps operation.
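For reference, a packed MAC loop written with SSE intrinsics might look like the sketch below. This is only an illustration, not the original poster's code: the function name mac_packed and the dot-product shape of the loop are assumptions. The point is that _mm_mul_ps is the packed multiply (four floats at once), whereas _mm_mul_ss multiplies only the lowest element, so using the _ss variants by mistake would explain MULSS showing up in the assembly.

```cpp
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_mul_ps, _mm_add_ps, _mm_storeu_ps
#include <cstddef>

// Hypothetical helper (not from the original post): sum of a[i] * b[i].
float mac_packed(const float* a, const float* b, std::size_t n) {
    __m128 acc = _mm_setzero_ps();           // four partial sums, all zero
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);     // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        // Packed multiply, then packed add into the accumulator.
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);               // spill the four partial sums
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) {                     // scalar tail for leftover elements
        sum += a[i] * b[i];
    }
    return sum;
}
```

Keeping the accumulator in a register across iterations and doing the horizontal sum only once after the loop is usually what makes this kind of loop pay off; summing the lanes inside the loop tends to cancel the gain.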
Mark_S_Intel1
Employee
Prashanth,

Did you check whether the compiler can autovectorize the loop without the intrinsics (e.g. by compiling with options like /QxSSE3, /QxSSE4.2, etc.)?
If this is still an issue, please send us a compilable test case along with compiler options and steps to reproduce and we will look into it.

Thanks,
--mark
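As a starting point for such a test case, a plain C++ version of a MAC loop could be as small as the sketch below. The function name mac_plain and the element-wise form are assumptions, since the original loop was not posted. A loop of this shape, with unit-stride accesses and no dependence between iterations, is the kind of code a vectorizing compiler can usually turn into packed instructions on its own when built with one of the options mentioned above, though possible aliasing between y and the inputs may make the compiler add runtime checks or give up.

```cpp
#include <cstddef>

// Hypothetical test-case loop (not from the original post):
// element-wise multiply-accumulate, y[i] += a[i] * b[i].
void mac_plain(float* y, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a[i] * b[i];   // one independent MAC per iteration
    }
}
```

Comparing the assembly generated for this version against the intrinsics version is a quick way to see whether the hand-written code is actually ahead of what the compiler produces by itself.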