Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Optimization using intrinsics

Prashanthns
Beginner
Hi all,

I am trying to optimize a loop containing MAC (multiply-accumulate) operations. I tried using intrinsics, but that increased the cycle count.
From the assembly file generated by the compiler, I could see that the intrinsics are not converted to the exact SSE instructions; instead, the compiler emits scalar operations rather than packed ones (e.g., MULSS instead of MULPS). So I tried inline assembly with __asm{}, but even this did not give any gain.
Does that mean the compiler is doing a better job of optimizing than I am, or is it because of overhead from switching between C and intrinsics/inline asm?
TimP
Honored Contributor III
We'll probably need a specific working example. In some cases, intrinsics code comes out closer to the programmer's intent when interprocedural optimization is disabled, e.g. -Qip- (Windows) or -fno-inline-functions (Linux). It seems unlikely that a mulps intrinsic itself would expand without a mulps operation.
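For reference, here is a minimal sketch of what a packed MAC loop with SSE intrinsics might look like. The function name `mac_sse` and the dot-product shape of the loop are assumptions, since the original code was not posted. One thing worth checking against it: `_mm_mul_ps` corresponds to MULPS, while `_mm_mul_ss` corresponds to the scalar MULSS, so MULSS in the assembly may simply reflect `_ss` intrinsics in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_mul_ps, _mm_add_ps

// Hypothetical MAC kernel (not the poster's code):
// returns the sum of a[i] * b[i] for i in [0, n).
float mac_sse(const float* a, const float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);

    __m128 acc = _mm_setzero_ps();  // four partial sums, one per lane
    std::size_t i = 0;

    // Packed part: four multiply-accumulates per iteration.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned loads, so no alignment
        __m128 vb = _mm_loadu_ps(b + i);  // requirement on a and b
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }

    // Horizontal reduction of the four lanes.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);

    // Scalar tail for the remaining n % 4 elements.
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because the lanes are accumulated separately, the result can differ in the last bits from a strictly sequential scalar sum. That is expected for floating-point reductions.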
Mark_S_Intel1
Employee
Prashanth,

Did you check whether the compiler can auto-vectorize the loop without the intrinsics (e.g., by compiling with options such as /QxSSE3 or /QxSSE4.2)?
If this is still an issue, please send us a compilable test case along with compiler options and steps to reproduce and we will look into it.
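As a rough illustration, a plain scalar MAC loop like the sketch below is the kind of code to try first without intrinsics. The name `mac_scalar` and the loop shape are assumptions, not the original code. Note that vectorizing a floating-point reduction requires the compiler to reorder the additions, so whether this loop is vectorized depends on the floating-point model in effect.

```cpp
#include <cstddef>

// Hypothetical scalar MAC kernel (not the poster's code):
// returns the sum of a[i] * b[i] for i in [0, n).
// Unit-stride access and a single accumulator keep the loop
// simple for the compiler to analyze.
float mac_scalar(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Comparing the generated assembly for this version against the intrinsics version, built with the same options, should show whether the compiler already produces packed instructions on its own.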

Thanks,
--mark