Beginner

Optimization using intrinsics

Hi all,

I am trying to optimize a loop containing MAC (multiply-accumulate) operations. I tried using intrinsics, but the cycle count increased.
From the assembly file generated by the compiler, I can see that the intrinsics are not converted to the corresponding packed SSE instructions; the compiler emits scalar operations instead (e.g. MULSS instead of MULPS). So I tried inline assembly with __asm{}, but even this did not give any gain.
Does that mean the compiler is doing a better job of optimizing than I am, or is it because of overhead from switching between C and intrinsics/inline asm?
2 Replies
Black Belt

We'll probably need a specific working example. In some cases, intrinsics code comes out closer to the programmer's intent when interprocedural optimization is disabled, e.g. -Qip- (Windows) or -fno-inline-functions (Linux). It seems unlikely that a mulps intrinsic itself would expand without a mulps operation.
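
For reference, here is a minimal sketch of what a packed MAC loop with SSE intrinsics could look like. It is not the original poster's code (which was never shown); the function name mac_sse and the dot-product form of the MAC are assumptions made for illustration. The packed intrinsics are the _ps ones (_mm_mul_ps, _mm_add_ps); the _ss variants (_mm_mul_ss, _mm_add_ss) operate on the lowest lane only, so using them by mistake is one way to end up with MULSS in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_mul_ps, _mm_add_ps */

/* Hypothetical helper (not from the thread): sum of a[i] * b[i] over n floats. */
float mac_sse(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);

    __m128 acc = _mm_setzero_ps();  /* four partial sums, one per lane */
    size_t i = 0;

    /* Packed part: four multiply-accumulates per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* unaligned loads, so no */
        __m128 vb = _mm_loadu_ps(b + i);            /* alignment is assumed   */
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));  /* packed multiply + add  */
    }

    /* Reduce the four lanes to a single scalar. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);

    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

Because the lanes are summed in a different order than a plain scalar loop would use, the result can differ from the scalar version in the last bits for general floating-point data.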
Employee

Prashanth,

Did you check whether the compiler can autovectorize the loop without the intrinsics (e.g. by compiling with options like /QxSSE3, /QxSSE4.2, etc.)?
If this is still an issue, please send us a compilable test case along with compiler options and steps to reproduce and we will look into it.

Thanks,
--mark
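
As a rough illustration of the autovectorization suggestion, a plain C loop like the sketch below is the kind of candidate a compiler can often vectorize on its own. The function name mac_accumulate and the element-wise form of the MAC are assumptions, not code from the thread. The restrict qualifiers (C99) are a promise from the caller that the arrays do not overlap, which removes one common obstacle to vectorization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread): acc[i] += a[i] * b[i].
 * restrict states that acc, a and b never alias each other. */
void mac_accumulate(float *restrict acc,
                    const float *restrict a,
                    const float *restrict b,
                    size_t n)
{
    assert(acc != NULL && a != NULL && b != NULL);

    /* Simple counted loop with independent iterations and no branches. */
    for (size_t i = 0; i < n; ++i)
        acc[i] += a[i] * b[i];
}
```

Whether a given compiler actually emits packed instructions for this depends on the optimization level and target options, so checking the generated assembly is still the way to confirm.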