I am trying to optimize my codec application on an Intel platform. Before I start, are there any performance numbers comparing Intel assembly with Intel intrinsics? I am in a dilemma about which of the two approaches to choose.
If a given intrinsic compiles directly to a single machine instruction, you will probably not see any performance difference between inline assembly and the compiler intrinsic.
The compiler has more latitude to optimize intrinsics. For example, Intel C++ will choose more efficient equivalent instructions, or switch from SSE to AVX-128, when the architecture flag permits it.
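To make the points above concrete, here is a minimal sketch in C, assuming an x86 target with SSE2. The function names `add_i32_scalar` and `add_i32_sse2` are made up for illustration and are not from any codec. `_mm_add_epi32` is an intrinsic that normally maps to a single `paddd` instruction, so hand-written assembly for the same loop body would likely look much the same. The compiler is still free to schedule it, allocate registers, and pick the encoding.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: dst[i] = a[i] + b[i], with wrap-around on overflow. */
void add_i32_scalar(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
}

/* Intrinsic version (hypothetical helper): four 32-bit adds per iteration. */
void add_i32_sse2(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Unaligned loads, so the caller does not need 16-byte alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));

        /* Normally one paddd instruction (or vpaddd when AVX is enabled). */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; ++i)
        dst[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
}
```

If you build this once with plain SSE2 and once with an AVX architecture flag (for example `-mavx` on GCC or Clang) and compare the generated assembly, you should typically see the same source produce `paddd` in one case and the VEX-encoded `vpaddd` in the other. An inline assembly block would stay exactly as written in both builds. For real numbers you would still need to benchmark your own codec kernels.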