Intel® C++ Compiler

inline asm call


The SOW requires me to use VS2005 (64-bit), and I need some SSSE3 and SSE4 intrinsics.
Searching online, I found that these intrinsics are not supported in VS2005, so I tried to mimic them with separate assembly routines, for example:
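_ct_shuffle_epi8 PROC
movdqa xmm0, XMMWORD PTR [RCX] ; load param 1 (passed by address in RCX)
pshufb xmm0, XMMWORD PTR [RDX] ; shuffle with param 2 (passed by address in RDX)
ret ; result is returned in xmm0
_ct_shuffle_epi8 ENDP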
for the pshufb instruction from SSSE3.
In the C++ code I refer to it with:
extern "C"
inline __m128i _ct_shuffle_epi8(__m128i a, __m128i b);
Everything works, but it is far from optimal: the compiler generates the following code when calling _ct_shuffle_epi8():
000c6 66 0f 7f 44 24 20 movdqa XMMWORD PTR $T6999[rsp], xmm0
000cc e8 00 00 00 00 call _ct_shuffle_epi8
000d1 66 0f 73 df 08 psrldq xmm7, 8
000d6 48 8d 54 24 30 lea rdx, QWORD PTR $T7002[rsp]
000db 48 8d 4c 24 20 lea rcx, QWORD PTR $T7001[rsp]
As one can see, it generates a call to the function instead of inlining it.
Given that VS2005 (64-bit) does not support inline __asm, is there a way to inline the whole asm procedure (like the one in the example above)?
Or is there any other decent way to use some of the SSSE3 intrinsics in VS2005 code?
Thank you.
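
For checking a hand-written routine like the one above, it can help to have a plain C++ model of what pshufb computes. The following is only a minimal sketch: Bytes16 and shuffle_epi8_ref are hypothetical names introduced here, the struct merely stands in for an __m128i, and the loop is a scalar reference, not a fast replacement for the instruction.

```cpp
#include <cstddef>

// Hypothetical helper type: 16 raw bytes standing in for an __m128i.
struct Bytes16 {
    unsigned char b[16];
};

// Scalar model of PSHUFB (_mm_shuffle_epi8) on 128-bit operands:
// for each of the 16 result bytes, a mask byte with bit 7 set yields 0;
// otherwise the low 4 bits of the mask byte select a byte of the source.
inline Bytes16 shuffle_epi8_ref(const Bytes16& a, const Bytes16& mask) {
    Bytes16 r;
    for (std::size_t i = 0; i < 16; ++i) {
        const unsigned char m = mask.b[i];
        r.b[i] = (m & 0x80) ? static_cast<unsigned char>(0) : a.b[m & 0x0F];
    }
    return r;
}
```

Comparing the output of the assembly routine against this model on a few masks (including some with the high bit set) is a cheap way to confirm the wrapper passes its arguments in the expected order.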
2 Replies
Found the answer; maybe it will help somebody.
The Microsoft compiler in VS2005 does not support the SSSE3 and later intrinsics.
The Intel compiler does support them, so just install the Intel compiler and you will have support for all of them.
Good luck.
Plan a) You could compile your C/C++ code to an .ASM source file, then insert (inline) the body of _ct_shuffle_epi8 in place of each call. (This can be automated.)

Plan b) Write _ct_shuffle_epi8 such that it back-patches the call site with the appropriate instruction sequence. Note: you may need to have your C/C++ code use a macro that makes two calls to _ct_shuffle_epi8 in order to provide enough bytes in the code stream to hold your patch (plus a few NOPs).

(This won't work on systems that protect the code segment from patching.)

Jim Dempsey
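
The automation mentioned in Plan a) could look roughly like the text filter below. This is a sketch under assumptions, not a tested build step: inline_shuffle_calls is a hypothetical name, the listing is assumed to be already split into lines, and the replacement relies on RCX and RDX still pointing at the two 16-byte operands at the call site, exactly as the original routine expects.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical post-processing step for a compiler-generated .asm listing:
// every line that calls _ct_shuffle_epi8 is replaced by the routine's body
// (without the ret). Assumes RCX/RDX hold the operand addresses and that the
// caller expects the result in XMM0, as with the out-of-line routine.
// Limitation: a comment line that happens to contain "call" followed by the
// routine name would be rewritten as well.
std::vector<std::string> inline_shuffle_calls(const std::vector<std::string>& lines) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const std::string& line = lines[i];
        const std::string::size_type call_pos = line.find("call");
        const std::string::size_type name_pos = line.find("_ct_shuffle_epi8");
        const bool is_call = call_pos != std::string::npos &&
                             name_pos != std::string::npos &&
                             call_pos < name_pos;
        if (is_call) {
            out.push_back("\tmovdqa\txmm0, XMMWORD PTR [rcx]");
            out.push_back("\tpshufb\txmm0, XMMWORD PTR [rdx]");
        } else {
            out.push_back(line);
        }
    }
    return out;
}
```

The rewritten listing would then be assembled in place of the compiler's object file. Whether the assembler in a given toolchain accepts the pshufb mnemonic is something to verify separately.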