Plan a) Compile your C/C++ code to an assembly (.ASM) source file, then insert the _ct_shuffle_epi8 instruction sequence inline in that file, in place of the call. (This step can be automated.)
Plan b) Write call_ct_shuffle_epi8 so that it back-patches the call site with the appropriate instruction sequence. Note: you may need to have your C/C++ code use a macro that makes two calls to _ct_shuffle_epi8, so that the code stream has enough bytes to hold your patch (plus a few NOPs).
(This won't work on systems that protect the code segment from patching.)
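
To make the byte accounting in Plan b concrete, here is a rough sketch of the overwrite-and-pad step, assuming x86 with 5-byte near calls (E8 + rel32). It works on an ordinary byte buffer standing in for the call site, not on live code; patch_call_site is a hypothetical helper name, and the patch bytes you pass in would be whatever instruction sequence you actually need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    CALL_REL32_LEN = 5,    /* E8 + 32-bit displacement (assumed call form) */
    NOP_OPCODE     = 0x90  /* single-byte x86 NOP */
};

/* Hypothetical helper: overwrite the site_len bytes at `site` with `patch`
 * and fill whatever is left over with NOPs.
 * Returns 0 on success, -1 if the patch does not fit in the reserved bytes. */
static int patch_call_site(uint8_t *site, size_t site_len,
                           const uint8_t *patch, size_t patch_len)
{
    assert(site != NULL && patch != NULL);
    if (patch_len > site_len)
        return -1;                       /* not enough reserved bytes */
    memcpy(site, patch, patch_len);      /* the replacement instructions */
    memset(site + patch_len, NOP_OPCODE, /* pad the remainder with NOPs */
           site_len - patch_len);
    return 0;
}
```

For example, two back-to-back calls reserve 2 * CALL_REL32_LEN = 10 bytes. Patching in the 5-byte encoding of pshufb xmm0, xmm1 (66 0F 38 00 C1; the register choice is only for illustration) leaves 5 bytes of NOP padding. In a real back-patch the site pointer would be derived from the return address, and the page would have to be writable.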