Used SSE to work with for example, 6-bit packed integers.SSErequired a lot of heavy lifting thru masking and shifting and storing of results into temporary registers; prior to doing the arithmetic or logical operation. I'vebeen studying AVX2 instructions looking for a more optimal set of instructions to do this. Has anyone on this forum already looked at optimizing this type of workload? There are a lot of new instructions for manipulating 32-bit and 64-bit data units; but it is not obvious to me at this piont how these can help with this 6-bit packed integer problem.
Wow, those instructions are fantastic. I never imagined a complex operation like that was even possible in a single pipelined instruction, but after reading up on the 'butterfly' datapath it's actually quite elegant.
I really look forward to CPUs with AVX2 and BMI. Am I correct that Haswell won't support BMI2 yet? PDEP and PEXT are not mentioned in the Haswell New Instructions blog. Hopefully it's scheduled for Broadwell then.