First, your kernel's NDRange should be wide enough to occupy more than one SIMD lane.
Then, to encourage the compiler to use 2 SIMD lanes, make sure you have:
1- Multiple kernel instances executing the same target instruction simultaneously.
2- The target instruction applied to elements of a data type that fits in half the target SIMD width or less (data type length <= target SIMD width / 2).
3- An explicit declaration of the target data type.
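
As a rough illustration of the three conditions, here is a minimal sketch in plain C rather than OpenCL C. The loop iterations stand in for kernel instances. The 16-byte (128-bit) SIMD width and the names SIMD_WIDTH_BYTES, lanes_per_register, fits_at_least_twice and scale_add are assumptions made up for this example, not values queried from a device or taken from any API. Whether a given compiler actually vectorizes the loop depends on the compiler, target and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed SIMD register width: 16 bytes (128 bits). Hypothetical value,
   chosen only for illustration; real hardware may differ. */
#define SIMD_WIDTH_BYTES 16u

/* How many elements of a given size fit in one SIMD register. */
size_t lanes_per_register(size_t simd_bytes, size_t element_bytes)
{
    assert(element_bytes > 0);
    return simd_bytes / element_bytes;
}

/* Condition 2: data type length <= target SIMD width / 2,
   i.e. at least two elements fit in one register. */
int fits_at_least_twice(size_t simd_bytes, size_t element_bytes)
{
    return element_bytes * 2 <= simd_bytes;
}

/* Conditions 1 and 3: every iteration (standing in for one kernel
   instance) executes the same multiply-add, and the element type is
   declared explicitly as float. */
void scale_add(const float *restrict a, const float *restrict b,
               float *restrict out, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] * k + b[i];
    }
}
```

With these assumed numbers, a 4-byte float satisfies condition 2 (4 <= 16 / 2), while a 16-byte element would not, because it fills the whole register by itself.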