I wanted to ask for feedback on an emulation of the VP2INTERSECT instructions:
The emulation is faster than the native instructions when only one of the output masks is returned. I consider the following three applications of VP2INTERSECT instructions:
1. computing the intersection (common elements) of two arrays of integers (whether sorted or unsorted),
2. computing the size of the intersection of two arrays of integers,
3. removing common elements from two arrays of integers.
Only application 3 requires both output masks; applications 1 and 2 need only one.
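
To make the distinction concrete, here is a minimal scalar sketch in C of what the two masks contain for the case of 16 lanes of 32-bit integers. It is only a reference model for discussion, not the emulation itself; the helper names `intersect_masks_model` and `compress_by_mask` are hypothetical, and it covers a single pair of 16-element blocks rather than a full loop over two arrays.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* number of 32-bit lanes in a 512-bit vector */

/* Scalar reference model of the two output masks (hypothetical helper).
 * Bit i of *k_a is set when a[i] equals some element of b.
 * Bit j of *k_b is set when b[j] equals some element of a. */
static void intersect_masks_model(const int32_t a[LANES],
                                  const int32_t b[LANES],
                                  uint16_t *k_a, uint16_t *k_b)
{
    uint16_t ma = 0, mb = 0;
    for (int i = 0; i < LANES; i++) {
        for (int j = 0; j < LANES; j++) {
            if (a[i] == b[j]) {
                ma |= (uint16_t)(1u << i);
                mb |= (uint16_t)(1u << j);
            }
        }
    }
    *k_a = ma;
    *k_b = mb;
}

/* Copy the lanes of v selected by mask k to out, preserving order.
 * Returns the number of lanes copied (hypothetical helper). */
static size_t compress_by_mask(const int32_t v[LANES], uint16_t k,
                               int32_t *out)
{
    size_t n = 0;
    for (int i = 0; i < LANES; i++) {
        if (k & (1u << i)) {
            out[n++] = v[i];
        }
    }
    return n;
}
```

In this model, applications 1 and 2 only ever read `k_a`: `compress_by_mask(a, k_a, out)` writes the common elements and returns their count. Application 3 is the one that also reads `k_b`, since it has to call `compress_by_mask` with the complemented mask on each input, for example `compress_by_mask(a, (uint16_t)~k_a, out_a)` and `compress_by_mask(b, (uint16_t)~k_b, out_b)`.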
Since the instructions are named VP2INTERSECT, I presume that the main application is the first one (and possibly the second), in which case a fast emulation could be useful.
But I may be wrong, so I would like to ask: are the first two cases above (computing the intersection of two arrays of integers, or the size of that intersection) the intended, or expected most frequent, use cases for these instructions?