the mask containing the indexes is stored in XSRC1 and the bytes to be permuted are in XSRC2. Why is the mask not in XSRC2, and vice versa? PERMILPS and the other instructions have used the implicit mask as the last SRC. Is there a reason you changed this? I only ask because the x86 instruction set is complicated enough as it is, and having some forms of the PERM instructions in one orientation and others in the opposite one is just confusing.
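To make the inconsistency concrete, here is a rough scalar model in Python of the two operand orders as I understand them. It assumes eight 32-bit elements per 256-bit register, that the new permute reads its indexes from the first source, and that PERMILPS works within each 128-bit lane with its control in the last source. The function names are made up for this post and are not part of any spec.

```python
# Scalar model of the two operand orders (assumption: 8 x 32-bit elements
# per 256-bit register). Function names are hypothetical.

def perm_index_first(idx, data):
    """Cross-lane permute: indexes in SRC1, data in SRC2."""
    assert len(idx) == len(data) == 8
    # Only the low 3 bits of each index select one of the 8 elements.
    return [data[i & 7] for i in idx]

def permil_control_last(data, ctrl):
    """In-lane permute (PERMILPS-style): data in SRC1, control in SRC2."""
    assert len(data) == len(ctrl) == 8
    out = []
    for pos in range(8):
        lane_base = pos & ~3  # 0 for the low 128-bit lane, 4 for the high one
        # Only the low 2 bits of each control select within the lane.
        out.append(data[lane_base + (ctrl[pos] & 3)])
    return out

data = [10, 11, 12, 13, 14, 15, 16, 17]
print(perm_index_first([7, 6, 5, 4, 3, 2, 1, 0], data))     # mask is argument 1
print(permil_control_last(data, [3, 2, 1, 0, 3, 2, 1, 0]))  # mask is argument 2
```

The two calls do nearly the same job, yet the mask sits in opposite argument positions, which is exactly the kind of thing that is easy to get wrong when writing assembly by hand.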
Also, why don't you provide a similar implicit form for PERMQ and PERMPD? Those are only encoded with immediates.
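As a sketch of what I mean, the Python below models the immediate form on four 64-bit elements, assuming each pair of bits in the 8-bit immediate selects one source element. The second function is the register-index form I am asking about; it is purely hypothetical, and both function names are my own.

```python
# Scalar model on 4 x 64-bit elements. Function names are hypothetical.

def perm_imm(data, imm8):
    """Immediate form: bits [2*i+1 : 2*i] of imm8 pick the source of element i."""
    assert len(data) == 4 and 0 <= imm8 <= 0xFF
    return [data[(imm8 >> (2 * i)) & 3] for i in range(4)]

def perm_var(idx, data):
    """Hypothetical register form: one run-time index per element, mask first."""
    assert len(idx) == len(data) == 4
    return [data[i & 3] for i in idx]

data = [100, 101, 102, 103]
print(perm_imm(data, 0x1B))           # 0x1B = 0b00_01_10_11 reverses the elements
print(perm_var([3, 2, 1, 0], data))   # same shuffle, but chosen at run time
```

With only the immediate form, the shuffle pattern has to be known at assembly time; a register form would allow the pattern to be computed.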
I wanted to offer some advice and suggest keeping the format as uniform as possible.