Intel® ISA Extensions

What is the syntax for the broadcast decorator?


The ISE doc only describes the decorator syntax with the single example {1to16} (document 319433-022 page 7).

I would assume that generally you write {1ton} where n = the full vector size / the single element size.  But it would be nice to specify this exactly.
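To make the guessed rule concrete, here is a minimal sketch in Python. The function name `broadcast_decorator` is hypothetical, and the rule itself (n = vector size / element size) is the assumption stated above, not something quoted from the ISE document.

```python
def broadcast_decorator(vector_bits: int, element_bits: int) -> str:
    """Build the {1toN} decorator, assuming N = vector size / element size."""
    if vector_bits not in (128, 256, 512):
        raise ValueError("vector width must be 128, 256, or 512 bits")
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the vector width")
    # Number of broadcast copies that fill the destination vector.
    return "{1to%d}" % (vector_bits // element_bits)


if __name__ == "__main__":
    # Print the decorator for each vector length with dword and qword elements.
    for vl in (128, 256, 512):
        for elem in (32, 64):
            print(vl, elem, broadcast_decorator(vl, elem))
```

Under this rule a dword broadcast into an xmm register would be written {1to4}, and a qword broadcast into a zmm register would be written {1to8}.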

However, GNU `as` will not accept {1to4} or smaller.  Furthermore, it does not accept a broadcast decorator with a 128- or 256-bit vector size.  If I use .byte to assemble 128- and 256-bit instructions, the disassembler shows the {1to8} or {1to16} decorator regardless of VL.  Example:

    62 62 7d 18 c4 72 7f    vpconflictd xmm30,DWORD PTR [rdx+0x1fc]{1to16}
    62 62 7d 38 c4 72 7f    vpconflictd ymm30,DWORD PTR [rdx+0x1fc]{1to16}
    62 62 7d 58 c4 72 7f    vpconflictd zmm30,DWORD PTR [rdx+0x1fc]{1to16}
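
The three encodings differ only in the fourth byte (0x18, 0x38, 0x58). A rough Python sketch of how that byte splits into fields is below; the helper names are hypothetical. It assumes the standard EVEX layout (z in bit 7, L'L in bits 6:5, b in bit 4, V' in bit 3, aaa in bits 2:0), assumes the operand is in memory so that b means broadcast, and assumes a 32-bit element for vpconflictd.

```python
# Map the EVEX L'L field to a vector length in bits (0b11 is reserved).
VECTOR_BITS = {0b00: 128, 0b01: 256, 0b10: 512}


def decode_evex_p2(insn: bytes) -> dict:
    """Split the third EVEX payload byte (the 4th instruction byte) into fields."""
    if len(insn) < 4 or insn[0] != 0x62:
        raise ValueError("not an EVEX-encoded instruction")
    p2 = insn[3]
    return {
        "z": (p2 >> 7) & 1,                         # zeroing-masking bit
        "vector_bits": VECTOR_BITS.get((p2 >> 5) & 0b11),
        "b": (p2 >> 4) & 1,                         # broadcast (memory operand assumed)
        "aaa": p2 & 0b111,                          # opmask register number
    }


def expected_decorator(insn: bytes, element_bits: int = 32) -> str:
    """Decorator one would expect if n = vector size / element size."""
    fields = decode_evex_p2(insn)
    if not fields["b"] or fields["vector_bits"] is None:
        return ""
    return "{1to%d}" % (fields["vector_bits"] // element_bits)


if __name__ == "__main__":
    for text in ("62 62 7d 18 c4 72 7f",
                 "62 62 7d 38 c4 72 7f",
                 "62 62 7d 58 c4 72 7f"):
        print(text, expected_decorator(bytes.fromhex(text)))
```

If the n = vector size / element size rule holds, only the third line of the disassembly above would be expected to show {1to16}.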

I can see two possibilities here.  Please confirm which (or something else) is the truth:

(1) Broadcasting is only supported for VL=512.  This should be added to the doc.  Also, instructions that support broadcasting, such as VADDPD, show xmm3/m128/m64bcst and ymm3/m256/m64bcst as operands; these should be changed to simply xmm3/m128 and ymm3/m256.

(2) GNU is wrong, and should accept 128- and 256-bit broadcasting for assembly and show the correct decorator for disassembly.  Please specify whether my assumption about {1ton} is correct.  If you agree with this, then I will file a bug report.

The instruction table (insns.dat) for the nasm assembler, by the way, does allow broadcast with all vector sizes.  I haven't tried to see what it uses for {1ton}.  My reading of the source code is that it does expect n to be the vector size / element size.

-- the following added in a later edit ---

I also notice that the GNU assembler's instruction table (i386-opc.tbl) is missing templates for some instructions that are valid (per the document) with 128 and 256 VL, and presumably for all of them.





Hi, Michael!

Yes, the n value in a {1ton} decorator is the number of elements that fit into the vector register, e.g. for an m32bcst memory operand and an XMM register, n should be 4.

Unfortunately, I don't understand your concern about GNU as/objdump. Let's take a look at some testcases in the `as` testsuite for the vpconflictd instruction.

vpconflictd xmm6{k7}, [edx+508]{1to4}  # AVX512{CD,VL} Disp8

vpconflictd ymm6{k7}, [edx+508]{1to8}  # AVX512{CD,VL} Disp8

[  ]*[a-f0-9]+:[  ]*62 f2 7d 1f c4 72 7f[  ]*vpconflictd 0x1fc\(%edx\)\{1to4\},%xmm6\{%k7\}

[  ]*[a-f0-9]+:[  ]*62 f2 7d 3f c4 72 7f[  ]*vpconflictd 0x1fc\(%edx\)\{1to8\},%ymm6\{%k7\}

[  ]*[a-f0-9]+:[  ]*62 f2 7d 1f c4 72 7f[  ]*vpconflictd xmm6\{k7\},DWORD PTR \[edx\+0x1fc\]\{1to4\}

[  ]*[a-f0-9]+:[  ]*62 f2 7d 3f c4 72 7f[  ]*vpconflictd ymm6\{k7\},DWORD PTR \[edx\+0x1fc\]\{1to8\}

As you can see, all broadcast decorators depend on vector register size.
Everything is similar for AVX512CD tests.
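
A side note on the Disp8 comments in these testcases: the encoded displacement byte is 0x7f, while the assembly shows 508 (0x1fc). A small Python sketch of that relationship is below. It assumes the EVEX compressed-displacement rule that, for a broadcast memory operand, the 8-bit displacement is scaled by the element size in bytes; the function name `scaled_disp8` is hypothetical.

```python
def scaled_disp8(disp8_byte: int, element_bytes: int) -> int:
    """Return the effective displacement for an EVEX compressed disp8.

    Assumes a broadcast memory operand, where the scale factor is the
    element size in bytes (4 for a dword broadcast).
    """
    if not 0 <= disp8_byte <= 0xFF:
        raise ValueError("disp8 must be a single byte")
    # Sign-extend the byte before scaling.
    signed = disp8_byte - 0x100 if disp8_byte >= 0x80 else disp8_byte
    return signed * element_bytes


if __name__ == "__main__":
    print(hex(scaled_disp8(0x7F, 4)))
```

With a dword element, 0x7f scales to 508, which matches the [edx+508] operand in both the 128-bit and 256-bit testcases.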

So could you please:

  • clarify your testcase;
  • provide some instructions with missing templates for 128/256VL?



I'm glad to see that your version of binutils seems to be doing the right thing.  But my version of binutils doesn't.

The version of objdump that I have, from Cygwin, produces

   0:   62 f2 7d 1f c4 72 7f    vpconflictd xmm6{k7},DWORD PTR [edx+0x1fc]{1to16}

Here you see 1to16, instead of the correct 1to4.

Your version of binutils source doesn't seem to match my binutils binary.  My binary is

        GNU objdump (GNU Binutils)

It comes from Cygwin's package manager.

What version of the source and binary are you using?



Michael R. wrote:

What version of the source and binary are you using?

I used the master branch, but it also works on v2.25.
