VCVTTPD2QQ loads 4 floats, not 8, when encoded for a zmm register

Jones__Brian · 02-04-2020

I am new to this list, so I hope I'm posting in the correct way. The problem described below has been solved, and I'm posting the answer here for others in the future. The problem was a bug in the gdb debugger: apparently it cannot show the upper 256 bits of a zmm register with the command i r zmm0 (info registers zmm0). All 8 values are actually in the zmm register, but gdb doesn't show all of them.

The Intel Software Developer's Manual describes VCVTTPD2QQ as:

"Convert eight packed double-precision floating-point values from zmm2/m512 to eight packed quadword integers in zmm1 using truncation with writemask k1."

I am using VCVTTPD2QQ to load eight 64-bit double-precision floats from memory and convert them to quadword integers in zmm1, encoded as follows:

mov rax,18446744073709551615      ; 0xFFFFFFFFFFFFFFFF: all 64 mask bits set
KMOVQ k1,rax
VCVTTPD2QQ zmm1{k1}{z},[r11+r15]  ; EVEX.512.66.0F.W1 form (512-bit, 8 lanes)

where r11 points to an array of 10,000 64-bit double-precision floats and r15 is 0 (an offset of zero from r11, so the address is the start of the array). The data are read in from an external source. I want to load the first 8 into zmm1, but only 4 are loaded.

I fill k1 with all 1s so that all 8 elements are written.

The Intel manual distinguishes the three possible encodings (128-, 256- and 512-bit) by the register names; if the destination is a zmm register, all 8 values should be converted.

What is wrong with my encoding that only 4 data points, not 8, are loaded into zmm1?

Beulich__Jan · 03-23-2020

Btw, this gdb issue was, I believe, fixed back in Nov 2018:

https://sourceware.org/git?p=binutils-gdb.git;a=commitdiff;h=b5420128da08dc81d94b265e88083d172909ea25

Jan

Jones__Brian · 03-23-2020

Since I posted this, gdb has behaved erratically on this issue: sometimes it shows all 8 values and sometimes only 4. For that reason, I don't rely on the output of i r zmm0 to show all 8 values.