Intel® ISA Extensions

VCVTTPD2QQ loads 4 floats, not 8, encoded for zmm register


I am new to this forum, so I hope I am posting in the correct way. The problem described below has been solved, and I am posting the answer here for others in the future. The cause was a bug in the gdb debugger: apparently it cannot reliably show the upper 256 bits of a zmm register with the command i r zmm0. All 8 values are actually in the zmm register, but gdb does not display them all.

The Intel Software Developers Manual describes VCVTTPD2QQ as:

"Convert eight packed double-precision floating-point values from zmm2/m512 to eight packed quadword integers in zmm1 using truncation with writemask k1."

I am using VCVTTPD2QQ to convert eight double-precision (64-bit) floats from memory into eight quadword integers in zmm1, encoded as follows:

mov rax,18446744073709551615

KMOVQ k1,rax

EVEX.512.66.0F.W1 VCVTTPD2QQ zmm1 {k1}{z},[r11+r15]

where r11 is a pointer to an array of 10,000 double-precision (64-bit) floats and r15 is 0 (an offset of zero from the base address of the array). The data are read in from an external source. I want to load the first 8 into zmm1, but it appears to load only 4.

I fill k1 with all 1s so that all 8 elements are converted.
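For readers following along, here is a rough scalar sketch in C of what the masked, zeroing form of the instruction is expected to do per lane. It is only a model, not the instruction itself: the function name cvttpd2qq_maskz_model is made up for illustration, and it assumes every input is finite and within int64_t range. The hardware returns the integer indefinite value 0x8000000000000000 for NaN and out-of-range inputs, whereas the C cast is undefined there. With eight quadword elements only the low 8 bits of k1 matter, so an all-ones rax behaves the same as 0xFF.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar model (an illustration, not the real instruction) of
 *   VCVTTPD2QQ zmm1 {k1}{z}, m512
 * Assumption: all inputs are finite and fit in int64_t.
 */
static void cvttpd2qq_maskz_model(int64_t dst[8], uint64_t k1,
                                  const double src[8])
{
    for (size_t lane = 0; lane < 8; ++lane) {
        if ((k1 >> lane) & 1u) {
            /* Mask bit set: convert, truncating toward zero. */
            dst[lane] = (int64_t)src[lane];
        } else {
            /* Mask bit clear with {z}: the lane is zeroed. */
            dst[lane] = 0;
        }
    }
}
```

Calling the model with k1 = 0xFFFFFFFFFFFFFFFF fills all 8 lanes, while k1 = 0x0F would convert only lanes 0 to 3 and zero the rest. That is a quick way to rule out the mask as the cause when only 4 values appear.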

The Intel manual distinguishes the three possible encodings by the register names; if the name is a zmm register, then it should move 8.

What is wrong with my encoding that I get only 4 rather than 8 values loaded into zmm1?


2 Replies

Since the time I posted this, I have had erratic behavior from gdb on this issue. Sometimes it shows me all 8 values and sometimes only 4. For that reason, I don't rely on the gdb output from i r zmm0 to show all 8 values.
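One way to check the result without depending on the debugger's register display is to store the vector to memory and look at all 8 lanes there. The following is a minimal sketch in C, not the code from my program: the helper names convert8_to_memory and print_lanes are invented for illustration, and the intrinsics path is only compiled when the compiler is told to target AVX-512F and AVX-512DQ (for example with -mavx512f -mavx512dq on GCC or Clang). Otherwise it falls back to a plain loop that assumes in-range inputs.

```c
#include <stdint.h>
#include <stdio.h>

#if defined(__AVX512F__) && defined(__AVX512DQ__)
#include <immintrin.h>
#endif

/* Convert 8 doubles with truncation and write all 8 results to memory. */
static void convert8_to_memory(int64_t out[8], const double src[8])
{
#if defined(__AVX512F__) && defined(__AVX512DQ__)
    __m512d v = _mm512_loadu_pd(src);    /* load 8 doubles (unaligned)  */
    __m512i q = _mm512_cvttpd_epi64(v);  /* VCVTTPD2QQ, all lanes       */
    _mm512_storeu_si512(out, q);         /* spill the full 512 bits     */
#else
    /* Fallback without AVX-512: assumes inputs fit in int64_t. */
    for (int lane = 0; lane < 8; ++lane)
        out[lane] = (int64_t)src[lane];
#endif
}

/* Print every lane so the upper four are visible without a debugger. */
static void print_lanes(const int64_t out[8])
{
    for (int lane = 0; lane < 8; ++lane)
        printf("lane %d: %lld\n", lane, (long long)out[lane]);
}
```

If lanes 4 to 7 hold the expected integers in memory, the conversion worked and the missing values were only a display problem.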