Jones__Brian
New Contributor I

VCVTTPD2QQ loads 4 floats, not 8, encoded for zmm register

I am new to this list, so I hope I'm posting new information in the correct way. The problem described below has been solved, and I'm posting the answer here for others in the future. The problem was a bug in the gdb debugger: apparently it cannot show the upper 256 bits of a zmm register with the command i r zmm0. All 8 values are actually in the zmm register, but gdb does not display them all.

The Intel Software Developers Manual describes VCVTTPD2QQ as:

"Convert eight packed double-precision floating-point values from zmm2/m512 to eight packed quadword integers in zmm1 using truncation with writemask k1."

I am using VCVTTPD2QQ to convert eight 64-bit double-precision floats from memory and place the results in zmm1, encoded as follows:

mov rax,18446744073709551615

KMOVQ k1,rax

EVEX.512.66.0F.W1 VCVTTPD2QQ zmm1 {k1}{z},[r11+r15]

where r11 is a pointer to an array of 10,000 64-bit double-precision floats and r15 is an offset of 0, so the effective address is the start of the array. The data are read in from an external source. I want to load the first 8 into zmm1, but it loads only 4.

I fill k1 with all 1s so that all 8 elements are written.
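To make the masking concrete, here is a minimal scalar sketch of what the 512-bit form does with k1 and {z}. It is only a model for reasoning, not the instruction itself: the function name vcvttpd2qq_model and its parameters are hypothetical, and NaN or out-of-range inputs are deliberately not modeled.

```python
import math

def vcvttpd2qq_model(src, k1, dest=None, zeroing=True):
    """Scalar model of the zmm form: 8 doubles in, 8 signed quadwords out.

    Assumption: every input is finite and fits in a signed 64-bit integer.
    """
    if len(src) != 8:
        raise ValueError("the zmm form operates on exactly 8 doubles")
    if dest is None:
        dest = [0] * 8
    out = []
    for i in range(8):
        if (k1 >> i) & 1:
            # Mask bit set: convert with truncation (round toward zero).
            out.append(math.trunc(src[i]))
        elif zeroing:
            # {z}: an element whose mask bit is clear is zeroed.
            out.append(0)
        else:
            # Merge-masking: an element whose mask bit is clear keeps its old value.
            out.append(dest[i])
    return out

values = [1.9, -1.9, 2.5, -2.5, 0.0, 100.99, -0.5, 7.0]
result = vcvttpd2qq_model(values, 0xFFFFFFFFFFFFFFFF)
```

Only bits 0 through 7 of k1 are consulted for eight quadword elements, so in this model a mask of 0xFF gives the same result as the all-ones value moved in with KMOVQ.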

The Intel manual distinguishes the three possible encodings by the register names; if the name is a zmm register, then it should move 8.

What is wrong with my encoding that I get only 4 data points, not 8, loaded into zmm1?

 

Beulich__Jan
Beginner

Jones__Brian
New Contributor I

Since the time I posted this, I have had erratic behavior from gdb on this issue. Sometimes it shows me all 8 values and sometimes only 4. For that reason, I don't rely on the gdb output from i r zmm0 to show all 8 values.
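One possible cross-check that does not depend on the register display is to store the register to memory and decode the 64 bytes directly. The sketch below assumes the raw bytes have already been captured somehow (for example from a memory dump); the helper name decode_zmm_qwords is hypothetical, and the sample bytes are made up for illustration rather than taken from a real run.

```python
import struct

def decode_zmm_qwords(raw):
    """Interpret 64 little-endian bytes as eight signed 64-bit integers."""
    if len(raw) != 64:
        raise ValueError("a zmm register holds exactly 64 bytes")
    # "<8q" means little-endian, eight signed 64-bit values (x86 byte order).
    return list(struct.unpack("<8q", raw))

# Illustrative bytes only: eight quadwords packed the way they would sit in memory.
sample = struct.pack("<8q", 1, -1, 2, -2, 0, 100, 0, 7)
qwords = decode_zmm_qwords(sample)
lower_half = decode_zmm_qwords(sample)[:4]  # the part that sits in the low 256 bits
```

If all 8 decoded values match the expected conversions, the upper 256 bits were written even when the debugger listing shows only the lower half.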
