Ok, thanks for the help. - Page 2

George · 04-14-2015

Hello,

I can't find a way to cast a __m256i variable to an integer!

Any ideas?

Thanks!
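
For readers arriving with the same question, here is a minimal sketch of one possible answer. A __m256i holds eight 32-bit integers, so there is no single cast to int; you pick a lane instead. The helper names first_lane and any_lane are hypothetical, and the scalar branch is only a stand-in so the file still builds when AVX is not enabled (for example, without -mavx on GCC or Clang).

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: read lane 0 of eight packed 32-bit integers. */
static int first_lane(const int32_t src[8])
{
#if defined(__AVX__)
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    /* Reinterpret the low 128 bits, then move element 0 into a scalar. */
    return _mm_cvtsi128_si32(_mm256_castsi256_si128(v));
#else
    return src[0];              /* scalar stand-in when AVX is not enabled */
#endif
}

/* Hypothetical helper: read any lane (0..7) by storing the vector to memory. */
static int any_lane(const int32_t src[8], int lane)
{
    assert(lane >= 0 && lane < 8);
#if defined(__AVX__)
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    int32_t out[8];
    _mm256_storeu_si256((__m256i *)out, v);   /* unaligned store is safe here */
    return out[lane];
#else
    return src[lane];           /* scalar stand-in when AVX is not enabled */
#endif
}
```

When the lane number is a compile-time constant, _mm256_extract_epi32 is another option; the store-to-array route works for a lane chosen at run time.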

George · 04-22-2015

Ok, now it is clear.

So, when we use intrinsics, we don't get a report that states 'vectorized', because the code is already vectorized.

Now, about the hint you gave me.

You say to determine whether any floats in the vector are less than D. Do you mean TempD?

Because we must compare:

if ( TempD[ j ] < D )

But then TempD is a float and D is a vector (if I change it to a vector as you said above).

If I make TempD a vector, I must also change:

_mm256_store_ps( TempD, _mm256_add_ps(...

Also, I have found a lot of comparison intrinsics and I don't know which one to use! (Assume 256-bit or 512-bit for all of these.)

You mention cmpeq, but I can't find anything useful.

Regarding gmin to find the minimum: I haven't found this anywhere. Instead, I am finding various min intrinsics, such as _mm256_min_ps.

Thank you!
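
As a rough illustration of the two points raised above, the sketch below shows one way the test "TempD[ j ] < D" could look with 256-bit AVX intrinsics, and what _mm256_min_ps does. The helper names are hypothetical, D is broadcast from a scalar purely for the example, and the scalar branches are stand-ins for builds without AVX.

```c
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: bit j of the result is set when tempD[j] < d. */
static int less_than_mask8(const float tempD[8], float d)
{
#if defined(__AVX__)
    __m256 values = _mm256_loadu_ps(tempD);
    __m256 limit  = _mm256_set1_ps(d);          /* d copied into all 8 lanes */
    __m256 lt     = _mm256_cmp_ps(values, limit, _CMP_LT_OQ);
    return _mm256_movemask_ps(lt);              /* one bit per lane */
#else
    int mask = 0;
    for (int j = 0; j < 8; ++j)
        if (tempD[j] < d)
            mask |= 1 << j;
    return mask;
#endif
}

/* Hypothetical helper: out[j] = min(a[j], b[j]).
   This is a lane-by-lane minimum, not a reduction to a single value. */
static void lanewise_min8(const float a[8], const float b[8], float out[8])
{
#if defined(__AVX__)
    _mm256_storeu_ps(out, _mm256_min_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b)));
#else
    for (int j = 0; j < 8; ++j)
        out[j] = a[j] < b[j] ? a[j] : b[j];
#endif
}
```

A non-zero mask answers "is any float in the vector less than D?" with a single branch, and the set bits say which lanes passed.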

jimdempseyatthecove · 04-22-2015

Note that I said "Starting with a vector of FLT_MAX (an __m512, say D)"

This indicates you change "float D" into "__m512 D". It initially holds FLT_MAX spread across the vector, but as the loop(s) progress, it gets updated to hold the minimum value of the "sum of the squares" spread across the vector. The sums no longer require a memory array (TempD is removed from memory) and can now reside within a register.

As you compute each next vector of "sum of the squares" (8 or 16 floats at a time), use the new vector D to test whether any of the values are lower than those stored in D. If any are, then obtain the new minimum, update D (all 8/16 floats) to the new minimum, and locate the position in the vector of "sum of squares" of the first float holding the new minimum. Then, instead of fetching a new value from V[] into T, remember the information necessary to construct the index into V[] (this is the inner loop index and the __mmask16 bitmask).

After the end of the inner loop, use the remembered index (of the vector holding the minimum value) together with the mask of positions in that (sum of squares) vector holding the minimum value to reconstruct the index into V[], then perform the fetch into T outside the inner loop.

It might help if you rename your variables to be representative of their purpose.

Consider changing "D" to "vectorOfMinimum", and renaming the result of the "sumOfSquares" computation, formerly TempD, to "vectorOfSumOfSquares".

Things like that. While you are typing more letters in the variable names, this removes the need for more (or all) of the words in a comment that would otherwise belong on the statements.
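
The procedure described in this post can be sketched roughly as follows. This is an illustration under stated assumptions, not the original code from the thread: it uses 256-bit AVX (8 floats per vector) rather than __m512, it takes the sums of squares from an array so the example is self-contained, and index_of_minimum, bestBlock and bestMask are hypothetical names. The scalar branch is a stand-in for builds without AVX.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: index of the first smallest value in
   sumOfSquares[0..n), where n is a non-zero multiple of 8. */
static size_t index_of_minimum(const float *sumOfSquares, size_t n)
{
    assert(n > 0 && n % 8 == 0);
    size_t bestBlock = 0;  /* loop index of the vector holding the minimum */
    int bestMask = 1;      /* lanes of that vector equal to the minimum */
#if defined(__AVX__)
    __m256 vectorOfMinimum = _mm256_set1_ps(FLT_MAX);
    for (size_t i = 0; i < n; i += 8) {
        __m256 vectorOfSumOfSquares = _mm256_loadu_ps(sumOfSquares + i);
        /* Is any lane strictly below the running minimum? */
        __m256 lt = _mm256_cmp_ps(vectorOfSumOfSquares, vectorOfMinimum, _CMP_LT_OQ);
        if (_mm256_movemask_ps(lt) == 0)
            continue;
        /* Horizontal minimum, left in all 8 lanes. */
        __m256 m = vectorOfSumOfSquares;
        m = _mm256_min_ps(m, _mm256_permute2f128_ps(m, m, 1)); /* swap halves */
        m = _mm256_min_ps(m, _mm256_shuffle_ps(m, m, 0x4E));   /* swap pairs */
        m = _mm256_min_ps(m, _mm256_shuffle_ps(m, m, 0xB1));   /* swap neighbours */
        vectorOfMinimum = m;
        /* Remember only what is needed to rebuild the index later. */
        bestBlock = i;
        bestMask = _mm256_movemask_ps(
            _mm256_cmp_ps(vectorOfSumOfSquares, vectorOfMinimum, _CMP_EQ_OQ));
    }
#else
    float minimum = FLT_MAX;   /* scalar stand-in when AVX is not enabled */
    for (size_t i = 0; i < n; i += 8)
        for (int j = 0; j < 8; ++j)
            if (sumOfSquares[i + j] < minimum) {
                minimum = sumOfSquares[i + j];
                bestBlock = i;
                bestMask = 1 << j;
            }
#endif
    /* After the loop: block index plus first set bit of the mask. */
    int lane = 0;
    while (((bestMask >> lane) & 1) == 0)
        ++lane;
    return bestBlock + (size_t)lane;
}
```

The returned index is what would be used for the single fetch from V[] into T after the inner loop, in place of a fetch on every improvement.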

Jim Dempsey

jimdempseyatthecove · 04-22-2015

>>You mention cmpeq but I can't find something useful....Regarding the gmin to find the minimum

My #21 response clearly indicated targeting MIC and __m512; however, those intrinsics are actually the Knights Corner (KNC) variants, which the Intrinsics Guide lists separately from AVX-512.

Save this link: https://software.intel.com/sites/landingpage/IntrinsicsGuide/

Open the link and you will find the Intrinsics Guide. If you check "KNC" under Technologies, this will restrict the search results to the intrinsics available on KNC.

Jim Dempsey

RE: ...cmpeq...

-----

__mmask16 _mm512_cmpeq_ps_mask (__m512 a, __m512 b)
#include "zmmintrin.h"
Instruction: vcmpps k {k}, zmm, zmm, imm
CPUID Flags: AVX512F for AVX-512, KNCNI for KNC

Description

Compare packed single-precision (32-bit) floating-point elements in a and b for equality, and store the results in mask vector k.

Operation
FOR j := 0 to 15
    i := j*32
    k[j] := (a[i+31:i] == b[i+31:i]) ? 1 : 0
ENDFOR
k[MAX:16] := 0
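
To make the Operation pseudocode concrete, here is a small sketch of what the returned mask looks like and how the first matching lane can be read from it. The helper names cmpeq_mask16 and first_set_lane are hypothetical. The intrinsic is only called when the compiler targets AVX-512F; otherwise a scalar loop models the same result. A KNC build is not covered by this sketch.

```c
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: bit j of the result is 1 when a[j] == b[j], j = 0..15. */
static uint16_t cmpeq_mask16(const float a[16], const float b[16])
{
#if defined(__AVX512F__)
    __mmask16 k = _mm512_cmpeq_ps_mask(_mm512_loadu_ps(a), _mm512_loadu_ps(b));
    return (uint16_t)k;
#else
    uint16_t k = 0;                      /* scalar model of the pseudocode */
    for (int j = 0; j < 16; ++j)
        if (a[j] == b[j])
            k |= (uint16_t)(1u << j);
    return k;
#endif
}

/* Hypothetical helper: position of the lowest set bit, or -1 if none is set. */
static int first_set_lane(uint16_t k)
{
    for (int j = 0; j < 16; ++j)
        if ((k >> j) & 1u)
            return j;
    return -1;
}
```

Comparing a vector of sums against a vector filled with the minimum yields a mask like this; the position of its first set bit, added to the loop index, gives the element's index.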

George · 04-22-2015

Ok, thanks for the help.

I will try, but this is a little overhelming for me..

Thanks!

jimdempseyatthecove · 04-22-2015

>>little overhelming for me...

Sounds like a nautical term (helm is where you control the boat/ship), but overhelming might be apropos in this case (over controlling). ;)

Jim Dempsey

George · 04-23-2015

Hehe.. :)

Missed a "w".

kecoro · 04-27-2015

oh i don't know

cast __m256i to int