What are masked assignments?

psing51 · ‎07-30-2015

I came across the following code while going through Intel's auto-vectorization tutorial:

for (int i=0; i<length; i++) {
    float s = b[i]*b[i] - 4*a[i]*c[i];
    if ( s >= 0 ) {
        s = sqrt(s);
        x2[i] = (-b[i]+s)/(2.*a[i]);
        x1[i] = (-b[i]-s)/(2.*a[i]);
    }
    else {
        x2[i] = 0.;
        x1[i] = 0.;
    }
}

“if” statements are allowed if they can be implemented as masked assignments, which is usually the case.
So how exactly do masked assignments work? I think I have a rough grasp of the masked assignment concept from here.
Are these bit vectors computed by the compiler using some heuristics, or at run time (by the BPU)?

I understand that the elements whose mask bit is TRUE are grouped together, and likewise for FALSE, and that each group is executed together.
But how exactly does the mechanism work on an Intel machine?

Also, what happens if the code is not straight-line, i.e.

for (i=0; i<lengthA; i++)
    for (j=0; j<lengthB; j++)
    {
        float s = a*b;
        if ( s >= 0 ) {
            c = sqrt(s);
        }
        else {
            c = 0.01;
        }
    }

A simplified, tutorial-style explanation would be very helpful. Eagerly awaiting your reply.

KitturGanesh · ‎07-31-2015

Hi Puneet,
An assignment is a masked array assignment if it appears in a where construct and the variable being assigned is an array. It is typically used when only certain elements of one array should be assigned to another.

For example:

where (X /= 0.0)
    A = 1.0/X
elsewhere
    A = 1.0
end where

The statement above assigns 1.0/X to A only where the elements of the array X are non-zero, and uses 1.0 for the zero elements. So the assignment is made to selected elements of the array, based on a mask that picks out those elements. The mask itself is the expression or condition in the where clause.

In the link you mention, Mi is the bit vector used as the mask control expression in the where clause. Each element of the mask vector is either 1 or 0, and the masked array assignment is controlled by it: the vector operation is applied only to the elements whose mask bit is 1.
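To make the mask idea concrete, here is a minimal scalar sketch in C of the where example above. The function names build_mask and masked_reciprocal are made up for illustration, and the mask is limited to 32 elements for simplicity. Note that the mask depends on the data, so it is computed at run time; a vectorizing compiler would typically produce it with a vector compare instruction rather than a loop like this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bit i of the result is 1 where x[i] != 0.
   Assumes n <= 32 so the mask fits in one word. */
uint32_t build_mask(const float *x, size_t n)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] != 0.0f)
            mask |= (uint32_t)1 << i;
    }
    return mask;
}

/* Hypothetical helper: a[i] = 1/x[i] where the mask bit is 1,
   a[i] = 1 elsewhere (the "elsewhere" part of the where construct). */
void masked_reciprocal(float *a, const float *x, uint32_t mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((mask >> i) & 1u)
            a[i] = 1.0f / x[i];
        else
            a[i] = 1.0f;
    }
}
```

For x = {2, 0, 4, 0.5} the mask has bits 0, 2 and 3 set, and only those elements of a receive the reciprocal.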

Hope the above helps.

_Kittur

TimP · ‎07-31-2015

In such cases, vectorization is done by computing the if and else branches separately for all elements, regardless of the result of the comparison (which may be held as an implicit mask vector), and then merging the results under control of the mask, preferably with a blend instruction.

You may be able to control with a #pragma whether the compiler uses heuristics to evaluate the gain from vectorization, as it probably does by default. The factors it considers may include data alignment and the associated assertions.

sqrt normally implies an implicit conversion of its float operand to double, which is likely to inhibit vectorization. There seems little point in widening the operands there. If you want efficiency, write sqrtf and 0.01f; if you want to protect range and improve reliability, write e.g.

double s = b*(double)b - 4*(double)a*c;

and consult a numerical analysis textbook.

There was a mention in recent training material that "using namespace std" might affect whether sqrt is treated as generic, in which case it would not imply the cast to double. C11 and C++11 seem to have backed off from the concept of genericity, which wasn't applied consistently.