Emmanuel_W_
New Contributor I
88 Views

Vectorization question

Hi,

I have a loop that does not vectorize due to "unsupported data type", and I am not sure I understand why.
In the following code snippet the first loop does vectorize but requires intermediate storage.
pResidual is a pointer to a short array.

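[cpp]__ALIGN16 int tempPred1[16];
__ALIGN16 short temp[16];


// Loop 1: vectorizes, but goes through the intermediate array temp.
for(int u=0;u<16;u++)
{
    temp[u] = (short)((tempPred1[u] + 8)>>4);
    pResidual[u] = temp[u];
}
pResidual+=16;


// Loop 2: not vectorized ("unsupported data type").
for(int u=0;u<16;u++)
{
    pResidual[u] = (short) ((tempPred1[u] + 8)>>4);
}
pResidual+=16;[/cpp]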


Any idea?

Thanks,
Emmanuel

I don't have the answer, but I did try some code. I noticed that 1) the first loop was only partially vectorized, and 2) if I change the int to a short and remove the shift, the loops vectorize.

Mixing data types (short + int) may be the culprit.

Max
Dale_S_Intel
Employee

I think the problem comes from mixing data types, i.e. you've got both int and short in there. If you change it so that everything is short (assuming that works for you) then you should be able to get it to vectorize:

[cpp]$ cat bug.cpp
void foo() {
    short *pResidual=0;
    short tempPred1[16];
    short temp[16] = {0};

    for(int u=0;u<16;u++)
    {
        temp[u] = (short)(((short)(tempPred1[u] + 8))>>4);
    }
    pResidual+=16;

    for(int u=0;u<16;u++)
    {
        pResidual[u] = (short) (((short)(tempPred1[u] + 8))>>4);
    }
    pResidual+=16;
}
$ icc -c -vec-report2 bug.cpp
bug.cpp(6): (col. 5) remark: LOOP WAS VECTORIZED.
bug.cpp(12): (col. 5) remark: LOOP WAS VECTORIZED.
$
[/cpp]

Note that even the intermediate calculations may need to be cast to short to get it to work. Of course, if you can change everything to int, that would also work.

Is that doable in your original code?

Dale
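One caveat with the all-short variant, as a small sketch: narrowing to short before the shift is only equivalent to the original formula when the sum fits in 16 bits. The helper names `round_wide` and `round_narrow` are made up for illustration, and the wrapped value assumes the usual two's-complement conversion (guaranteed only since C++20).

```cpp
#include <cstdint>

// Original formula: add and shift in int, narrow to short last.
inline short round_wide(int x)
{
    return static_cast<short>((x + 8) >> 4);
}

// All-short variant: narrow the sum to short first, then shift.
// If x + 8 does not fit in 16 bits, the high bits are lost before the shift.
inline short round_narrow(int x)
{
    return static_cast<short>(static_cast<short>(x + 8) >> 4);
}

// round_wide(1000)   and round_narrow(1000)   agree: both give 63.
// round_wide(100000) gives 6250, but round_narrow(100000) wraps
// (to -1942 on typical two's-complement compilers).
```

So the all-short rewrite is safe only if the input data is known to stay within the range of a short.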
Emmanuel_W_
New Contributor I

Hi,

Thanks for the update. Unfortunately, tempPred1 has a dynamic range of 20 bits. The shift operation is actually there to reduce the range to 16 bits. I can't change pResidual either, which is a pointer to a 16-bit video frame.
I guess the code is easy enough to write with intrinsics, so I will go that route.

Emmanuel
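For reference, the intrinsics route could look roughly like the sketch below. This is not the code from the original application: the function name `round_shift_store16` is made up, unaligned loads and stores are used so that nothing is assumed about alignment, and `_mm_packs_epi32` saturates where the `(short)` cast would truncate, so the two only agree when the shifted value fits in a short.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper: computes (tempPred1[u] + 8) >> 4 for 16 ints
// and stores the results as 16 shorts at pResidual.
void round_shift_store16(const int* tempPred1, short* pResidual)
{
#if defined(__SSE2__)
    const __m128i bias = _mm_set1_epi32(8);
    for (int u = 0; u < 16; u += 8) {
        // Load 8 ints as two vectors of 4 (unaligned loads; aligned loads
        // could be used instead if the array is known to be 16-byte aligned).
        __m128i lo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(tempPred1 + u));
        __m128i hi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(tempPred1 + u + 4));
        // Add the rounding bias, then arithmetic shift right by 4.
        lo = _mm_srai_epi32(_mm_add_epi32(lo, bias), 4);
        hi = _mm_srai_epi32(_mm_add_epi32(hi, bias), 4);
        // Pack 2 x 4 ints into 8 shorts with signed saturation and store.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(pResidual + u),
                         _mm_packs_epi32(lo, hi));
    }
#else
    // Scalar fallback for targets without SSE2 (truncates instead of saturating).
    for (int u = 0; u < 16; u++) {
        pResidual[u] = static_cast<short>((tempPred1[u] + 8) >> 4);
    }
#endif
}
```

The caller would still advance `pResidual` by 16 after each call, as in the original loops.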