Intel® C++ Compiler
Support and discussions for creating C++ code that runs on platforms based on Intel® processors.

Vectorization question

Emmanuel_W_
New Contributor I
Hi,

I have a loop that does not vectorize due to "unsupported data type", and I am not sure I understand why.
In the following code snippet the first loop does vectorize, but it requires intermediate storage.
pResidual is a pointer to a short array.

[cpp]__ALIGN16 int tempPred1[16];
__ALIGN16 short temp[16];


for(int u=0;u<16;u++)
{
    temp[u] = (short)((tempPred1[u] + 8)>>4);
    pResidual[u] = temp[u];
}
pResidual+=16;


for(int u=0;u<16;u++)
{
    pResidual[u] = (short) ((tempPred1[u] + 8)>>4);
}
pResidual+=16;[/cpp]


Any idea?

Thanks,
Emmanuel
Maximillia_D_Intel
I don't have the answer, but I did try some code. I noticed that (1) the first loop was only partially vectorized, and (2) if I change the int to a short and remove the shift, the loops vectorize.

Mixing data types (short + int) may be the culprit.

Max
Dale_S_Intel
Employee
I think the problem comes from mixing data types, i.e. you've got both int and short in there. If you change it so that everything is short (assuming that works for you) then you should be able to get it to vectorize:

[cpp]$ cat bug.cpp
void foo() {
    short *pResidual=0;
    short tempPred1[16];
    short temp[16] = {0};

    for(int u=0;u<16;u++)
    {
        temp[u] = (short)(((short)(tempPred1[u] + 8))>>4);
    }
    pResidual+=16;

    for(int u=0;u<16;u++)
    {
        pResidual[u] = (short) (((short)(tempPred1[u] + 8))>>4);
    }
    pResidual+=16;
}
$ icc -c -vec-report2 bug.cpp
bug.cpp(6): (col. 5) remark: LOOP WAS VECTORIZED.
bug.cpp(12): (col. 5) remark: LOOP WAS VECTORIZED.
$
[/cpp]

Note that even the intermediate calculations may need to be cast to short to get it to work, since an expression such as tempPred1[u] + 8 is otherwise evaluated as int. Of course, if you can change everything to int, that would also work.
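
The reason casts matter on the intermediate results is C++ integer promotion: arithmetic and shifts on a short operand are carried out in int, so an all-short loop still contains int values unless they are cast back. A minimal sketch of this follows; the helper name round_shift_short is hypothetical and only there for illustration.

```cpp
#include <cassert>
#include <type_traits>

// short + int literal is computed as int (integer promotion).
static_assert(std::is_same<decltype(short{0} + 8), int>::value,
              "short + 8 has type int");

// Shifting a short also yields int, even if the operand was just cast to short.
static_assert(std::is_same<decltype(short{0} >> 4), int>::value,
              "short >> 4 has type int");

// Hypothetical helper mirroring the casts in the snippet above:
// the sum is narrowed to short, shifted (as int), then narrowed again.
inline short round_shift_short(short x) {
    return static_cast<short>(static_cast<short>(x + 8) >> 4);
}

// Small usage check.
inline void check_round_shift_short() {
    assert(round_shift_short(24) == 2);   // (24 + 8) >> 4
    assert(round_shift_short(-8) == 0);   // (-8 + 8) >> 4
}
```

This only shows where int enters the expression. Whether a particular compiler version then vectorizes the loop still has to be confirmed with its vectorization report.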

Is that doable in your original code?

Dale
Emmanuel_W_
New Contributor I
Hi,

Thanks for the update. Unfortunately, tempPred1 has a dynamic range of 20 bits. The shift operation is actually there to reduce the range to 16 bits. I can't
change pResidual either, which is a pointer to a 16-bit video frame.
I guess the code is easy enough to write with intrinsics, so I will go that route.
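
For reference, a rough sketch of what the intrinsics route could look like, assuming SSE2 is available. The function names are hypothetical, the loads and stores are unaligned to keep the example simple, and _mm_packs_epi32 saturates where the (short) cast truncates. The two only agree when the shifted values already fit in 16 bits, which is the stated case here.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

// Scalar reference: the second loop from the original post.
inline void round_shift_scalar(const std::int32_t* src, std::int16_t* dst) {
    for (int u = 0; u < 16; ++u)
        dst[u] = static_cast<std::int16_t>((src[u] + 8) >> 4);
}

// Hypothetical SSE2 version: 16 ints in, 16 shorts out.
// Falls back to the scalar loop when SSE2 is not available.
inline void round_shift_sse2(const std::int32_t* src, std::int16_t* dst) {
#if HAVE_SSE2
    const __m128i bias = _mm_set1_epi32(8);
    for (int u = 0; u < 16; u += 8) {
        // Load two groups of four 32-bit values.
        __m128i lo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + u));
        __m128i hi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + u + 4));
        // (x + 8) >> 4 with an arithmetic (sign-preserving) shift.
        lo = _mm_srai_epi32(_mm_add_epi32(lo, bias), 4);
        hi = _mm_srai_epi32(_mm_add_epi32(hi, bias), 4);
        // Narrow 2 x 4 ints to 8 shorts (signed saturation) and store.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + u),
                         _mm_packs_epi32(lo, hi));
    }
#else
    round_shift_scalar(src, dst);
#endif
}

// Usage check: both versions should agree on inputs within a 20-bit range.
inline void check_round_shift() {
    std::int32_t src[16];
    std::int16_t expected[16];
    std::int16_t actual[16];
    for (int u = 0; u < 16; ++u) src[u] = (u - 8) * 60000;
    round_shift_scalar(src, expected);
    round_shift_sse2(src, actual);
    for (int u = 0; u < 16; ++u) assert(expected[u] == actual[u]);
}
```

If tempPred1 and the destination are known to be 16-byte aligned (as __ALIGN16 suggests for tempPred1), the aligned _mm_load_si128 / _mm_store_si128 forms could be used instead.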

Emmanuel