Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Vectorize Capability of Intel Compiler in video coding

hzhang08
Beginner
I checked the asm code generated by the Intel Compiler (IC) C++ 9.0. It seems that IC could not vectorize the following loop:
for (i=0;i<4;i++)
    SAD += p1[i]-p2[i];
However, IC successfully vectorizes this loop (replacing '+=' with '='):
for (i=0;i<4;i++)
    SAD = p1[i]-p2[i];
IC also cannot vectorize many other codec functions, such as the DCT. I wonder if there is any way to improve this, or whether it is just the current status of IC. It would also be of great help to know if there is a way to write C code so that IC recognizes it as vectorizable. Otherwise, I have to write my own MMX/SSE code, which is time-consuming and has poor portability.
Thanks a lot!
Hao
6 Replies
Intel_C_Intel
Employee

Dear Hao,

You did not give much context of your loop, so I have to guess. Please add the -vec_report2 switch to get more vectorization diagnostics. For instance, given

int p1[4],p2[4],SAD;
doit() {
  int i;
  for (i=0;i<4;i++)
    SAD += p1[i]-p2[i];
}

you will see the following message:

test.c(5) : (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient.

Such a diagnostic tells you that ICC has the ability to vectorize the loop, but deems it inefficient at this point (the trip count is too short to warrant the setup overhead). You can override this decision with "#pragma vector always" placed before the loop and see what the performance is. If you give me more details, I may have other suggestions.
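
For illustration, here is a minimal sketch of where the override mentioned above would go. It assumes the pragma is placed directly before the loop it applies to, and it uses void doit(void) and a macro N that are not in the original post. Compilers that do not know this Intel-specific pragma normally ignore it, possibly with a warning, so the code still computes the same result. Whether the loop is actually vectorized, and whether that is faster for only four iterations, has to be checked with the vectorization report and a timing run.

```c
#define N 4

int p1[N], p2[N], SAD;

/* Accumulates the element-wise differences of p1 and p2 into SAD. */
void doit(void)
{
    int i;

    /* Intel-specific hint: asks the vectorizer to skip its efficiency
       heuristics for the loop that follows. It does not change the
       meaning of the loop. */
#pragma vector always
    for (i = 0; i < N; i++)
        SAD += p1[i] - p2[i];
}
```

Compiling with the -vec_report2 switch mentioned above should show whether the hint changed the vectorizer's decision.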

Aart Bik
http://www.aartbik.com/

hzhang08
Beginner
Dear Aart:
Thank you for the quick reply! I still have the following question:
When calculating the sum of absolute differences:
#pragma vector always
int p1[4],p2[4],SAD;
doit() {
  int i;
  for (i=0;i<4;i++)
    SAD += abs(p1[i]-p2[i]);
}
The report is: (col. 3) remark: loop was not vectorized: low trip count.
I wonder why the compiler doesn't use the SSE instruction PSADBW.
Thank you very much!
Hao
Intel_C_Intel
Employee

Dear Hao,

That particular psadbw idiom is recognized only when p1[] and p2[] are declared unsigned char and the reduction is done into any integral accumulator. In that case, however, at least 8 iterations are required for MMX and at least 16 for SSE. When all data is int, the construct is vectorized differently.
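
As a rough sketch of a loop in the shape described above: the operands are unsigned char, the accumulator is an int, and the trip count is 16. The function name sad16 and the macro BLOCK are made up for this example. Whether a particular compiler version actually emits psadbw for it is not guaranteed, so the vectorization report and the generated asm should be checked.

```c
#include <stdlib.h>

#define BLOCK 16  /* 16 byte-sized elements fill one 128-bit SSE register */

/* Sum of absolute differences over one 16-byte block.
   The operands are promoted to int before the subtraction,
   so the difference cannot wrap around. */
int sad16(const unsigned char *a, const unsigned char *b)
{
    int sad = 0;
    int i;

    for (i = 0; i < BLOCK; i++)
        sad += abs(a[i] - b[i]);
    return sad;
}
```

For a 4x4 block of 8-bit pixels, this suggests arranging the data so that all 16 pixels are processed in one loop rather than four per call.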

Aart Bik
http://www.aartbik.com/

hzhang08
Beginner
Dear Aart:
It seems that using Intel Compiler vectorization is not just a matter of setting switch flags. It would be good to know more details about vectorization and its relation to MMX. Will your vectorization handbook help in this respect?
Also, I would greatly appreciate it if you could recommend a book on writing more efficient MMX/SSE code. (I have already written quite a lot of MMX code, but I am not sure whether it is optimal or whether there is room for further improvement based on a better understanding of the hardware.)
Thank you very much,
Hao
Intel_C_Intel
Employee

Dear Hao,

>It seems that using Intel Compiler vectorization is not just setting switch flags

I am sorry to hear that because we really try to make automatic vectorization as easy as possible. But you are right that familiarity with the instruction set and the way the vectorizer works can greatly increase the effectiveness of automatic vectorization and that is exactly why I wrote the vectorization handbook (http://www.shop-intel.com/shop/product.asp?pid=SIBK3560). If you are going to use vectorization a lot, I would recommend it. Also feel free to contact me directly with vectorization issues.

You may also find other Intel Press titles useful, or visit the Intel Products Page (http://www.intel.com/products/index.htm) to download optimization manuals.

Aart Bik
http://www.aartbik.com/

hzhang08
Beginner
Dear Aart:
I'm trying to avoid writing MMX instructions by hand and hope to get similar performance from vectorization. I'm going to read your vectorization book and may ask you some more questions...
Thank you again for your help in understanding vectorization!
Hao