Dear Hao,
You did not give much context for your loop, so I have to guess. Please add the -vec_report2 switch to get more vectorization diagnostics. For instance, given
int p1[4],p2[4],SAD;
doit() {
  int i;
  for (i=0;i<4;i++)
    SAD += p1[i]-p2[i];
}
you will see the following message:
test.c(5) : (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient.
Such a diagnostic tells you that ICC has the ability to vectorize the loop, but deems it inefficient at this point (the trip count is too short to warrant the setup overhead). You can override this decision with #pragma vector always and see what the performance is. If you give me more details, I may have other suggestions.
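As a rough sketch of that override, the loop above could be written as follows. The function name sum_diff4 and the pointer parameters are made up for illustration and are not from the original post. The pragma is specific to the Intel compiler; other compilers typically ignore it, possibly with a warning. Whether forcing vectorization actually helps on a trip count of 4 is something to measure rather than assume.

```c
/* Sketch only: sum_diff4 is a hypothetical name, not from the thread. */
int sum_diff4(const int *p1, const int *p2)
{
    int sum = 0;
    int i;

    /* Ask ICC to vectorize even if its cost model considers it
       inefficient (here: a trip count of only 4). */
#pragma vector always
    for (i = 0; i < 4; i++)
        sum += p1[i] - p2[i];

    return sum;
}
```

Compiling with -vec_report2 before and after adding the pragma should show whether the diagnostic for this loop changes.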
Aart Bik
http://www.aartbik.com/
doit() {
  int i;
  for (i=0;i<4;i++)
    SAD += abs(p1[i]-p2[i]);
}
Dear Hao,
That particular psadbw idiom is recognized only when p1[] and p2[] are declared unsigned char and the reduction is done into any integral accumulator. In that case, however, at least 8 iterations are required for MMX and at least 16 for SSE. When all data is int, the construct is vectorized differently.
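A minimal sketch of a loop with that shape, assuming 16 elements to meet the SSE iteration count mentioned above. The name sad16 and the constant N are hypothetical. This only shows the source form described here (unsigned char inputs, integral accumulator); whether a particular compiler version emits psadbw for it should be checked with the vectorization report.

```c
#include <stdlib.h>  /* abs */

#define N 16  /* assumed trip count: at least 16 iterations for SSE */

/* Sketch only: sad16 is a hypothetical name, not from the thread. */
unsigned int sad16(const unsigned char *p1, const unsigned char *p2)
{
    unsigned int sad = 0;  /* integral accumulator */
    int i;

    for (i = 0; i < N; i++)
        sad += abs(p1[i] - p2[i]);  /* operands promote to int before abs */

    return sad;
}
```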
Aart Bik
http://www.aartbik.com/
Dear Hao,
>It seems that using Intel Compiler vectorization is not just setting switch flags
I am sorry to hear that, because we really try to make automatic vectorization as easy as possible. But you are right that familiarity with the instruction set and with the way the vectorizer works can greatly increase the effectiveness of automatic vectorization, and that is exactly why I wrote the vectorization handbook (http://www.shop-intel.com/shop/product.asp?pid=SIBK3560). If you are going to use vectorization a lot, I would recommend it. Also feel free to contact me directly with vectorization issues.
You may also find other Intel Press titles useful, or visit the Intel Products Page (http://www.intel.com/products/index.htm) to download optimization manuals.
Aart Bik
http://www.aartbik.com/