
priyanka06

Beginner


07-23-2012
08:14 AM


Problems encountered during vectorization of code using SSE intrinsics

I have been struggling to vectorize a particular application for some time now and have tried everything, from auto-vectorization to hand-coded SSE intrinsics. Somehow I am still unable to obtain a speedup on my stencil-based application.

Following is a snippet of my current code, which I have vectorized using SSE intrinsics.

    //#pragma ivdep
    for ( i = STENCIL; i < z - STENCIL; i+=4 )
    {
        it = it2 + i;
        __m128 center = _mm_mul_ps(_mm_load_ps(&p2[it]),C00_i);
        u_j4 = _mm_load_ps(&p2[i+j*it_j-it_j4+k*it_k]); //Line 180
        u_j3 = _mm_load_ps(&p2[i+j*it_j-it_j3+k*it_k]);
        u_j2 = _mm_load_ps(&p2[i+j*it_j-it_j2+k*it_k]);
        u_j1 = _mm_load_ps(&p2[i+j*it_j-it_j +k*it_k]);
        u_j8 = _mm_load_ps(&p2[i+j*it_j+it_j4+k*it_k]);
        u_j7 = _mm_load_ps(&p2[i+j*it_j+it_j3+k*it_k]);
        u_j6 = _mm_load_ps(&p2[i+j*it_j+it_j2+k*it_k]);
        u_j5 = _mm_load_ps(&p2[i+j*it_j+it_j +k*it_k]);
        __m128 tmp2i = _mm_mul_ps(_mm_add_ps(u_j4,u_j8),X4_i);
        __m128 tmp3 = _mm_mul_ps(_mm_add_ps(u_j3,u_j7),X3_i);
        __m128 tmp4 = _mm_mul_ps(_mm_add_ps(u_j2,u_j6),X2_i);
        __m128 tmp5 = _mm_mul_ps(_mm_add_ps(u_j1,u_j5),X1_i);
        __m128 tmp6 = _mm_add_ps(_mm_add_ps(tmp2i,tmp3),_mm_add_ps(tmp4,tmp5));
        __m128 tmp7 = _mm_add_ps(tmp6,center);
        _mm_store_ps(&tmp2,tmp7); //Line 196
    }

When I compile (icc) the above code without `#pragma ivdep`, I get the following message:

    remark: loop was not vectorized: existence of vector dependence.
    vector dependence: assumed FLOW dependence between tmp2 line 196 and tmp2 line 196.
    vector dependence: assumed ANTI dependence between tmp2 line 196 and tmp2 line 196.

When I compile (icc) it with `#pragma ivdep`, I get the following message:

    remark: loop was not vectorized: unsupported data type. //Line 180

Why is there a dependence suggested for Line 196? How can I eliminate the suggested vector dependence?
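One possible reading of these remarks, offered as a guess since the declaration of `tmp2` is not shown: the store on line 196 targets `&tmp2`, an address that does not change with `i`, so each iteration appears to write the same location, and the compiler has to assume a loop-carried dependence on it. Separately, the loop body already works on `__m128` values, which the auto-vectorizer does not vectorize a second time; that would account for the "unsupported data type" remark once `#pragma ivdep` is added. The scalar sketch below shows a loop shape that usually avoids the assumed dependence: the output is indexed by the loop variable and the pointers are `restrict`-qualified. The function name `stencil_row`, its parameters, the 1-D layout, and the value of `STENCIL` are all assumptions made for illustration and are not taken from the post.

```c
#include <assert.h>
#include <stddef.h>

#define STENCIL 4  /* assumed stencil radius; the post does not state its value */

/* Hypothetical 1-D analogue of the posted loop: a symmetric stencil of
 * radius STENCIL. `restrict` tells the compiler that `out` and `in`
 * never overlap, so a store to out[i] cannot feed a later load from in. */
void stencil_row(float *restrict out, const float *restrict in,
                 size_t n, float c0, const float x[STENCIL])
{
    assert(n >= 2 * STENCIL);  /* keeps n - STENCIL from wrapping around */

    for (size_t i = STENCIL; i < n - STENCIL; ++i) {
        float acc = c0 * in[i];                      /* center term */
        for (size_t r = 1; r <= STENCIL; ++r)
            acc += x[r - 1] * (in[i - r] + in[i + r]);  /* symmetric pair */
        out[i] = acc;  /* a distinct element per iteration, not one fixed address */
    }
}
```

In the hand-written intrinsics version, the analogous change would be to store to an address that depends on `i`, for example `_mm_store_ps(&out[i], tmp7)` for a 16-byte-aligned address or `_mm_storeu_ps` otherwise, where `out` is a hypothetical output array.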


2 Replies

Thomas_W_Intel

Employee


09-09-2012
12:59 PM


jimdempseyatthecove

Black Belt


09-09-2012
05:15 PM
