Does line 301 contain a MACRO or template use where nested loops are used?

Jim Dempsey

c:\>type tstcase.cpp

void foo()
{
    double LocationSecondary[16][1024] ;
    double LocationPrim[1024] ; // Init was done earlier
    double Delta[8] ; // Init was done earlier

    for(int ii= 0 ; ii< 1024; ii++, LocationPrim++)
    {
        for(int jj = 0 ; jj< 8; jj++, LocationSecondary++ )
        {
            *LocationSecondary = *LocationPrim + Delta[jj] ;
        }
        for(int jj = 0 ; jj< 8 ; jj++, LocationSecondary++)
        {
            *LocationSecondary = *LocationPrim - Delta[jj] ;
        }
    }
}

c:\>icl -c /Qvec-report3 tstcase.cpp

Intel C++ Intel 64 Compiler XE for applications running on Intel 64, Version 12.1.0.233 Build 20110811
Copyright (C) 1985-2011 Intel Corporation. All rights reserved.

tstcase.cpp
tstcase.cpp(7): error: expression must be a modifiable lvalue
  for(int ii= 0 ; ii< 1024; ii++, LocationPrim++)
  ^

tstcase.cpp(9): error: expression must be a modifiable lvalue
  for(int jj = 0 ; jj< 8; jj++, LocationSecondary++ )
  ^

tstcase.cpp(11): error: expression must be a modifiable lvalue
  *LocationSecondary = *LocationPrim + Delta[jj] ;
  ^

tstcase.cpp(13): error: expression must be a modifiable lvalue
  for(int jj = 0 ; jj< 8 ; jj++, LocationSecondary++)
  ^

tstcase.cpp(15): error: expression must be a modifiable lvalue
  *LocationSecondary = *LocationPrim - Delta[jj] ;
  ^

compilation aborted for tstcase.cpp (code 2)

LocationSecondary, declared as an array, is not a modifiable lvalue type. If you make it a pointer instead:

typedef double d16x1024[16][1024];
d16x1024* LocationSecondary = new d16x1024[1];
...
for(int jj = 0 ; jj< 8; jj++, LocationSecondary++ )

now the above is valid.

However, LocationSecondary++ advances to the 2nd d16x1024 in the array, not to the next double (and the original pointer is modified).

Jim Dempsey
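To make the stride point concrete, here is a minimal sketch that measures how far a single `++` moves a pointer to the whole 16x1024 block compared with a plain `double*`. The helper names `block_stride_bytes` and `element_stride_bytes` are made up for illustration and do not appear in the original test case.

```cpp
#include <cstddef>

typedef double d16x1024[16][1024];

// Bytes skipped by one increment of a pointer to the whole 16x1024 block.
std::size_t block_stride_bytes() {
    d16x1024* base = new d16x1024[2];   // two blocks, so base + 1 stays in range
    d16x1024* p = base;
    ++p;                                // jumps over one complete block
    std::size_t bytes = static_cast<std::size_t>(
        reinterpret_cast<char*>(p) - reinterpret_cast<char*>(base));
    delete[] base;
    return bytes;
}

// Bytes skipped by one increment of a pointer to a single element.
std::size_t element_stride_bytes() {
    double data[4] = {0.0, 0.0, 0.0, 0.0};
    double* start = data;
    double* q = start;
    ++q;                                // moves to the next double only
    return static_cast<std::size_t>(
        reinterpret_cast<char*>(q) - reinterpret_cast<char*>(start));
}
```

So walking the data one double at a time needs a `double*` (for example one obtained from `&LocationSecondary[0][0]`), not an incremented pointer to the array type.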

[cpp]
double LocationSecondary[16][1024];
double LocationPrimary[1024];
double Delta[8];

for (int i = 0; i < 1024; i++) {
    for (int j = 0; j < 8; j++) {
        LocationSecondary[j][i] = LocationPrimary[i] + Delta[j];
    }
    for (int j = 0; j < 8; j++) {
        LocationSecondary[8 + j][i] = LocationPrimary[i] - Delta[j];
    }
}
[/cpp]

What do you think Jim?

If what I did is correct, then I would also suggest the following change:

[cpp]
double LocationSecondary[16][1024];
double LocationPrim[1024];
double Delta[16];

// copy delta[0-7] to delta[8-15] negated
// so that you can use + operator for both

for (int i = 0; i < 1024; i++) {
    for (int j = 0; j < 16; j++) {
        LocationSecondary[j][i] = LocationPrim[i] + Delta[j];
    }
}
[/cpp]

Moreover, the loops should be reversed because having a large stride is inefficient.
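Combining the two suggestions (a 16-entry Delta whose second half is negated, and interchanged loops so that the inner loop has unit stride) might look like the sketch below. The names `make_signed_delta`, `fill_secondary`, `kRows` and `kCols` are illustrative assumptions, not part of the original code.

```cpp
#include <cstddef>

const std::size_t kRows = 16;    // 8 "plus" rows followed by 8 "minus" rows
const std::size_t kCols = 1024;

// Build the 16-entry table: entries 0-7 are Delta, entries 8-15 are -Delta,
// so a single + covers both the add and the subtract case.
void make_signed_delta(const double delta[8], double signed_delta[kRows]) {
    for (std::size_t j = 0; j < 8; ++j) {
        signed_delta[j] = delta[j];
        signed_delta[8 + j] = -delta[j];
    }
}

// j is the outer loop and i the inner loop, so each inner iteration
// touches the next double in memory instead of striding by 1024 doubles.
void fill_secondary(double secondary[kRows][kCols],
                    const double primary[kCols],
                    const double signed_delta[kRows]) {
    for (std::size_t j = 0; j < kRows; ++j) {
        const double d = signed_delta[j];
        for (std::size_t i = 0; i < kCols; ++i) {
            secondary[j][i] = primary[i] + d;
        }
    }
}
```

Whether the compiler vectorizes the inner loop still depends on the options and target in use; checking the vectorization report is the way to find out.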

Finally, the answer to Yolanda's question about the remarks: the compiler sometimes applies a loop transformation that splits a loop into a vectorizable part and a partially vectorizable part, if it determines it can do that safely. That is why you get two remarks for the same loop.

I think it would be more xmm register efficient to use:

[cpp]
double LocationSecondary[16][1024];
double LocationPrimary[1024];
double Delta[8];

for (int i = 0; i < 1024; i++) {
    for (int j = 0; j < 8; j++) {
        LocationSecondary[j][i] = LocationPrimary[i] + Delta[j];
        LocationSecondary[8 + j][i] = LocationPrimary[i] - Delta[j];
    }
}
[/cpp]

With SSE, all of Delta could be held in 4 xmm registers (two doubles each). In 32-bit mode, which has 8 xmm registers, this would leave 4 xmm registers available for scratch. I agree with you that the user should consider swapping the indexes:

double LocationSecondary[1024][16]; // swap indexes
double LocationPrimary[1024];
double Delta[8];

for (int i = 0; i < 1024; i++) {
    for (int j = 0; j < 8; j++) {
        LocationSecondary[i][j] = LocationPrimary[i] + Delta[j];
        LocationSecondary[i][8 + j] = LocationPrimary[i] - Delta[j];
    }
}

Provided that the swap of indexes does not introduce a performance penalty elsewhere.

** run a test to confirm performance change **

Jim Dempsey
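One rough way to run such a test is sketched below: both layouts are filled by otherwise identical loops and timed with `std::chrono`. The function names `fill_16x1024`, `fill_1024x16` and `average_microseconds` are hypothetical, and the timing helper is deliberately simple; a serious measurement would also need to stop the optimizer from discarding repeated work and would be run on the actual target machine.

```cpp
#include <chrono>
#include <cstddef>

// Original layout: consecutive j values are 1024 doubles apart.
void fill_16x1024(double out[16][1024], const double primary[1024],
                  const double delta[8]) {
    for (std::size_t i = 0; i < 1024; ++i) {
        for (std::size_t j = 0; j < 8; ++j) {
            out[j][i] = primary[i] + delta[j];
            out[8 + j][i] = primary[i] - delta[j];
        }
    }
}

// Swapped layout: the 16 results for one i sit next to each other.
void fill_1024x16(double out[1024][16], const double primary[1024],
                  const double delta[8]) {
    for (std::size_t i = 0; i < 1024; ++i) {
        for (std::size_t j = 0; j < 8; ++j) {
            out[i][j] = primary[i] + delta[j];
            out[i][8 + j] = primary[i] - delta[j];
        }
    }
}

// Average wall-clock time of one call to f, in microseconds.
template <typename F>
double average_microseconds(F f, int repeats) {
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        f();
    }
    const std::chrono::duration<double, std::micro> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / repeats;
}
```

Calling `average_microseconds` once per layout (for example with a lambda that wraps each fill function) gives two numbers to compare; which layout wins will depend on the compiler options and on how the rest of the program reads LocationSecondary.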

This is a typical loop that would benefit from 3-operand instruction syntax (since LocationPrimary could be reused without copying), and from AVX.

Pentium D excludes use of architecture options more recent than SSE3.
