Intel® ISA Extensions

Question about an example in the Optimization Manual: AVX mask move to avoid branch penalty

Deyang_Gu
Beginner
1,535 Views

Hi all,

I am trying to run Example 11-14 from the optimization manual (June 2013), page 11-23. I wrote the function in a separate .s file and the main function in a main.c file. The code only runs correctly in debug mode. Please see the attachment for my code. The attached cond_loop.c is actually cond_loop.s, but the forum won't accept that extension.

  • icc main-2.c cond_loop.s -g: everything works fine.
  • icc main-2.c cond_loop.s: segmentation fault when accessing array members at the end of the code.

After the function void cond_loop(const float *a, float *b, const float *c, const float *d, const float *e, const int length) returns, all the array pointers are lost, so I can no longer access the arrays. The problem only occurs without the -g compile option, so it is a release-only bug that I am not able to debug. From some research, this may be because the stack frame pointer is always saved in debug mode but not in release mode. I am not sure this is my problem, and I don't really know how to solve it. I tried pushing rbp and rsp, but that did not help. Would anyone please help me look at it? Any advice is appreciated. Thank you all!

BTW: in the attachment, cond_loop_c.c is the corresponding C version of the assembly, and of course this one works perfectly. I am using Linux, so the x86-64 System V ABI applies. Thanks again.
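For readers without the attachment, here is a minimal sketch of what the loop appears to compute, based on the loop body quoted later in this thread. The names cond_loop_ref and cond_loop_select are hypothetical and are not the attached files. The second function expresses the same result as a select instead of a branch, which is the form a compiler can typically turn into a blend or masked operation.

```c
#include <assert.h>

/* Scalar reference: b[i] = c[i]*d[i] when a[i] > 0, otherwise c[i]*e[i].
   Assumed semantics, taken from the loop body quoted in the thread. */
void cond_loop_ref(const float *a, float *b, const float *c,
                   const float *d, const float *e, const int length)
{
    assert(length >= 0);
    for (int i = 0; i < length; i++) {
        if (a[i] > 0.0f)
            b[i] = c[i] * d[i];
        else
            b[i] = c[i] * e[i];
    }
}

/* Same result written as a select: choose the factor first, then do a
   single multiply, so there is no data-dependent branch in the source. */
void cond_loop_select(const float *a, float *b, const float *c,
                      const float *d, const float *e, const int length)
{
    assert(length >= 0);
    for (int i = 0; i < length; i++) {
        const float factor = (a[i] > 0.0f) ? d[i] : e[i];
        b[i] = c[i] * factor;
    }
}
```

Either function can serve as a known-good reference when checking the output of a hand-written assembly version.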

Best

xiangpisai

0 Kudos
31 Replies
SergeyKostrov
Valued Contributor II
462 Views
...
void cond_loop( const float *a, float *b, const float *c, const float *d, const float *e, const int length );
...
void cond_loop_c( float *a, float *b, float *c, float *d, float *e, int length )
{
...
}
...

There is a difference between the forward declaration of the function and the declaration of the function in the implementation. That is, the const qualifier is used for most parameters except for b. Please post the correct declaration for the function. Thanks.
0 Kudos
Bernard
Valued Contributor I
462 Views

Actually it seems that there are two different functions: void cond_loop() and void cond_loop_c().

0 Kudos
SergeyKostrov
Valued Contributor II
462 Views
>>...there are two different functions: void cond_loop() and void cond_loop_c()...

xiangpisai made a note that

...in attachment, cond_loop_c.c is the corresponding C version of the assembly...
0 Kudos
Deyang_Gu
Beginner
462 Views

Sergey Kostrov wrote:

>>...there are two different functions: void cond_loop() and void cond_loop_c()...

xiangpisai made a note that

...in attachment, cond_loop_c.c is the corresponding C version of the assembly...

Hmm... might be my bad English. Actually, I don't have a good idea of what const means in assembly. My assembly doesn't perform any check for const, so I cannot tell whether using const in the C version of the code makes any difference.

0 Kudos
SergeyKostrov
Valued Contributor II
462 Views
Try to verify that the GCC compiler generates identical assembler code for these two versions of the cond_loop function:

[ Version 1 ]

void cond_loop( const float *a, float *b, const float *c, const float *d, const float *e, const int length );

void cond_loop( const float *a, float *b, const float *c, const float *d, const float *e, const int length )
{
    int i;
    for( i = 0; i < length; i++ )
    {
        if( a[i] > 0 )
            b[i] = c[i] * d[i];
        else
            b[i] = c[i] * e[i];
    }
}

[ Version 2 ]

void cond_loop( float *a, float *b, float *c, float *d, float *e, int length );

void cond_loop( float *a, float *b, float *c, float *d, float *e, int length )
{
    int i;
    for( i = 0; i < length; i++ )
    {
        if( a[i] > 0 )
            b[i] = c[i] * d[i];
        else
            b[i] = c[i] * e[i];
    }
}
0 Kudos
SergeyKostrov
Valued Contributor II
462 Views
Xiangpisai,

Here is a modified test case. I didn't have any issues or problems on a 64-bit Windows 7 Professional platform on a Dell Precision Mobile M4700 with an Intel Core i7-3840QM ( Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/compare/70846 ).

I've completed 4 tests, and here are the Intel C++ compiler options:

icl.exe /QxAVX /Od /MDd /D"_DEBUG" Test18.cpp
icl.exe /QxAVX /O1 /MD /D"NODEBUG" Test18.cpp
icl.exe /QxAVX /O2 /MD /D"NODEBUG" Test18.cpp
icl.exe /QxAVX /O3 /MD /D"NODEBUG" Test18.cpp

I hope that my results will be useful for you.
0 Kudos
SergeyKostrov
Valued Contributor II
462 Views
[ Outputs ]

Test 1 - icl.exe /QxAVX /Od /MDd /D"_DEBUG" Test18.cpp

a[14] = 0.849280
c[14] = 0.657335
d[14] = 0.189015
e[14] = 0.729507
f[14] = 0.479530

Test 2 - icl.exe /QxAVX /O1 /MD /D"NODEBUG" Test18.cpp

a[14] = 0.849280
c[14] = 0.657335
d[14] = 0.189015
e[14] = 0.729507
f[14] = 0.479530

Test 3 - icl.exe /QxAVX /O2 /MD /D"NODEBUG" Test18.cpp

a[14] = 0.849280
c[14] = 0.657335
d[14] = 0.189015
e[14] = 0.729507
f[14] = 0.479530

Test 4 - icl.exe /QxAVX /O3 /MD /D"NODEBUG" Test18.cpp

a[14] = 0.849280
c[14] = 0.657335
d[14] = 0.189015
e[14] = 0.729507
f[14] = 0.479530

Note: Since the seed is the same for all 4 tests, the results are identical.
0 Kudos
Deyang_Gu
Beginner
462 Views

Sergey Kostrov wrote:

[ Outputs ]

Test 1 - icl.exe /QxAVX /Od /MDd /D"_DEBUG" Test18.cpp

a[14] = 0.849280
c[14] = 0.657335
d[14] = 0.189015
e[14] = 0.729507
f[14] = 0.479530

Test 2 - icl.exe /QxAVX /O1 /MD /D"NODEBUG" Test18.cpp

a[14] = 0.849280
c[14] = 0.657335
d[14] = 0.189015
e[14] = 0.729507
f[14] = 0.479530

Test 3 - icl.exe /QxAVX /O2 /MD /D"NODEBUG" Test18.cpp

a[14] = 0.849280
c[14] = 0.657335
d[14] = 0.189015
e[14] = 0.729507
f[14] = 0.479530

Test 4 - icl.exe /QxAVX /O3 /MD /D"NODEBUG" Test18.cpp

a[14] = 0.849280
c[14] = 0.657335
d[14] = 0.189015
e[14] = 0.729507
f[14] = 0.479530

Note: Since a Seed is the same for all 4 tests results are identical.

Thanks for helping me test the code. I am also using a Dell Precision M4700. The only difference is that I am using Linux (Debian Wheezy), which I guess is the main reason I am getting bugs. Anyway, after pushing the corresponding registers to the stack, things are working fine now.
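Some background on why saving registers matters: under the x86-64 System V ABI, a function must preserve rbx, rbp, and r12 through r15 for its caller. A possible explanation for the release-only crash, not confirmed in this thread, is that the optimized caller kept the array pointers in such registers while the assembly routine overwrote them. One way to leave register saving to the compiler is to write the loop with intrinsics instead of a .s file. The sketch below is an assumption-laden illustration, not the manual's listing: the name cond_loop_avx is hypothetical, it uses a blend (_mm256_blendv_ps) rather than the masked moves of the manual's example, and it falls back to scalar code when the compiler is not targeting AVX.

```c
#include <assert.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical intrinsics version of the conditional loop:
   b[i] = c[i]*d[i] when a[i] > 0, otherwise c[i]*e[i]. */
void cond_loop_avx(const float *a, float *b, const float *c,
                   const float *d, const float *e, const int length)
{
    assert(length >= 0);
    int i = 0;
#if defined(__AVX__)
    const __m256 zero = _mm256_setzero_ps();
    /* Process 8 floats per iteration with unaligned loads and stores. */
    for (; i + 8 <= length; i += 8) {
        const __m256 va = _mm256_loadu_ps(a + i);
        /* Lanes where a[i] > 0 become all-ones, the others all-zeros. */
        const __m256 mask = _mm256_cmp_ps(va, zero, _CMP_GT_OQ);
        /* blendv takes the second operand where the mask sign bit is set,
           so this picks d[i] for a[i] > 0 and e[i] otherwise. */
        const __m256 factor = _mm256_blendv_ps(_mm256_loadu_ps(e + i),
                                               _mm256_loadu_ps(d + i),
                                               mask);
        _mm256_storeu_ps(b + i,
                         _mm256_mul_ps(_mm256_loadu_ps(c + i), factor));
    }
#endif
    /* Scalar tail, and the whole loop when AVX is not enabled. */
    for (; i < length; i++) {
        b[i] = c[i] * ((a[i] > 0.0f) ? d[i] : e[i]);
    }
}
```

With icc or gcc on Linux, the vector path is compiled only when AVX code generation is enabled (for example with -mavx); otherwise the function still compiles and runs the scalar loop.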

0 Kudos
SergeyKostrov
Valued Contributor II
462 Views
>>... I am also using a Dell Precision M4700. the only thing different is that I am using Linux---Debian Wheezy. I guess that's the main point why I am getting bugs. Anyway, after pushing corresponding registers to stack, things are working fine now...

Thanks for confirming that the problem is resolved.
0 Kudos
levicki
Valued Contributor I
462 Views

Just wanted to point you to a great optimization resource:

http://agner.org/optimize/#manuals

Particularly this manual:
http://agner.org/optimize/optimizing_assembly.pdf

It explains Windows and Linux 32-bit and 64-bit ABI, calling conventions, name mangling, etc.

0 Kudos
jimdempseyatthecove
Honored Contributor III
462 Views

Igor,

Thanks for the link. I've enjoyed Agner Fog's posts and web pages for many years and have considered him a valuable programming resource.

Jim Dempsey

0 Kudos