Beginner
Problem with vectorization for Intel Compiler 11

I have a simple test function as

int test(float* U2,float* U3){
for (int k=0; k<100; k++){
float *u2=U2+k*1000, *u3=U3+k*1000;

#pragma ivdep
#pragma vector aligned
#pragma vector always
for(int i=0; i<1000; i++) u3[i] = u2[i];
}
return 0;
}

Compile options: "-vec-report3 -xP -O3"

Intel compiler 9.* vectorized the above code. But Intel compiler 11 gives a message like:

" loop was not vectorized: dereference too complex "

Can anyone point out what I did wrong here?

Thanks,

Accepted Solutions
Black Belt
You are running 11.1.056. If you are eligible for the update released in August (11.1.073), installing it may be the simplest solution.

7 Replies
Black Belt
You did not state the full version number of the compiler and your OS. On Suse-X64 11.3, with 11.1.073, I find:

icc -vec-report3 -xHost -O3 -c vec.cpp
vec.cpp(2): (col. 4) remark: loop was not vectorized: not inner loop.
vec.cpp(8): (col. 7) remark: LOOP WAS VECTORIZED.
I do not see any problems with this.
Beginner
Thank you.

The compiler is 11.1, 64-bit installation.

The OS is Red Hat Linux, kernel version 2.6.9-89.0.20.ELsmp

When I run icpc -vec-report3 -c -o -v, the output is like:

Version 11.1
/apps/compilers/intel/Compiler/11.1/056/bin/intel64/mcpcom -_g -mP3OPT_inline_alloca -D__HONOR_STD -D__ICC=1110 -D__INTEL_COMPILER=1110 -D_MT "-_Asystem(unix)" -D__ELF__ "-_Acpu(x86_64)" "-_Amachine(x86_64)" -D__INTEL_COMPILER_BUILD_DATE=20090827 -D__PTRDIFF_TYPE__=long "-D__SIZE_TYPE__=unsigned long" -D__WCHAR_TYPE__=int "-D__WINT_TYPE__=unsigned int" "-D__INTMAX_TYPE__=long int" "-D__UINTMAX_TYPE__=long unsigned int" -D__LONG_MAX__=9223372036854775807L -D__QMSPP_ -D__OPTIMIZE__ -D__NO_MATH_INLINES -D__NO_STRING_INLINES -D__GNUG__=3 -D__GNUC__=3 -D__GNUC_MINOR__=4 -D__GNUC_PATCHLEVEL__=6 -D__NO_INLINE__ -D__i686 -D__i686__ -D__pentiumpro -D__pentiumpro__ -D__pentium4 -D__pentium4__ -D__SSE2__ -D__tune_pentium4__ -D__MMX__ -D__SSE__ -D__LP64__ -D_LP64 -D_GNU_SOURCE=1 -D__DEPRECATED=1 -D__GXX_WEAK__=1 -D__GXX_ABI_VERSION=1002 "-D__USER_LABEL_PREFIX__= " -D__REGISTER_PREFIX__= -D__INTEL_RTTI__ -D__EXCEPTIONS=1 -D__unix__ -D__unix -D__linux__ -D__linux -D__gnu_linux__ -B -Dunix -Dlinux -D__x86_64 -D__x86_64__ -_k -_8 -_l -_a -_b --gnu_version=346 -_W5 --gcc-extern-inline -p --bool -tused -mGLOB_eh_linux -x --mspp --multibyte_chars --bool -mP1OPT_version=11.1-intel64 -mGLOB_diag_enable_disable=E:vec -mGLOB_diag_file=tune.diag -mP1OPT_print_version=FALSE -mP3OPT_use_mspp_call_convention -mCG_use_gas_got_workaround=F -mP2OPT_align_option_used=TRUE "-mGLOB_options_string=-vec-report3 -c -o tune.o -v" -mGLOB_cxx_limited_range=FALSE -mGLOB_as_output_backup_file_name=/tmp/icpcb07sjvas_.s -mIPOPT_activate -mIPOPT_lite -mGLOB_machine_model=GLOB_MACHINE_MODEL_EFI2 -mGLOB_extended_instructions=0x8 -mIPOPT_args_in_regs=0 -mPGOPTI_value_profile_use=T -mP2OPT_hlo_level=2 -mP2OPT_hlo -mP2OPT_vec_verbose=3 -mP3OPT_debug_linenum_only -mGLOB_debug_format=GLOB_DEBUG_FORMAT_DWARF20 -mIPOPT_obj_output_file_name=tune.o "-mGLOB_linker_version=2.15.92.0.2 20040927" -mP3OPT_asm_target=P3OPT_ASM_TARGET_GAS -mGLOB_obj_output_file=tune.o -mGLOB_source_dialect=GLOB_SOURCE_DIALECT_C_PLUS_PLUS -mP1OPT_source_file_name=../tune.cc ../tune.cc
#include "..." search starts here:
#include <...> search starts here:
/apps/compilers/intel/Compiler/11.1/056/include/intel64
/usr/include/c++/3.4.6
/usr/include/c++/3.4.6/x86_64-redhat-linux
/usr/include/c++/3.4.6/backward
/apps/compilers/intel/Compiler/11.1/056/include/intel64
/apps/compilers/intel/Compiler/11.1/056/include
/usr/local/include
/usr/include
/usr/lib/gcc/x86_64-redhat-linux/3.4.6/include

End of search list.
../tune.cc(5): (col. 4) remark: loop was not vectorized: not inner loop.
../tune.cc(11): (col. 7) remark: loop was not vectorized: not inner loop.
../tune.cc(11): (col. 34) remark: loop was not vectorized: dereference too complex.
Black Belt
Perhaps your pragmas aren't recognized unless they begin in column 1. ivdep is almost certainly required to vectorize if you don't declare the pointers with appropriate restrict syntax.
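
For anyone wanting to try the restrict route instead of ivdep, here is a rough sketch. It is not code from the original poster: the function name copy_rows and the two constants are made up for illustration, and I am assuming the GCC-style __restrict__ spelling, which g++ accepts and which icpc on Linux should accept as well. It makes no promise that a particular 11.1 build will then vectorize the loop.

```cpp
// Hypothetical sizes matching the test function in the question.
const int kRows = 100;
const int kCols = 1000;

// __restrict__ tells the compiler that dst and src never overlap,
// a guarantee similar to the one #pragma ivdep was standing in for.
int copy_rows(float* __restrict__ dst, const float* __restrict__ src) {
    for (int k = 0; k < kRows; k++) {
        float* __restrict__ d = dst + k * kCols;
        const float* __restrict__ s = src + k * kCols;
        for (int i = 0; i < kCols; i++) d[i] = s[i];
    }
    return 0;
}
```

The caller must really pass non-overlapping buffers; with restrict, overlapping arguments are undefined behavior rather than just a missed optimization.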
Beginner
Thank you, all. It is unlikely that I can convince the system admin to install the latest version yet. I also tried this code with Intel compiler 10.*, and vectorization doesn't work there either.

I am sure the "#pragma" lines are recognized by the compiler, since it would give me a dependency message if it did not recognize ivdep.
Black Belt
Yes, I see that it does report a dependency when ivdep is omitted and restrict is not added. However, it still produces an identical call to __intel_fast_memcpy() even while reporting inability to vectorize.
I note that the current default Red Hat gcc-4.4 will vectorize this without pragmas, with the addition of restrict.
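
Since the compiler ends up emitting a memcpy call for this loop anyway, one option is to write the copy that way directly. A minimal sketch, assuming the two buffers never overlap and each holds at least 100*1000 floats; the name test_memcpy is invented for illustration, and std::memcpy here is the standard library routine, not Intel's internal __intel_fast_memcpy:

```cpp
#include <cstring>

// Hypothetical stand-in for the original test(): the 100 rows of 1000 floats
// sit back to back in memory, so the loop nest amounts to one contiguous copy
// of 100*1000 floats from U2 to U3. Only valid for non-overlapping buffers.
int test_memcpy(const float* U2, float* U3) {
    std::memcpy(U3, U2, 100 * 1000 * sizeof(float));
    return 0;
}
```

This sidesteps the vectorization report entirely, at the cost of no longer being a test of the vectorizer.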
Beginner
The sysop finally gave 073 a try, and it works. Thanks, everyone.