Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Problem with vectorization for Intel Compiler 11

gbncn
Beginner
I have a simple test function:

int test(float* U2,float* U3){
   for (int k=0; k<100; k++){
      float *u2=U2+k*1000, *u3=U3+k*1000;

#pragma ivdep
#pragma vector aligned
#pragma vector always
      for(int i=0; i<1000; i++) u3[i] = u2[i];
   }
   return 0;
}

Compile options: "-vec-report3 -xP -O3"

Intel compiler 9.* vectorized the code above, but Intel compiler 11 gives this message:

"loop was not vectorized: dereference too complex"

Can anyone point out what I did wrong here?

Thanks,
mecej4
Honored Contributor III
You did not state the full version number of the compiler and your OS. On Suse-X64 11.3, with 11.1.073, I find:

[cpp] icc -vec-report3 -xHost -O3 -c vec.cpp 
vec.cpp(2): (col. 4) remark: loop was not vectorized: not inner loop.
vec.cpp(8): (col. 7) remark: LOOP WAS VECTORIZED.
[/cpp]
I do not see any problems with this.
gbncn
Beginner
Thank you.

The compiler is 11.1, 64-bit installation.

The OS is Red Hat Linux, kernel version 2.6.9-89.0.20.ELsmp.

When I run icpc -vec-report3 -c -o tune.o -v ../tune.cc, the output is:

Version 11.1
/apps/compilers/intel/Compiler/11.1/056/bin/intel64/mcpcom -_g -mP3OPT_inline_alloca -D__HONOR_STD -D__ICC=1110 -D__INTEL_COMPILER=1110 -D_MT "-_Asystem(unix)" -D__ELF__ "-_Acpu(x86_64)" "-_Amachine(x86_64)" -D__INTEL_COMPILER_BUILD_DATE=20090827 -D__PTRDIFF_TYPE__=long "-D__SIZE_TYPE__=unsigned long" -D__WCHAR_TYPE__=int "-D__WINT_TYPE__=unsigned int" "-D__INTMAX_TYPE__=long int" "-D__UINTMAX_TYPE__=long unsigned int" -D__LONG_MAX__=9223372036854775807L -D__QMSPP_ -D__OPTIMIZE__ -D__NO_MATH_INLINES -D__NO_STRING_INLINES -D__GNUG__=3 -D__GNUC__=3 -D__GNUC_MINOR__=4 -D__GNUC_PATCHLEVEL__=6 -D__NO_INLINE__ -D__i686 -D__i686__ -D__pentiumpro -D__pentiumpro__ -D__pentium4 -D__pentium4__ -D__SSE2__ -D__tune_pentium4__ -D__MMX__ -D__SSE__ -D__LP64__ -D_LP64 -D_GNU_SOURCE=1 -D__DEPRECATED=1 -D__GXX_WEAK__=1 -D__GXX_ABI_VERSION=1002 "-D__USER_LABEL_PREFIX__= " -D__REGISTER_PREFIX__= -D__INTEL_RTTI__ -D__EXCEPTIONS=1 -D__unix__ -D__unix -D__linux__ -D__linux -D__gnu_linux__ -B -Dunix -Dlinux -D__x86_64 -D__x86_64__ -_k -_8 -_l -_a -_b --gnu_version=346 -_W5 --gcc-extern-inline -p --bool -tused -mGLOB_eh_linux -x --mspp --multibyte_chars --bool -mP1OPT_version=11.1-intel64 -mGLOB_diag_enable_disable=E:vec -mGLOB_diag_file=tune.diag -mP1OPT_print_version=FALSE -mP3OPT_use_mspp_call_convention -mCG_use_gas_got_workaround=F -mP2OPT_align_option_used=TRUE "-mGLOB_options_string=-vec-report3 -c -o tune.o -v" -mGLOB_cxx_limited_range=FALSE -mGLOB_as_output_backup_file_name=/tmp/icpcb07sjvas_.s -mIPOPT_activate -mIPOPT_lite -mGLOB_machine_model=GLOB_MACHINE_MODEL_EFI2 -mGLOB_extended_instructions=0x8 -mIPOPT_args_in_regs=0 -mPGOPTI_value_profile_use=T -mP2OPT_hlo_level=2 -mP2OPT_hlo -mP2OPT_vec_verbose=3 -mP3OPT_debug_linenum_only -mGLOB_debug_format=GLOB_DEBUG_FORMAT_DWARF20 -mIPOPT_obj_output_file_name=tune.o "-mGLOB_linker_version=2.15.92.0.2 20040927" -mP3OPT_asm_target=P3OPT_ASM_TARGET_GAS -mGLOB_obj_output_file=tune.o -mGLOB_source_dialect=GLOB_SOURCE_DIALECT_C_PLUS_PLUS -mP1OPT_source_file_name=../tune.cc ../tune.cc
#include "..." search starts here:
#include <...> search starts here:
/apps/compilers/intel/Compiler/11.1/056/include/intel64
/usr/include/c++/3.4.6
/usr/include/c++/3.4.6/x86_64-redhat-linux
/usr/include/c++/3.4.6/backward
/apps/compilers/intel/Compiler/11.1/056/include/intel64
/apps/compilers/intel/Compiler/11.1/056/include
/usr/local/include
/usr/include
/usr/lib/gcc/x86_64-redhat-linux/3.4.6/include

End of search list.
../tune.cc(5): (col. 4) remark: loop was not vectorized: not inner loop.
../tune.cc(11): (col. 7) remark: loop was not vectorized: not inner loop.
../tune.cc(11): (col. 34) remark: loop was not vectorized: dereference too complex.
mecej4
Honored Contributor III
You are running 11.1.056. If you are eligible for the update released in August (11.1.073), installing it may be the simplest solution.
TimP
Honored Contributor III
Perhaps your pragmas aren't recognized unless they begin in column 1. ivdep is almost certainly required to vectorize if you don't declare the pointers with appropriate restrict syntax.
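
To illustrate the "restrict syntax" mentioned above, here is a minimal sketch of the same copy loop with the pointers qualified as non-aliasing instead of relying on ivdep. It assumes the __restrict spelling, which is a compiler extension accepted by icc, gcc, and clang rather than standard C++; the names copy_rows, kRows, and kCols are hypothetical and not from the original post. Whether a given compiler version actually vectorizes it still has to be checked with its vectorization report.

```cpp
#include <cassert>

// Hypothetical constants matching the 100 x 1000 layout in the original post.
const int kRows = 100;
const int kCols = 1000;

// Copies kRows rows of kCols floats from src to dst.
// __restrict (a compiler extension, not standard C++) promises the compiler
// that src and dst never overlap, which removes the assumed dependence that
// "#pragma ivdep" was being used to suppress.
int copy_rows(const float* __restrict src, float* __restrict dst) {
    assert(src != nullptr && dst != nullptr);
    for (int k = 0; k < kRows; ++k) {
        const float* __restrict s = src + k * kCols;
        float* __restrict d = dst + k * kCols;
        for (int i = 0; i < kCols; ++i) {
            d[i] = s[i];
        }
    }
    return 0;
}
```

The caller must really pass non-overlapping buffers; if the two arrays alias, the behavior is undefined.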
gbncn
Beginner
Thank you, all. It is unlikely that I can convince the system admin to install the latest version yet. I also tried this code with Intel compiler 10.*, and vectorization does not work there either.

I am sure "#pragma" is recognized by the compiler, since it would give me a dependency message if it did not recognize ivdep.
TimP
Honored Contributor III
Yes, I see that it does report dependency when ivdep is omitted, and restrict is not added. However, it still produces an identical call to __intel_fast_memcpy() even while reporting inability to vectorize.
I note that the current default Red Hat gcc-4.4 will vectorize this without pragmas, with the addition of restrict.
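
Since the compiler reportedly turns the inner loop into a memcpy call anyway, the copy can also be written that way explicitly. The following is only a sketch under that assumption: it uses the standard std::memcpy rather than the Intel-internal __intel_fast_memcpy(), and the names copy_rows_memcpy, kRowCount, and kRowLength are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical constants matching the 100 x 1000 layout in the original post.
const int kRowCount = 100;
const int kRowLength = 1000;

// Copies each row with std::memcpy instead of an element-by-element loop.
// std::memcpy requires that source and destination do not overlap, which is
// the same guarantee that ivdep or restrict gives the vectorizer.
int copy_rows_memcpy(const float* src, float* dst) {
    assert(src != nullptr && dst != nullptr);
    for (int k = 0; k < kRowCount; ++k) {
        std::memcpy(dst + k * kRowLength,
                    src + k * kRowLength,
                    kRowLength * sizeof(float));
    }
    return 0;
}
```

This sidesteps the vectorization report entirely for a plain copy, but it does not help once the loop body does real arithmetic.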
gbncn
Beginner
The sysop finally gave 11.1.073 a try, and it works. Thanks, everyone.