I have a simple test function:
int test(float* U2, float* U3){
  for (int k=0; k<100; k++){
    float *u2=U2+k*1000, *u3=U3+k*1000;
#pragma ivdep
#pragma vector aligned
#pragma vector always
    for(int i=0; i<1000; i++) u3[i] = u2[i];
  }
  return 0;
}
The compile options are " -vec-report3 -xP -O3 ".
Intel compiler 9.* vectorized the above code, but Intel compiler 11 gives a message like:
" loop was not vectorized: dereference too complex "
Can anyone point out what I did wrong here?
Thanks,
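Since the loop uses `#pragma vector aligned`, which tells the compiler it may use aligned data-movement instructions, it can help to confirm that the buffers really are aligned before blaming the vectorizer. The following is only a sketch: the helper names `is_aligned` and `rows_aligned` are hypothetical, and the 16-byte figure assumes SSE-style aligned loads and stores for the `-xP` target.

```cpp
#include <cstdint>

// Hypothetical helper: true if p is a multiple of `bytes`
// (16 is assumed here, the alignment SSE aligned moves require).
inline bool is_aligned(const void* p, std::uintptr_t bytes = 16) {
    return reinterpret_cast<std::uintptr_t>(p) % bytes == 0;
}

// Each row starts k*1000 floats (k*4000 bytes) past the base pointer.
// 4000 is a multiple of 16, so if both base pointers are 16-byte aligned,
// every row pointer u2/u3 in the loop is aligned as well.
inline bool rows_aligned(const float* U2, const float* U3) {
    return is_aligned(U2) && is_aligned(U3);
}
```

If `rows_aligned(U2, U3)` is false for the real buffers, the aligned pragma is probably best dropped regardless of which compiler version is used.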
1 Solution
You are running 11.1.056. If you are eligible for the update released in August (11.1.073), installing it may be the simplest solution.
7 Replies
You did not state the full version number of the compiler and your OS. On Suse-X64 11.3, with 11.1.073, I find:
icc -vec-report3 -xHost -O3 -c vec.cpp
vec.cpp(2): (col. 4) remark: loop was not vectorized: not inner loop.
vec.cpp(8): (col. 7) remark: LOOP WAS VECTORIZED.
I do not see any problems with this.
Thank you.
The compiler is 11.1, 64-bit installation.
The OS is Red Hat Linux, kernel version 2.6.9-89.0.20.ELsmp.
When I run icpc -vec-report3 -c -o -v, the output is:
Version 11.1
/apps/compilers/intel/Compiler/11.1/056/bin/intel64/mcpcom -_g -mP3OPT_inline_alloca -D__HONOR_STD -D__ICC=1110 -D__INTEL_COMPILER=1110 -D_MT "-_Asystem(unix)" -D__ELF__ "-_Acpu(x86_64)" "-_Amachine(x86_64)" -D__INTEL_COMPILER_BUILD_DATE=20090827 -D__PTRDIFF_TYPE__=long "-D__SIZE_TYPE__=unsigned long" -D__WCHAR_TYPE__=int "-D__WINT_TYPE__=unsigned int" "-D__INTMAX_TYPE__=long int" "-D__UINTMAX_TYPE__=long unsigned int" -D__LONG_MAX__=9223372036854775807L -D__QMSPP_ -D__OPTIMIZE__ -D__NO_MATH_INLINES -D__NO_STRING_INLINES -D__GNUG__=3 -D__GNUC__=3 -D__GNUC_MINOR__=4 -D__GNUC_PATCHLEVEL__=6 -D__NO_INLINE__ -D__i686 -D__i686__ -D__pentiumpro -D__pentiumpro__ -D__pentium4 -D__pentium4__ -D__SSE2__ -D__tune_pentium4__ -D__MMX__ -D__SSE__ -D__LP64__ -D_LP64 -D_GNU_SOURCE=1 -D__DEPRECATED=1 -D__GXX_WEAK__=1 -D__GXX_ABI_VERSION=1002 "-D__USER_LABEL_PREFIX__= " -D__REGISTER_PREFIX__= -D__INTEL_RTTI__ -D__EXCEPTIONS=1 -D__unix__ -D__unix -D__linux__ -D__linux -D__gnu_linux__ -B -Dunix -Dlinux -D__x86_64 -D__x86_64__ -_k -_8 -_l -_a -_b --gnu_version=346 -_W5 --gcc-extern-inline -p --bool -tused -mGLOB_eh_linux -x --mspp --multibyte_chars --bool -mP1OPT_version=11.1-intel64 -mGLOB_diag_enable_disable=E:vec -mGLOB_diag_file=tune.diag -mP1OPT_print_version=FALSE -mP3OPT_use_mspp_call_convention -mCG_use_gas_got_workaround=F -mP2OPT_align_option_used=TRUE "-mGLOB_options_string=-vec-report3 -c -o tune.o -v" -mGLOB_cxx_limited_range=FALSE -mGLOB_as_output_backup_file_name=/tmp/icpcb07sjvas_.s -mIPOPT_activate -mIPOPT_lite -mGLOB_machine_model=GLOB_MACHINE_MODEL_EFI2 -mGLOB_extended_instructions=0x8 -mIPOPT_args_in_regs=0 -mPGOPTI_value_profile_use=T -mP2OPT_hlo_level=2 -mP2OPT_hlo -mP2OPT_vec_verbose=3 -mP3OPT_debug_linenum_only -mGLOB_debug_format=GLOB_DEBUG_FORMAT_DWARF20 -mIPOPT_obj_output_file_name=tune.o "-mGLOB_linker_version=2.15.92.0.2 20040927" -mP3OPT_asm_target=P3OPT_ASM_TARGET_GAS -mGLOB_obj_output_file=tune.o -mGLOB_source_dialect=GLOB_SOURCE_DIALECT_C_PLUS_PLUS -mP1OPT_source_file_name=../tune.cc ../tune.cc
#include "..." search starts here:
#include <...> search starts here:
/apps/compilers/intel/Compiler/11.1/056/include/intel64
/usr/include/c++/3.4.6
/usr/include/c++/3.4.6/x86_64-redhat-linux
/usr/include/c++/3.4.6/backward
/apps/compilers/intel/Compiler/11.1/056/include/intel64
/apps/compilers/intel/Compiler/11.1/056/include
/usr/local/include
/usr/include
/usr/lib/gcc/x86_64-redhat-linux/3.4.6/include
End of search list.
../tune.cc(5): (col. 4) remark: loop was not vectorized: not inner loop.
../tune.cc(11): (col. 7) remark: loop was not vectorized: not inner loop.
../tune.cc(11): (col. 34) remark: loop was not vectorized: dereference too complex.
You are running 11.1.056. If you are eligible for the update released in August (11.1.073), installing it may be the simplest solution.
Perhaps your pragmas aren't recognized unless they begin in column 1. ivdep is almost certainly required to vectorize if you don't declare the pointers with appropriate restrict syntax.
Thank you, all. It is unlikely that I can convince the system admin to install the latest version yet. I also tried this code with Intel compiler 10.*, and vectorization doesn't work there either.
I am sure "#pragma" is recognized by the compiler, since it would give me a dependency message if it did not recognize ivdep.
Yes, I see that it does report dependency when ivdep is omitted, and restrict is not added. However, it still produces an identical call to __intel_fast_memcpy() even while reporting inability to vectorize.
I note that the current default Red Hat gcc-4.4 will vectorize this without pragmas, with the addition of restrict.
The sysadmin finally gave 11.1.073 a try, and it works. Thanks, everyone.
