Intel® Fortran Compiler

Visual Fortran An internal threshold was exceeded: loops may not be vectorized or parallelized. Try to reduce routine size.

zananoga
Beginner

Received the following error message:

Visual Fortran An internal threshold was exceeded: loops may not be vectorized or parallelized. Try to reduce routine size.

This error is preventing us from implementing a parallel version. We are not using the optimization or automatic parallelization options for the project; we are using OpenMP directives, only !$omp parallel do's. We are working on a quad-core HP machine running Vista, using Microsoft Visual Studio 2005. The program we are working on is very long (thousands of lines). It ran in parallel using gfortran on a Linux box. The serial version took 18 hours on Vista. When we attempted to run the parallel version, it did not run in parallel. We thought we might break the program up into subroutines, but this would be a lot of work and could increase run time, making parallelization less helpful.

Any ideas?

TimP
Honored Contributor III
Making the program more modular, with subroutines, usually is beneficial to performance, as long as the inner DO loops are inside the subroutines. If the code is structured so as to require the compiler to swap loops, more levels of loops should be in the subroutines, but gfortran isn't likely to perform such optimization.
You might consider removing some optimizations explicitly, such as -Qip- to avoid in-lining or -O1 to avoid vectorization, if you don't require those optimizations, in case that avoids the optimization limits. Knowing which options you used with gfortran, you ought to be able to set equivalent options for ifort; ifort defaults are roughly equivalent to gfortran -O3 -ffast-math -funroll-loops, if you used a current gfortran.
Even at the default level of opt-report and vecreport, you get an indication which loops are vectorized and parallelized, so you should be able to see whether you get the desired optimizations.
The option -override_limits may work when you don't leave on too many aggressive optimizations.
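
A minimal sketch of Tim's first suggestion, assuming a hypothetical Jacobi-style update (relax_mod and relax_step are made-up names, not taken from the poster's program): the whole loop nest lives in a short subroutine, so the compiler analyses a small routine instead of one that is thousands of lines long, and the !$omp parallel do sits on the outer loop. Without /Qopenmp (or -fopenmp for gfortran) the !$omp lines are ordinary comments and the loop simply runs serially.

```fortran
module relax_mod
  implicit none
contains
  ! Hypothetical Jacobi-style sweep: each interior point of b becomes
  ! the average of its four neighbours in a; boundary values are copied.
  subroutine relax_step(a, b, nx, ny)
    integer, intent(in)  :: nx, ny
    real(8), intent(in)  :: a(nx, ny)
    real(8), intent(out) :: b(nx, ny)
    integer :: i, j

    b = a   ! keep the boundary values unchanged
    ! The outer loop is split among threads; each iteration writes only
    ! column j of b, so the iterations are independent.
    !$omp parallel do private(i)
    do j = 2, ny - 1
      do i = 2, nx - 1
        b(i, j) = 0.25d0 * (a(i-1, j) + a(i+1, j) + a(i, j-1) + a(i, j+1))
      end do
    end do
    !$omp end parallel do
  end subroutine relax_step
end module relax_mod
```

The main program would then call relax_step(a, b, nx, ny) wherever the inline loop nest used to be. Parallelizing over j and looping over i inside keeps each thread walking down contiguous columns, which suits Fortran's column-major array layout.
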
jimdempseyatthecove
Honored Contributor III

>>This error is preventing us from implementing a parallel version.

This message is not an error message; it is an informational message.

The text of the informational message, while correct, is misleading.

"...loops may not be vectorized or parallelized."

should read

"...loops may not be vectorized or auto-parallelized."

This does not prevent you from using OpenMP to parallelize this routine.

In fact, you would probably not want auto-parallelization code generated in an OpenMP-parallelized application, as this would create additional threads that are not OpenMP threads and could adversely affect the performance of the application.

If your code with OpenMP directives is not running in parallel then you are compiling without OpenMP enabled, or you have restricted the number of OpenMP threads to 1.

For the project properties, start with:

Configuration Properties
  Fortran
    Parallelization: No (this is auto-parallelization)
    Preprocessor
      OpenMP Conditional Compilation: Yes
    Language
      Process OpenMP Directives: Generate Parallel Code (/Qopenmp)

Also make sure that code (program, subroutine, function) that uses !$OMP directives also has USE OMP_LIB.

Jim Dempsey
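
The two causes Jim mentions (OpenMP not enabled at compile time, or the thread count limited to 1) can be told apart with a small diagnostic. This is a sketch with hypothetical names (omp_check_mod, openmp_enabled, threads_in_parallel_region). It relies on the OpenMP conditional-compilation sentinel: lines starting with "!$ " are compiled only when OpenMP is enabled (/Qopenmp, or -fopenmp with gfortran) and are comments otherwise. USE OMP_LIB is guarded the same way; strictly speaking it is needed for calling OpenMP runtime routines such as omp_get_num_threads, not for the directives themselves.

```fortran
! Diagnostic sketch: reports whether this build was compiled with OpenMP
! and how many threads a parallel region actually gets.
module omp_check_mod
  !$ use omp_lib
  implicit none
contains
  ! .true. only when the !$ line below was compiled, i.e. OpenMP is on.
  logical function openmp_enabled() result(enabled)
    enabled = .false.
    !$ enabled = .true.
  end function openmp_enabled

  ! Threads in a parallel region: always 1 in a serial build, and also 1
  ! if the thread count is restricted (for example OMP_NUM_THREADS=1).
  integer function threads_in_parallel_region() result(nthreads)
    nthreads = 1
    !$omp parallel
    !$omp single
    !$ nthreads = omp_get_num_threads()
    !$omp end single
    !$omp end parallel
  end function threads_in_parallel_region
end module omp_check_mod
```

Printing both values at program start-up (print *, openmp_enabled(), threads_in_parallel_region()) would show which case applies: .false. means /Qopenmp is not reaching the compiler, while .true. with 1 thread points at a thread-count limit.
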

zananoga
Beginner

Jim,

Thanks for the suggestions. I have the project properties as you state. It seems as if the serial version and the OpenMP version take just as long as each other, about 18 hours. We plan to time some of the individual parallel do loops to see how long they take with 1, 2, 3 and 4 processors, to try to figure out what is happening.
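
A sketch of that per-loop timing, assuming a stand-in loop (loop_timing_mod and time_loop are illustrative names; the real program would time one of its own !$omp parallel do loops). It sets the thread count with omp_set_num_threads, guarded by the "!$ " sentinel so the module still compiles serially, and measures wall-clock time with the intrinsic system_clock. cpu_time is avoided because it may report CPU time summed over all threads, which would hide any speedup.

```fortran
module loop_timing_mod
  !$ use omp_lib
  implicit none
contains
  ! Runs a hypothetical work loop with nthreads threads (ignored in a
  ! serial build) and returns the elapsed wall time in seconds, plus a
  ! checksum that should not depend on the thread count.
  subroutine time_loop(nthreads, n, elapsed, checksum)
    integer, intent(in)  :: nthreads, n
    real(8), intent(out) :: elapsed, checksum
    real(8), allocatable :: x(:)
    integer(8) :: t0, t1, rate
    integer :: i

    allocate(x(n))
    ! Note: this also changes the thread count for later parallel regions.
    !$ call omp_set_num_threads(nthreads)
    call system_clock(t0, rate)
    !$omp parallel do
    do i = 1, n
      x(i) = sqrt(real(i, 8)) * sin(real(i, 8))
    end do
    !$omp end parallel do
    call system_clock(t1)
    elapsed = real(t1 - t0, 8) / real(rate, 8)
    checksum = sum(x)   ! computed serially, so identical for any nthreads
    deallocate(x)
  end subroutine time_loop
end module loop_timing_mod
```

Calling time_loop for nthreads = 1, 2, 3 and 4 and printing the elapsed times gives the comparison; the checksum should be the same each time. If the times stay flat, the loop is most likely not running in parallel in that build, or it is limited by something other than CPU work, such as memory bandwidth.
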

zananoga
Beginner

Tim,

I added /override_limits as an additional option on both the Fortran and the linker command lines:

********************************************
PROJECT PROPERTIES as of 8/18/2008
********************************************
FORTRAN Command Line:
/nologo /Zi /Od /fpp /Qopenmp /fpe:0 /module:"Debug" /object:"Debug" /traceback /check:bounds /libs:static /threads /dbglibs /c
Additional options: /override_limits
*******
LINKER Command options:
/OUT:"DebugCMSNEWOMP.exe" /INCREMENTAL:NO /NOLOGO /NODEFAULTLIB:"libc.lib" /MANIFEST /MANIFESTFILE:"C:DATACMSNEWdebugcmsnewomp.exe.intermediate.manifest" /DEBUG /PDB:"C:DATACMSNEWdebugcmsnewomp.pdb" /SUBSYSTEM:CONSOLE /STACK:16777216 /IMPLIB:"C:DATACMSNEWdebugcmsnewomp.lib" hdf5dll.lib hdf5.lib xmdffortran.lib
*********
Libraries:
Debug Multithreaded (/libs:static /threads /dbglibs)

RESOURCES CMD LINE:
/fo "Debug/CMS-NEW.res"

MIDL CMD LINE:
/nologo /char signed /env win32 /h "CMS-NEW_h.h" /tlb "Debug/CMS-NEW.tlb"

****************************************************************************************
RESULT OF BUILD WHEN: '/override_limits' option NOT in fortran cmd line. ONLY in linker cmd line.
****************************************************************************************
1>------ Rebuild All started: Project: CMS-NEW, Configuration: Debug Win32 ------
1>Deleting intermediate files and output files for project 'CMS-NEW', configuration 'Debug|Win32'.
1>Compiling with Intel Fortran Compiler 10.1.024 [IA-32]...
1>lund_cirp_080104.for
1>Xmdff.f90
1>CMSfast.for
1>C:DATACMSNEWCMSfast.for(2): (col. 10) remark: An internal threshold was exceeded: loops may not be vectorized or parallelized. Try to reduce routine size.
1>Linking...
1>LINK : warning LNK4044: unrecognized option '/override_limits'; ignored
1>Embedding manifest...
1>
1>Build log written to "file://C:DATACMSNEWDebugBuildLog.htm"
1>CMS-NEW - 0 error(s), 1 warning(s)
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========
****************************************************************************************
RESULT OF BUILD WHEN: '/override_limits' option in BOTH linker & Fortran cmd line.
****************************************************************************************
1>------ Build started: Project: CMS-NEW, Configuration: Debug Win32 ------
1>Compiling with Intel Fortran Compiler 10.1.024 [IA-32]...
1>lund_cirp_080104.for
1>ifort: command line warning #10158: ignoring option '/o'; argument must be separate
1>Xmdff.f90
1>ifort: command line warning #10158: ignoring option '/o'; argument must be separate
1>CMSfast.for
1>ifort: command line warning #10158: ignoring option '/o'; argument must be separate
1>C:DATACMSNEWCMSfast.for(2): (col. 10) remark: An internal threshold was exceeded: loops may not be vectorized or parallelized. Try to reduce routine size.
1>Linking...
1>LINK : warning LNK4044: unrecognized option '/override_limits'; ignored
1>Embedding manifest...
1>
1>Build log written to "file://C:DATACMSNEWDebugBuildLog.htm"
1>CMS-NEW - 0 error(s), 4 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

Steven_L_Intel1
Employee
/override_limits is recognized by the compiler only, not the linker. It is not documented and not recommended for general use. Feel free to try it, but you may run into other problems if this switch is used.
jimdempseyatthecove
Honored Contributor III

Zananoga,

If you do not have the Intel Thread Profiler you can download a demo version.

.OR.

You can go to AMD's website and poke around for CodeAnalyst (it's free). Although CodeAnalyst is intended for use on AMD processors, it is fully capable of performing timer-based profiling on Intel processors.

Each profiler is capable of performing event-based profiling on its respective vendor's processors.

Running your application in a profiling tool can give you a quick look at not only the hot spots, but also which processor(s) end up doing the work on each hot spot. If you see only 1 CPU active in the profiler, then your program is not actually running in parallel.

Jim Dempsey
