Not getting log() or pow() to vectorize

MHlav · ‎07-30-2013

Consider the following code:

double* a;
size_t n;
a[0:n] = log(a[0:n]);

The compiler reports that _log cannot be vectorized, and it reports the same for the pow() function. However, switching to functions such as exp(), sin(), etc. allows vectorization. I thought that log() and pow() were vectorizable functions, as listed in http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-E98D4E0A-9730-425D-A898-3BB4AB9B2330.htm. Does anyone know the cause?

Thanks.

SergeyKostrov · ‎07-30-2013

I haven't verified your test case yet, but could you try a workaround like this:

- Make a for-loop ( from 0 to n-1 ) with 4-in-1 unrolling
- Use 4 temporary variables to calculate 4 log values
- Store the 4 calculated log values in the output array
- Repeat until the whole output array is filled

Please use the -vec-report:3 option to see why your processing with the log function is not vectorized.

MHlav · ‎07-30-2013

I can try those workarounds, but I don't know what is different about log() and pow(). This is the output from level 6:

vectorization support: call to function _log cannot be vectorized.

Same occurs with pow(), with the statement referencing the _pow function.

TimP · ‎07-30-2013

Your example would require #include <math.h> and possibly a change from size_t to int.

MHlav · ‎07-30-2013

Actually I have #include <mathimf.h> in the file. Isn't that what is required? Why int? It's the same size on 32 bit builds and I believe technically wrong for 64 bit builds.

Again, these would not explain why other math functions work.

SergeyKostrov · ‎07-30-2013

>>...Again, these would not explain why other math functions work.

Please post a complete reproducer with all include files and a list of command options you've used.

SergeyKostrov · ‎07-30-2013

>>... The compiler reports that _log cannot be vectorized.

Is that a macro or a C-like function? Use a debugger to verify.

MHlav · ‎07-31-2013

I figured out what is causing it not to be vectorized. The use of /fp:precise. However, I don't understand why that switch will affect only certain functions while others can be vectorized.

SergeyKostrov · ‎07-31-2013

>>...However, I don't understand why that switch will affect only certain functions while others can be vectorized...

It could be by design of the compiler. Please review the following topics:

- Programming with Auto-parallelization
  http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-22C9A59B-EFE5-47F4-ACA2-7BA6D2DD16DD.htm#GUID-22C9A59B-EFE5-47F4-ACA2-7BA6D2DD16DD
- Programming Guidelines for Vectorization
  http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-D284C1EE-BFA4-4EA3-BB67-4A3E5D50199F.htm#GUID-D284C1EE-BFA4-4EA3-BB67-4A3E5D50199F
- Vectorization and Loops
  http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-E98D4E0A-9730-425D-A898-3BB4AB9B2330.htm

There are some restrictions on vectorization and parallelization of code.

TimP · ‎07-31-2013

Michael Hlavinka wrote:

I figured out what is causing it not to be vectorized. The use of /fp:precise. However, I don't understand why that switch will affect only certain functions while others can be vectorized.

So you can see why everyone has been asking for a reproducer.

In recent compilers, increasing numbers of svml function invocations are disabled by /fp:precise. That some of them slipped by in the past may have been an oversight. svml functions aren't designed to permit capturing exceptions on individual operands. If you wish to over-rule this effect on math function vectorization, you may set /Qfast-transcendentals.

In principle, you may also need to consider the /Qimf- options. The svml default "guarantees" accuracy only within 4 Ulps (although it is usually better), which is not consistent with expectation for /fp:precise. exp() and pow() functions (and their relatives) are notoriously difficult to vectorize while maintaining full accuracy for corner cases. /Qimf-... allows you to request higher precision/slower or lower/faster functions if they exist.

I noticed a case this week where disabling svml vectorization by -fp-model source doesn't affect vec-report. Apparently, the decision not to report the difference between full vectorization with and partial vectorization without /Qcomplex-limited-range has been carried over to /Qfast-transcendentals.

I had to revise my recommendation for options to observe parentheses while allowing maximum optimization to include fast-transcendentals:

/fp:source /Qftz /Qfast-transcendentals [/Qprec-div- /Qprec-sqrt-]

This still disables vectorization of sum and indexed max/min reductions.

MHlav · ‎07-31-2013

Thanks for the information everyone. Do you still want a repro case as all I did was extract this from a much larger program? My repro case really doesn't do anything more than here.

Tim, do you know the accuracy of the VC++ library in /fp:precise and /fp:fast mode? Since part of my application is compiled with it, I suspect I may need similar accuracies for the various modules.

TimP · ‎07-31-2013

Microsoft /fp:fast vs. /fp:precise don't affect their math libraries, as far as I know. Most of them, particularly if based on x87 code, should be what Intel calls "high" accuracy. I don't believe there are any vector math functions in the Microsoft libraries. If it's critical, you may want /Qimf-precision:high versions for Intel vector libraries (high accuracy is the default for the scalar functions). Although ICL /fp:source is roughly equivalent to Microsoft /fp:fast, the more aggressive ICL default /fp:fast affects math function accuracy only when it promotes vectorization and imf-precision is set to medium (default) or low (where double "low" is barely better than float high precision).

By the way, /Qimf-precision also affects vectorized divide and sqrt, but /Qprec-div /Qprec-sqrt will force those independently to full accuracy.

SergeyKostrov · ‎07-31-2013

>>...do you know the accuracy of the VC++ library in /fp:precise and /fp:fast mode? Since part of my application is compiled with it,
>>I suspect I may need similar accuracies for the various modules...

It is very easy to verify as follows ( or in a similar way ):

...
// Sub-Test 5.1 - Calculates Product of 0.1 * 0.1 - RTfloat
{
    CrtPrintf( RTU("Sub-Test 5.1 - RTfloat\n") );

    RTfloat fVal = 0.1f;
    RTfloat fRes = 0.0f;

    uiControlWordx87 = CrtControl87( _RTFPU_PC_24, _RTFPU_MCW_PC );
    fRes = fVal * fVal;
    CrtPrintf( RTU("24-bit : [ %1.1f * %1.1f = %.17f ]\n"), fVal, fVal, fRes );

    uiControlWordx87 = CrtControl87( _RTFPU_PC_53, _RTFPU_MCW_PC );
    fRes = fVal * fVal;
    CrtPrintf( RTU("53-bit : [ %1.1f * %1.1f = %.17f ]\n"), fVal, fVal, fRes );

    uiControlWordx87 = CrtControl87( _RTFPU_PC_64, _RTFPU_MCW_PC );
    fRes = fVal * fVal;
    CrtPrintf( RTU("64-bit : [ %1.1f * %1.1f = %.17f ]\n"), fVal, fVal, fRes );

    uiControlWordx87 = CrtControl87( _RTFPU_CW_DEFAULT, _RTFPU_MCW_PC );
    fRes = fVal * fVal;
    CrtPrintf( RTU("Default : [ %1.1f * %1.1f = %.17f ]\n"), fVal, fVal, fRes );
}

// Sub-Test 5.2 - Calculates Product of 0.1 * 0.1 - RTdouble
{
    CrtPrintf( RTU("Sub-Test 5.2 - RTdouble\n") );

    RTdouble dVal = 0.1L;
    RTdouble dRes = 0.0L;

    uiControlWordx87 = CrtControl87( _RTFPU_PC_24, _RTFPU_MCW_PC );
    dRes = dVal * dVal;
    CrtPrintf( RTU("24-bit : [ %1.1f * %1.1f = %.17f ]\n"), dVal, dVal, dRes );

    uiControlWordx87 = CrtControl87( _RTFPU_PC_53, _RTFPU_MCW_PC );
    dRes = dVal * dVal;
    CrtPrintf( RTU("53-bit : [ %1.1f * %1.1f = %.17f ]\n"), dVal, dVal, dRes );

    uiControlWordx87 = CrtControl87( _RTFPU_PC_64, _RTFPU_MCW_PC );
    dRes = dVal * dVal;
    CrtPrintf( RTU("64-bit : [ %1.1f * %1.1f = %.17f ]\n"), dVal, dVal, dRes );

    uiControlWordx87 = CrtControl87( _RTFPU_CW_DEFAULT, _RTFPU_MCW_PC );
    dRes = dVal * dVal;
    CrtPrintf( RTU("Default : [ %1.1f * %1.1f = %.17f ]\n"), dVal, dVal, dRes );
}
...

You will need to comment out all calls to the CrtControl87 CRT function; the FPU settings then need to be set at compilation time using the /fp:[ mode ] option.

Notes:
CrtControl87 = _control87
CrtPrintf = _tprintf
RTU = _T
RTfloat = float
RTdouble = double
etc

SergeyKostrov · ‎07-31-2013

Here is a collection of IDZ threads related to different issues with floating point data types, FPU, etc. on Intel CPUs:

Forum topic: Support of 'long double' floating point data type on Intel CPUs ( A collection of threads )
Web-link: software.intel.com/en-us/node/375459

Forum topic: Mathimf and Windows
Web-link: software.intel.com/en-us/forums/topic/357759

Forum topic: Support of Extended or Quad IEEE FP formats
Web-link: software.intel.com/en-us/forums/topic/358472

Forum topic: Using 'long double' in Parallel Studio?
Web-link: software.intel.com/en-us/forums/topic/266290

Forum topic: Why function printf does not support long double?
Web-link: software.intel.com/en-us/forums/topic/372720

Forum topic: Mixing of Floating-Point Types ( MFPT ) when performing calculations. Does it improve accuracy?
Web-link: software.intel.com/en-us/forums/topic/361134

Forum topic: Test results for CRT-function 'sqrt' for different Floating Point Models
Web-link: software.intel.com/en-us/forums/topic/368241

MHlav · ‎08-01-2013

Sergey, thanks for the info. I'll look into it.