The following code:
      program spectest
      integer NITER
      parameter (NITER = 1000)
      real a(NITER), b(NITER), c(NITER)
      CHARACTER*16 out
      integer i
      a = 1
      b = 1
      c = 1
      c(100) = 0
      i = ieee_flags('set', 'exception', 'all', out)
!
!DIR$ NOUNROLL
      do i = 1 , NITER
         if (c(i) .ne. 0) then
            a(i) = b(i) / c(i)
         endif
      enddo
      print *, a(101)
      end
when compiled with ifort 18.0.5 and -fp-speculation=strict, has vectorization of the division loop disabled:
LOOP BEGIN at spectest2.f90(17,10)
remark #15326: loop was not vectorized: implied FP exception model prevents vectorization. Consider changing compiler flags and/or directives in the source to enable fast FP model and to mask FP exceptions [ spectest2.f90(19,22) ]
At first this seems expected: I could assume that speculation is somehow linked to the vectorization of the loop.
What seems strange, though, is that the same code compiled with -fp-speculation=safe does get vectorized, and no FPE is raised.
So under the "safe" setting the compiler has managed to produce code that is both vectorized and non-speculative.
Whatever that code is, why can it not also be generated under -fp-speculation=strict?
I am trying to understand the speculation=safe vs strict settings. The only information that I could find in the compiler manuals is the following:
strict : Tells the compiler to disable speculation on floating-point operations.
safe: Tells the compiler to disable speculation if there is a possibility that the speculation may cause a floating-point exception.
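To make "speculation" concrete for this loop: it means evaluating b(i) / c(i) before knowing whether the branch is taken. The following is only a hand-written sketch of that transformation in standard Fortran, not compiler output. The temporary t, the array size, and the use of the ieee_arithmetic module (instead of the ieee_flags extension in the test program) are choices made for this illustration.

```fortran
program speculation_sketch
  use, intrinsic :: ieee_arithmetic
  implicit none
  integer, parameter :: n = 4
  real :: a(n), b(n), c(n), t
  logical :: raised
  integer :: i

  a = 1.0
  b = 1.0
  c = 1.0
  c(2) = 0.0                        ! element the original loop never divides by

  ! Keep the exception non-trapping so the sketch runs to completion.
  if (ieee_support_halting(ieee_divide_by_zero)) then
    call ieee_set_halting_mode(ieee_divide_by_zero, .false.)
  end if
  call ieee_set_flag(ieee_divide_by_zero, .false.)

  do i = 1, n
    t = b(i) / c(i)                 ! speculated: evaluated before the test
    if (c(i) /= 0.0) a(i) = t       ! only the store remains conditional
  end do

  call ieee_get_flag(ieee_divide_by_zero, raised)
  print *, 'a =', a
  print *, 'divide-by-zero flag set: ', raised
end program speculation_sketch
```

The stored results match the original loop, but the division by c(2) is now executed. Whether the flag is actually observed depends on the compiler and its floating-point model options, since an optimizer is free to move the division back under the test.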
However, this does not match the behavior I am observing for these options.
I would really appreciate it if somebody could help clarify these options or point me to more detailed documentation.
You quote the docs as saying that speculation=safe is sufficient to guarantee no FPE, but then say both that this works as expected and that it does not. Is it the same with -prec-div and -no-prec-div? If it's difficult to tell us which target ISA you prefer, it might be better that you examine the generated code yourself.
speculation=safe works as expected; it is speculation=strict that does not work as expected, in my opinion.
The generated code for speculation=safe looks like this (ifort 184.108.40.2064, -xCORE-AVX2 -fp-speculation=safe):
..B1.12:                        # Preds ..B1.12 ..B1.11
                                # Execution count [1.00e+03]
        vmovups      qtest_$C.0.1(,%rax,4), %ymm2                  #19.17
        vmovups      qtest_$A.0.1(,%rax,4), %ymm6                  #20.19
        vcmpneqps    %ymm1, %ymm2, %ymm8                           #19.26
        vblendvps    %ymm8, %ymm2, %ymm10, %ymm5                   #20.19
        vblendvps    %ymm8, qtest_$B.0.1(,%rax,4), %ymm10, %ymm4   #20.19
        vrcpps       %ymm5, %ymm3                                  #20.19
        vfnmadd213ps %ymm0, %ymm3, %ymm5                           #20.19
        vfmadd213ps  %ymm3, %ymm3, %ymm5                           #20.19
        vmulps       %ymm5, %ymm4, %ymm7                           #20.19
        vblendvps    %ymm8, %ymm7, %ymm6, %ymm9                    #20.19
        vmovups      %ymm9, qtest_$A.0.1(,%rax,4)                  #20.19
        addq         $8, %rax                                      #17.15
        cmpq         $1000, %rax                                   #17.15
        jb           ..B1.12       # Prob 99%                      #17.15
So the compiler has, as expected, inserted some vblendvps instructions before the vrcpps to protect against the FPE, and the loop is still vectorized. (Those vblendvps are not there with speculation=fast, so an FPE is raised in that case.)
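The masking pattern in that listing can be written out at source level as follows. This is a rough sketch using the MERGE intrinsic in place of vblendvps and a plain divide in place of the vrcpps/FMA sequence. The listing does not show what %ymm10 holds, so the substitute value 1.0 is an assumption, as are the array size and the names mask, safe_b, safe_c and q.

```fortran
program blend_sketch
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n), c(n), safe_b(n), safe_c(n), q(n)
  logical :: mask(n)

  a = 1.0
  b = 2.0
  c = 4.0
  c(3) = 0.0                        ! lane that must not be divided

  mask   = c /= 0.0                 ! like vcmpneqps: lanes where the IF is true
  safe_c = merge(c, 1.0, mask)      ! like vblendvps: harmless divisor elsewhere (1.0 assumed)
  safe_b = merge(b, 1.0, mask)      ! like vblendvps: same substitution for the dividend
  q      = safe_b / safe_c          ! full-width divide; never sees c(3) = 0
  a      = merge(q, a, mask)        ! like the final vblendvps: keep old a(i) where c(i) == 0

  print *, a
end program blend_sketch
```

Every lane is divided, but the lanes where the condition is false are fed substitute operands, so the operation the source code never requested cannot raise divide-by-zero. The last merge restores the original a(i) in those lanes.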
My question is:
Why can the speculation=strict option not generate code like this, instead of disabling vectorization altogether?
-prec-div does not seem to have any effect as far as FPE safety is concerned; it just seems to replace the Newton-Raphson instructions with a vdivps.
That goes to my point; the vdivps sequence would not introduce a spurious FPE for subnormal values of c(:), so it might be considered less speculative. The situation would normally arise only with gradual underflow enabled, so the compiler may not consider it. If you wished your test to exclude subnormals, ABS(c(i)) >= TINY(c(i)) might do it; again, I don't expect the compiler to analyze deeply (nor can it do the entire job, since the -ftz options apply only to MAIN).
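A minimal sketch of that stricter guard follows, with made-up data so that a zero, a subnormal and the smallest normal value each appear once. The array size and the test values are assumptions for illustration only.

```fortran
program tiny_guard
  implicit none
  integer, parameter :: n = 4
  real :: a(n), b(n), c(n)
  integer :: i

  a = 1.0
  b = 1.0
  ! normal value, zero, a subnormal (or zero under flush-to-zero), smallest normal
  c = [2.0, 0.0, tiny(1.0) / 4.0, tiny(1.0)]

  do i = 1, n
    ! Skip zeros and subnormals alike: only normal divisors reach the divide.
    if (abs(c(i)) >= tiny(c(i))) then
      a(i) = b(i) / c(i)
    end if
  end do

  print *, a
end program tiny_guard
```

Only the first and last elements are divided. The zero and the subnormal are both rejected by the same comparison, so a reciprocal-approximation sequence would never be handed a subnormal divisor.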
As all AVX-capable CPUs have the performance fix where addition of subnormals doesn't incur a time-consuming trap, it seems that the default of abrupt underflow is more of a legacy compatibility thing than a necessary default. The possibility of subnormals becoming a performance issue for divide in normal applications is considered remote.
I haven't seen a study on the merits of avoiding -prec-div for current CPUs. It doesn't have nearly the performance impact it had in the past.
With your initializations, you don't expect subnormals. If the compiler were to perform constant propagation so as to have those values influence the generated code, I might expect the entire divide loop to be optimized away rather than eliminating consideration of subnormals. If it were to do that, vectorization also could be suppressed.