Significant differences in performance with ifx vs ifort

ereisch2 · 10-09-2025

I'm sure this has been covered already, but I'm seeing a marked decrease in performance for some functions in code generated by ifx compared with code generated by ifort. I ran a test to see which is faster, cmplx( cos(x), sin(x) ) or exp( cmplx(0.0, x) ), and got the following results (CPU times normalized to the first test):

COS,SIN (ifx): 1.0
EXP (ifx): 3.919

COS,SIN (ifort): 0.275
EXP (ifort): 0.280

These results are curious because I was under the impression that computing the COS and SIN of an angle together is cheaper than computing them separately, and EXP( CMPLX(0.0, X) ) makes it explicit that we want both values. So it was a bit surprising that the EXP form is no faster than the COS,SIN form with either compiler (and far slower with ifx). But the bigger shock was that the ifort build was 3.6x faster (COS,SIN) and 14x faster (EXP) than the same code compiled with ifx, using the same compile arguments. We are preparing to transition our scientific numerical package from ifort to ifx, so a slowdown of this size is a serious concern.
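For reference, the factors quoted above follow directly from the normalized times. A minimal sketch in C of that arithmetic (the function name `slowdown` is hypothetical and not part of the test program):

```c
/* How many times longer the candidate build takes than the baseline build,
   given both CPU times in the same normalized units. */
double slowdown(double t_candidate, double t_baseline) {
    return t_candidate / t_baseline;
}

/* COS,SIN case: slowdown(1.0,   0.275)  (ifx time / ifort time)
   EXP case:     slowdown(3.919, 0.280)  (ifx time / ifort time) */
```

Called with the numbers from the table, this gives roughly 3.6 for the COS,SIN case and roughly 14 for the EXP case.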

ifort version 2021.11.1
ifx version 2024.0.2

Compile flags for both tools are: "-O3 -assume nounderscore -warn all"

Test routine:

      PROGRAM CIS_TEST

      COMPLEX*8 VAR1, OUT
      REAL*4 ARG
      INTEGER*4 I
C
      OUT = 0.0
      DO I = 1, 100000000
         ARG = 2.0 * 3.14159 * 150000000.0 * SNGL(I) / 1000000.0
         VAR1 = CMPLX( COS(ARG), SIN(ARG) )
         OUT = OUT + VAR1
      END DO
      CALL PRINT_USAGE
      WRITE (*,*) OUT   ! Prevent code elimination

      OUT = 0.0
      DO I = 1, 100000000
         ARG = 2.0 * 3.14159 * 150000000.0 * SNGL(I) / 1000000.0
         VAR1 = EXP( CMPLX(0.0, ARG) )
         OUT = OUT + VAR1
      END DO
      CALL PRINT_USAGE
      WRITE (*,*) OUT   ! Prevent code elimination
      END PROGRAM CIS_TEST

Timing helper (C), called from the Fortran code as PRINT_USAGE:

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

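/* CPU times seen at the previous call; the first call measures from process start. */
static double u_last = 0.0, s_last = 0.0;

/* Print the user and system CPU time consumed since the previous call. */
void print_usage(void) {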
    struct rusage usage;
    double secs;

    getrusage( RUSAGE_SELF, &usage );

    secs = usage.ru_utime.tv_sec + usage.ru_utime.tv_usec * 0.000001;
    printf("User time: %.3f\n", secs - u_last);
    u_last = secs;

    secs = usage.ru_stime.tv_sec + usage.ru_stime.tv_usec * 0.000001;
    printf("System time: %.3f\n", secs - s_last);
    s_last = secs;
}

ereisch2 · 10-09-2025

Digging into this a bit more, it appears that ifx is not recognizing which underlying math routines should be called for certain lines of code (i.e., what does a call to EXP() really do?). For example, ifort correctly recognizes that CMPLX( COS(ARG), SIN(ARG) ) and EXP( CMPLX(0.0, ARG) ) are mathematically equivalent; if you examine the generated assembly, both produce calls to __libm_sse2_sincosf. ifx, however, blindly calls cexpf, which is obviously slower. Despite trying different combinations of the "-march=", "-arch", etc. flags, I can't get ifx to replace the cexpf call with a sincosf call, let alone an SSE2-optimized LIBM version of either one. So that's probably why it's so much slower: ifort uses an Intel-optimized SSE2 math library and calls sincosf, whereas the link map suggests ifx uses an SVML library and calls expf, cosf, and sinf separately (i.e., it isn't even calling sincosf for the combined COS,SIN case on line 10 of the test routine above).

wilford139 · 10-09-2025