Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Significant differences in performance with ifx vs ifort

ereisch2
Beginner

I'm sure this has been covered already, but I'm seeing a marked drop in performance for some functions in code compiled with ifx compared to code compiled with ifort. I was testing which is faster, CMPLX( COS(X), SIN(X) ) or EXP( CMPLX(0.0, X) ), and got the following results (CPU times relative to the first test, COS,SIN with ifx):

COS,SIN (ifx):     1.000
EXP (ifx):         3.919

COS,SIN (ifort):   0.275
EXP (ifort):       0.280

These results are curious because I was under the impression that computing the COS and SIN of an angle together is cheaper than computing them separately, and EXP( CMPLX(0.0, X) ) makes it explicit that we want both values. So it was a bit surprising that the EXP form is slower with both ifx and ifort (only slightly with ifort). The bigger shock was that, with the same compile arguments, ifort was 3.6x faster (COS,SIN) and 14x faster (EXP) than ifx. We are preparing to transition our scientific numerical package from ifort to ifx, so differences this large are a serious concern.

ifort version 2021.11.1
ifx version 2024.0.2

Compile flags for both tools are: "-O3 -assume nounderscore -warn all"

Test routine (fixed-form Fortran):

      PROGRAM CIS_TEST

      COMPLEX*8 VAR1, OUT
      REAL*4 ARG
      INTEGER*4 I
C     Test 1: CMPLX( COS(ARG), SIN(ARG) )
      OUT = 0.0
      DO I = 1, 100000000
         ARG = 2.0 * 3.14159 * 150000000.0 * REAL(I) / 1000000.0
         VAR1 = CMPLX( COS(ARG), SIN(ARG) )
         OUT = OUT + VAR1
      END DO
      CALL PRINT_USAGE
      WRITE (*,*) OUT    ! Prevent code elimination

C     Test 2: EXP( CMPLX(0.0, ARG) )
      OUT = 0.0
      DO I = 1, 100000000
         ARG = 2.0 * 3.14159 * 150000000.0 * REAL(I) / 1000000.0
         VAR1 = EXP( CMPLX(0.0, ARG) )
         OUT = OUT + VAR1
      END DO
      CALL PRINT_USAGE
      WRITE (*,*) OUT    ! Prevent code elimination
      END PROGRAM CIS_TEST
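
Timing helper in C, called from the Fortran code as PRINT_USAGE (-assume nounderscore keeps the Fortran external name as print_usage, matching the C function):

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

static double u_last = 0.0, s_last = 0.0;   /* CPU times at the previous call */

/* Print user and system CPU time used since the previous call. */
void print_usage(void) {
    struct rusage usage;
    double secs;

    getrusage( RUSAGE_SELF, &usage );

    secs = usage.ru_utime.tv_sec + usage.ru_utime.tv_usec * 0.000001;
    printf("User time: %.3f\n", secs - u_last);
    u_last = secs;

    secs = usage.ru_stime.tv_sec + usage.ru_stime.tv_usec * 0.000001;
    printf("System time: %.3f\n", secs - s_last);
    s_last = secs;

    fflush(stdout);   /* keep C output in order with Fortran WRITE output */
}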

ereisch2
Beginner

Digging into this a bit more, it appears that ifx is not "recognizing" which underlying math operations certain lines of code map to (i.e., what does a call to EXP() really do?). For example, ifort recognizes that CMPLX( COS(ARG), SIN(ARG) ) and EXP( CMPLX(0.0, ARG) ) are mathematically equivalent, and in the generated assembly both produce calls to __libm_sse2_sincosf. ifx, however, simply calls cexpf, which is slower. Despite trying various "-march=", "-arch", etc. flags, I can't get ifx to replace the cexpf call with a sincosf call, let alone an SSE2-optimized LIBM version of either. That is probably why it's so much slower: ifort uses an Intel-optimized SSE2 math library and calls sincosf, whereas the link map suggests that ifx uses an SVML library and calls expf, cosf, and sinf separately (i.e., it doesn't call sincosf even for the CMPLX( COS(ARG), SIN(ARG) ) line in the test above).

JFH
New Contributor I

The largest ARG in that program is about 9.42477E+10, far beyond where COS(ARG), SIN(ARG) and EXP(CMPLX(0,ARG)) can be meaningfully calculated in single precision. Did ifort and ifx choose different workarounds?

ereisch2
Beginner

All of those functions are periodic, so they are simply evaluated on the argument reduced modulo 2*PI. The speed difference between EXP and SIN/COS comes from ifx not decomposing the former into the latter: computing the exponential of a complex number is expensive compared to computing the SIN and COS of a single real number.
