Re: ifx compiler options

prop_design · ‎12-09-2023

hi,

so i see you are going to drop ifort. i have tested ifx some and have had some issues. do you know if the following ifx compiler options are still not available:

ifx: command line warning #10152: option '/QaxSSE2' not supported
ifx: command line warning #10148: option '/Qprec-sqrt' not supported

ifx: command line warning #10148: option '/Qpar-affinity:core,scatter' not supported
ifx: command line warning #10148: option '/Qparallel' not supported
ifx: command line warning #10148: option '/Qopt-report-phase=par' not supported

the last version of ifx i tested was 2023.2.1

thanks,

anthony

Steve_Lionel · ‎12-12-2023

Thanks @Barbara_P_Intel - that's good, at least. I am puzzled that I could not spot a mention of auto-parallel in the various ifx porting guides.

prop_design · ‎03-25-2024

I updated my compiler comparison today. As mentioned previously, there is a peculiar outcome I have noticed. Unoptimized Intel Fortran compiler code is extremely fast. It's faster than fully optimized gfortran and flang code. ifort and ifx unoptimized code is about the same. However, optimized ifx code is about 26% slower than optimized ifort code. Since you are phasing out ifort, it would be nice if you could figure out what's going on with ifx.

prop_design · ‎03-25-2024

I updated the download again.

I added more test cases for the ifx compiler, since there is a regression showing up. I ran all the same tests as the ifort compiler. This gives a little more insight into the issue. It appears that the /O1 and /O3 level optimizations are not performing as well, when comparing ifx to ifort.

Another interesting oddity is, unoptimized ifort and ifx code outperforms fully optimized gfortran and flang code.

Also, unoptimized ifort and ifx code performs about the same. So the regression is purely with the optimization levels. In fact, ifx /O2 level optimizations outperform ifort /O2.

The compilers behave about the same with regards to the other tests I ran.

This is as much as I can debug the issue. Hopefully, it will help you.

mecej4 · ‎03-26-2024

@prop_design, you wrote: "Another interesting oddity is, unoptimized ifort and ifx code outperforms fully optimized gfortran and flang code."

If you do not specify any optimization options (at the command line, in the Visual Studio project settings, or in the files ifort.cfg/ifx.cfg), the Intel Fortran compilers use /O2 as the default optimization level. If you built using Visual Studio and selected a "Release Build", optimization is not turned off. If you really wish to disable optimization (for timing purposes, debugging, etc.), you have to specify /Od or adjust the project settings to disable optimization.

For details, see the Fortran developer reference.

prop_design · ‎03-26-2024

Hi @mecej4

Thanks for that info. I did specify /Od. I compile from the command line using batch files.

Barbara_P_Intel · ‎03-26-2024

Remember that ifx is a new compiler. We don't recommend just plopping in your ifort compiler options. There is some tuning involved.

With the information you provided, I can't give you any guidance to improve performance or file a performance bug.

Can you provide a small reproducer?

prop_design · ‎03-26-2024

@Barbara_P_Intel

hi barbara,

i guess my post wasn't clear. i have been doing fortran compiler testing for a very long time. the current benchmark program is at:

https://propdesign.jimdofree.com/fortran-benchmarks/

the download has everything you would need to reproduce similar results. the spreadsheet and pdf copy have the results for my computer. late last night, i updated the spreadsheet so that it's easy to see what i found comparing ifort to ifx. i was very thorough in testing the compiler options. there is a list of oddities i found for ifort and ifx. if it's not clear, let me know and i'll try to make it clearer. all of the batch files are in the download as well as some note files that contain them all in one file, for easier reading.

anthony

Ron_Green · ‎03-28-2024

The /Qparallel and -parallel compiler options were added to ifx in the 2024.0 release. They were not in 2023.2.x or older.

BUT ...

It is a work in progress and is NOT the same as in ifort!

Work in progress: as you probably know, ifx is our Fortran front end using the LLVM compiler framework. LLVM does not support auto-parallelization in the way our older Intel compilers performed it.

So what does [ /Q | - ]parallel do in ifx?

It converts DO CONCURRENT to OMP PARALLEL DO loops. That is all. There is no auto-parallelization for normal DO loops. That was a capability we created for our older proprietary compilers. We do not have the same capabilities in LLVM.

A side effect of using /Qparallel is that the ifx driver also links in the Intel OpenMP runtime library.

I will take a look at the benchmark and your options. Know that 2024.1.0 is coming out today (3/28/2024), assuming no last-minute critical flaws are found in oneAPI. I would advise moving to that version of ifx. There are approximately 400 edits (fixes and features) over 2023.2.1. Some of these are performance enhancements. I will try the benchmark code myself with the new ifx 2024.1.0 and see if any options can help.

prop_design · ‎03-28-2024

@Ron_Green

thanks for the info ron. i have 2024.0.0 and it said /Qparallel was not supported. perhaps one of the point releases added it? it is of no concern to me though, as it has never worked in a useful way with my codes. sometimes it slowed it down, sometimes it did nothing, etc... often times it would just not work (both for gfortran and ifort). for the version i have, it says ifx not supported, and for some reason ifort won't do anything. it exits saying it didn't do anything even with the flag it suggests added. the funny thing is it produced the fastest runtimes in some of my testing. so that got me testing turning different things off and i was eventually able to match the speed of the errant result. the spreadsheet lists all that i found. however, i don't know enough to debug it further. hopefully, what i did was clear enough to be of some help.

yes, i would appreciate your observations with the updated ifx. i am hesitant to update since it looks like i have the last ifort that was supported. i guess it now gives deprecation messages in newer variants.

ifx runtimes are about 4 seconds slower on my computer. so switching to it isn't ideal. it would also reduce the amount of cpus/apus covered by four years. i release my codes in the most generic way possible, so that the most people can run it. i also try to get it to run as fast as possible. but that is secondary. ifort has always produced fast code though. so i have been lucky to achieve both things.

the benchmark is using a real code. it is calculating performance maps for a turbofan engine at takeoff and cruise conditions. so it's not just some random code created purely for benchmarking. i did recently add some averaging loops. i was hesitant to do that, though. because, in the past, any mods i did for benchmarking would mess up auto-parallelization even more. currently, it has messed up the debugging output for gfortran. however, gfortran also never worked right with auto-parallelization. so it was worth it to me to automate the averaging. gfortran says it did some trivial auto-parallelization but the code still runs with one thread. perhaps on linux it might work right. however, on windows, it never has. i think the only code that sort of did something was pgi but i can't remember now. i think it would do a few trivial loops and actually run using more than one thread.

you will notice run-to-run variability. I can't get rid of it, even using maps and adding averaging. one reason I chose maps was that it has to do a lot of runs at different operating points. so as far as runtime is concerned, it sort of has an averaging effect. so you can't get a super precise time out of it. if the difference between compilers is big enough though, you can see it. if certain flags don't change things much, then it becomes impossible to assess based on speed.
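To illustrate the point about variability, here is a minimal Python sketch of one crude way to judge whether a runtime gap stands out from run-to-run noise. The runtimes are invented for illustration and are not taken from the benchmark spreadsheet. The function names and the threshold `k` are hypothetical choices, not part of the benchmark download.

```python
from statistics import mean, stdev

def summarize(times):
    """Return (mean, sample standard deviation) of runtimes in seconds."""
    return mean(times), stdev(times)

def difference_is_visible(times_a, times_b, k=2.0):
    """Crude check: is the gap between the two means larger than
    k times the larger of the two run-to-run standard deviations?"""
    mean_a, sd_a = summarize(times_a)
    mean_b, sd_b = summarize(times_b)
    return abs(mean_a - mean_b) > k * max(sd_a, sd_b)

# Hypothetical runtimes in seconds, made up for this example only.
build_a = [15.2, 15.6, 15.1, 15.4, 15.3]
build_b = [19.3, 19.8, 19.1, 19.5, 19.4]  # clearly slower than build_a
build_c = [15.3, 15.5, 15.0, 15.6, 15.2]  # within the noise of build_a

print(difference_is_visible(build_a, build_b))
print(difference_is_visible(build_a, build_c))
```

With numbers like these, a gap of several seconds is easy to see, while two builds whose means sit inside each other's scatter cannot be ranked on speed alone.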

prop_design · ‎03-28-2024

@Ron_Green

hi ron,

since ifx auto-parallelization works for you, i added a batch file for it. i updated the spreadsheet so that there is a placeholder for any future results.

Ron_Green · ‎03-28-2024

I had hoped to find sources that I could inspect, to help with suggestions.

But I do have a couple of suggestions. These follow our Porting Guide https://www.intel.com/content/www/us/en/developer/articles/guide/porting-guide-for-ifort-to-ifx.html

ifx is greatly assisted by the -flto option, which is the same as the -ipo option. ifort did better out of the box with inlining; ifx needs a little nudge with -flto.

For both compilers, use -align array64byte. That helps with vectorization.

I fully expected the vanilla O1 and O3 options by themselves to be slower with ifx. ifort did a LOT of proprietary optimizations at O2, O3 that ifx will not do by default. ifx O2 or O3 uses default LLVM optimization passes: nothing added by Intel. Only when you use -flto or /Qx or /Qax options do Intel optimization passes get added to the base LLVM optimization passes.

Also for ifx, try -fnostandard-realloc-lhs with caution: if your array assignments are conforming, there is no issue. If not, you may get segfaults. This option tells the compiler to assume that, in array assignments, the LHS and RHS expressions are conformant arrays. This avoids array temporaries.

prop_design · ‎03-28-2024

@Ron_Green

hi ron,

thanks for the suggestions. i will start testing them out. i'm not sure what you meant by "I had hoped to find sources that I could inspect, to help with suggestions." if you mean the benchmark program, the source code is in there. look for the *.f file.

i'm thinking of maybe doing some statistics and adding a plot. however, not sure about that yet. the plots that are in there now are for what MAPS creates and uses. so they aren't necessary for just benchmarking. i was trying to benchmark using MAPS, as is, for a while, since auto-parallelization didn't like more loops around it. however, the run-to-run variability made me add one additional loop. in reality, any version of PROP_DESIGN could be used for benchmarking. STATORS is currently the most compute intensive. so, if I didn't use MAPS, I would probably use STATORS. the other codes run so fast that they wouldn't be great for benchmarking.

i have the fortran benchmarks as one download and the actual PROP_DESIGN program as another. you don't need one to run the other. however, if you want to really dig into things, there is a lot more info in the PROP_DESIGN download. it's not at all necessary though. the benchmark download is a stand alone thing.

all of the codes are old school Fortran 77. part of the reason I made PROP_DESIGN was for educational purposes. all previous propeller design codes were written in Fortran 77. so the codes themselves are meant to provide some historical value in the way they were written and function. i have no doubt that, if a modern programmer completely re-wrote the codes, there could be some benefits. this isn't something i will be doing. i'm basically done with the project and just make sure it keeps running, as new compilers and other software change.

in any event, i think the benchmark is useful for testing compilers and compiler options. since, they all claim to support Fortran 77. i should add; there may be a few functions, in the codes, that are from Fortran 90. what happened was, I referenced various Fortran user manuals when writing the code and they didn't state which version any given function was introduced. many, many, years after I wrote the code, I noticed that some functions weren't actually a part of Fortran 77. it's not a big deal, but it kind of subverted what I was intending to do.

prop_design · ‎03-29-2024

@Ron_Green

hi ron,

i tested the options you suggested and found that some of them were useful. they sped ifx up quite a bit. they didn't have much of an effect on ifort. however, the speed of both compilers is closer together now.

ifort /O2 still looks to have a regression of some sort. it's real obvious in the data. i added a spreadsheet with statistical analysis. this just helps to see the run-to-run variability. however, the variability still is not completely captured, as you can re-run any given test and get different speeds. i updated the output of the benchmarking program, to make it much easier to perform the statistical analysis in the spreadsheet. i also re-ran all the tests, which takes quite a while.

the updated compiler options are in the note files and in the batch files. many of the things you suggested in various posts don't work with the compiler i have installed. but that's not a huge deal. i was able to figure out what worked and what didn't. i have 2024.0.0 installed. i don't think i'm going to update it for quite some time. i'm happy with the performance of ifort. still, the benchmarking program and results should be of some use to people.

as far as the speed difference, it was 4 seconds and now it's around 1 second. so a big improvement. i think it was something like 25-30% different and now it's around 10%. interestingly, that difference mostly goes away depending upon what architecture is targeted. so that is now a new anomaly.
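For anyone who wants to turn a gap in seconds into a percentage, here is a small Python sketch of the arithmetic. The runtimes are hypothetical round numbers picked so that the gaps are 4 seconds and 1 second. They are not the measured values from the spreadsheet, so the percentages will not match the ones quoted above exactly. The function name is also just an illustration.

```python
def percent_slower(t_ref, t_other):
    """Percent by which t_other exceeds t_ref (positive means t_other is slower)."""
    if t_ref <= 0:
        raise ValueError("reference runtime must be positive")
    return 100.0 * (t_other - t_ref) / t_ref

# Hypothetical runtimes in seconds, chosen only to give 4 s and 1 s gaps.
ifort_s = 14.0
ifx_before_s = 18.0  # before the suggested options: 4 s slower
ifx_after_s = 15.0   # after the suggested options: 1 s slower

print(f"before: {percent_slower(ifort_s, ifx_before_s):.1f}% slower")
print(f"after:  {percent_slower(ifort_s, ifx_after_s):.1f}% slower")
```

The percentage depends on the reference runtime as much as on the gap, so the same 1 second difference reads as a larger percentage on a faster baseline.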