I have noticed that an important application seems to run slower when compiled with Version 9 (W_FC_C_9.0.020) than it did when compiled with the last released Version 8 compiler. Here are the benchmark timings in seconds:
Compiler Version    Benchmark 2    Benchmark 3
IVF 8               957            2078
IVF 9               984            2142
Both benchmarks run about 3% slower under IVF 9.
For both compiler versions, the options used were: /nologo /O3 /Qxn.
All runs were performed on the same machine, an HP XW6000 workstation with dual P4 Xeon processors and 4 Gbytes of RAM, with no other applications running simultaneously. The output of the Intel Processor Frequency ID utility on this platform is:
Intel Processor Frequency ID Utility
Version: 7.0.20040526
Time Stamp: 2005/08/06 15:03:00
Number of processors in system: 2
Current processor: #1
Processor Name: Intel Xeon CPU 2.80GHz
Type: 0
Family: F
Model: 2
Stepping: 9
Revision: 22
L1 Trace Cache: 12 Kops
L1 Data Cache: 8 KB
L2 Cache: 512 KB
L3 Cache: None
Packaging: OOI
MMX: Yes
SIMD: Yes
SIMD2: Yes
SIMD3: No
NetBurst Microarchitecture: Yes
Hyper-Threading Technology: No
Expected Processor Frequency: 2.80 GHz
Reported Processor Frequency: 2.80 GHz
Expected System Bus Frequency: 533 MHz
Reported System Bus Frequency: 533 MHz
*************************************************************
I am at a loss as to why identical compiler options result in a consistently slower executable with the new compiler. Has anyone else noticed anything similar?
Thanks,
Peter
- Tags:
- Intel® Fortran Compiler
12 Replies
Our own benchmarks show the opposite, but that doesn't mean that your particular application couldn't slow down. Sometimes a new optimization helps a majority of programs but hurts a few.
If you'd like the matter investigated, please send a report to Intel Premier Support and attach the sources so that we can look into it.
Just checking - you are using /QxN and not /Qxn - right? I think it might matter. Does /Qipo help or hurt?
Thanks for the quick reply, Steve.
Yes, you are correct that the option I am using is /QxN, not /Qxn (a typo in my first posting). I am currently running with /Qipo and will report later today whether it helps or hurts.
My previous experience is not very good with Premier Support, which is why I posted here first.
--Peter
9.0 may perform -Qip by default, where it was not a default with 8.1. If you have only one source file, I would think there would be no difference between -Qip and -Qipo. In order to diagnose your performance problem, it may be necessary to profile, to find out where the extra time is spent.
I've run the benchmarks with the options suggested by Steve and Tim. The updated results are:
Compiler    Options               Benchmark 2    Benchmark 3
IVF 8       /O3 /QxN              957            2078
IVF 9       /O3 /QxN              984            2142
IVF 9       /O3 /QxN /Qipo        1015           2229
IVF 9       /O3 /QxN /Qip         987            2159
So it looks like for this application /Qipo significantly hurts the performance and /Qip has a very slight negative effect on performance.
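For readers who want the relative differences rather than raw seconds, here is a minimal Python sketch that recomputes the slowdown of each IVF 9 build against the IVF 8 baseline. The timings are copied from the tables in this thread, taking the /Qip timings to be 987 and 2159; the function names are illustrative only.

```python
# Timings in seconds, copied from the tables in this thread.
# Assumption: the /Qip row is read as 987 (Benchmark 2) and 2159 (Benchmark 3).
BASELINE = ("IVF 8 /O3 /QxN", 957, 2078)
RUNS = [
    ("IVF 9 /O3 /QxN", 984, 2142),
    ("IVF 9 /O3 /QxN /Qipo", 1015, 2229),
    ("IVF 9 /O3 /QxN /Qip", 987, 2159),
]


def slowdown_percent(baseline_s, new_s):
    """Return how much slower new_s is than baseline_s, in percent."""
    return (new_s - baseline_s) / baseline_s * 100.0


def report(baseline=BASELINE, runs=RUNS):
    """Return one formatted line per run with the slowdown on each benchmark."""
    _, base2, base3 = baseline
    lines = []
    for label, t2, t3 in runs:
        lines.append(
            f"{label:<22} B2 {slowdown_percent(base2, t2):+.1f}%  "
            f"B3 {slowdown_percent(base3, t3):+.1f}%"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

On these numbers the plain IVF 9 build is roughly 3% slower on both benchmarks, and adding /Qipo roughly doubles that gap.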
More about this application: This code is a reflector antenna shape optimization code purchased by my company from a commercial vendor. It consists of 24 source files containing dozens of modules and hundreds of routines. When I reported on using /Qipo above, it was actually specified for all but 2 of the source files. The program will not execute successfully if I specify /Qipo on those two files.
The timing difference between IVF 8 and IVF 9 is only about 3%, but it is repeatable and consistent. I had hoped for a performance improvement, not a slight hit, when upgrading the compiler. When I originally converted to IVF, I did extensive benchmarking of the code under IVF 8 to determine the best compiler options to result in the fastest run times. If you have made changes to the optimizations I'm afraid that I will have to now repeat that effort.
--Peter
Please submit an issue to Intel Premier Support with a description of the problem and everything we need to build and test the application. We'll have our performance experts look at it.
From the results, I'm guessing that some inlining decisions are bad for your application.
I've submitted this as issue 319479. Thanks for the help.
--Peter
Peter,
During my use of IVF 8 I made an observation and reported an incident concerning SSE3 instructions. The problem related to alignment, as SSE3 instructions require alignment at 16-byte intervals. The problem usually showed up as a runtime error. In my opinion this was a linker problem.
When Version 9 came out I examined the generated code to see if the problem was fixed. What I found was that the erroneous code was "fixed", and it appears that the fix was to remove the SSE3 instructions. If your application benefited from the use of SSE3 optimizations, then V8 would run faster than V9. You can verify this by adding /S to compile to an .ASM file and then inspecting the differences.
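One way to make that comparison less tedious is to count instruction mnemonics in the two listings. The following is a rough Python sketch, not a tool from Intel: `count_mnemonics` and `diff_counts` are hypothetical helpers, and the parsing assumes one instruction per line, an optional leading `label:`, and `;` as the comment character, which may not match the exact listing format the compiler emits.

```python
import re
from collections import Counter

# Assumed listing conventions (not verified against real /S output):
# optional "label:" prefix, mnemonic as first token, ';' starts a comment.
LABEL = re.compile(r"^\s*[A-Za-z_.$@?][\w.$@?]*:")
MNEMONIC = re.compile(r"^\s*([A-Za-z][A-Za-z0-9]*)\b")


def count_mnemonics(asm_text):
    """Count how often each mnemonic appears in assembly listing text."""
    counts = Counter()
    for line in asm_text.splitlines():
        code = line.split(";", 1)[0]          # drop trailing comment
        code = LABEL.sub("", code, count=1)   # drop a leading label, if any
        match = MNEMONIC.match(code)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


def diff_counts(old_text, new_text, mnemonics=("movapd", "movupd", "movsd")):
    """Return {mnemonic: (count_in_old, count_in_new)} for selected mnemonics."""
    old, new = count_mnemonics(old_text), count_mnemonics(new_text)
    return {m: (old[m], new[m]) for m in mnemonics}
```

To use it, read the listing produced by each compiler version into a string (for example with `open(path).read()`) and pass both strings to `diff_counts`; a drop in `movapd` from one version to the next would be consistent with the observation above, although it would not by itself explain the timing difference.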
Jim Dempsey
Jim,
First, I doubt the fix you refer to was really to naively remove SSE3. More likely, an alignment issue was resolved, after which the compiler decided that SSE3 would not be beneficial. Can you give some more details on the issue you reported?
Second, since Peter uses /QxN, the use of SSE3 cannot be an issue at all.
Aart Bik
http://www.aartbik.com/
The problem with the SSE3 instruction related to movapd e.g.
movapd xmmword ptr [_MOD_ALL_mp_ALL+20h (5E7A38h)],xmm0
Here movapd is used as a mini block move instruction (versus computational SSE3 instructions).
As you can see from the above snippet from a debug session, the target location is 16-byte aligned relative to the start of the module, but the module itself was not linked at a 16-byte aligned address (you can tell from the hex address 5E7A38h). Either the compiler should have flagged the module as requiring 16-byte alignment (or an integral multiple thereof), or the linker disregarded the alignment instructions. My workaround in V8 was to inspect the linker map file and then add padding variables where needed.
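The arithmetic behind that reading of the address can be checked with a few lines of Python. This is only an illustrative sketch: the helper names are made up, and the module base is inferred from the `+20h` offset shown in the debugger line.

```python
ALIGNMENT = 16  # bytes; movapd faults if its memory operand is not 16-byte aligned


def misalignment(address, alignment=ALIGNMENT):
    """Return address modulo alignment (0 means the address is aligned)."""
    return address % alignment


def padding_needed(address, alignment=ALIGNMENT):
    """Return the bytes of padding that move address up to the next boundary."""
    return (-address) % alignment


target = 0x5E7A38            # effective address shown in the debug session
module_base = target - 0x20  # inferred start of _MOD_ALL_mp_ALL (operand is base+20h)

if __name__ == "__main__":
    print(f"target      {target:#x}: off by {misalignment(target)} bytes")
    print(f"module base {module_base:#x}: off by {misalignment(module_base)} bytes")
    print(f"padding to next boundary: {padding_needed(target)} bytes")
```

Both the target and the inferred module base are 8 bytes past a 16-byte boundary, so the offset within the module is fine and the misalignment comes from where the module was placed, which is why padding ahead of it works as a stopgap.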
V9 seems to have "fixed" this problem by eliminating the movapd instructions, most likely by treating the alignment of the variable's address as unknown. It is possible that this "fix" (otherwise known as a hack) is responsible for V9 running slightly slower than V8.
Are you not aware of the ALIGN keyword on the ATTRIBUTES and PSECT directives?
I did try the ALIGN attribute on the variables, but the alignment seemed to be applied only to the variable's offset within the segment in which it resides. The segment itself did not inherit the strictest alignment requirement of the variables within it.
I have not tried alignment by use of the !DEC$ PSECT...
Note that when looking in the index of the IVF documentation under ALIGN (or alignment), there is no reference to PSECT. A search for "ALIGN PSECT" does find it, but then you have to know the magic keyword.
The PSECT documentation describes its argument as follows:
common-name
Is the name of the common block.
The data in my application with the alignment problem is not in a common block. It is in a module. I think I tried using PSECT, but since there was no common block of that name (the module's mangled name) the compiler balked. The problem is that the module's data, although aligned within its data segment, seems to have no directive for the alignment of the segment in which it resides.
It would seem to me that if the programmer wrote
module foo
  REAL(8) :: var(12345)
!DEC$ ATTRIBUTES ALIGN: 16 :: var
end module foo
then the segment in which var resides should be aligned in a manner compatible with the alignment of var's offset.
Jim Dempsey