Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Comparing compilers

magicfoot
Beginner
769 Views
In terms of the "not so obvious" performance benefit gained when using an Intel compiler on an Intel platform, are there any suitable numerical benchmarks that can be used for a comparison of different compilers? My comparison (using numerical algorithms) of Intel's Composer against other popular compilers shows that Intel Composer outperforms the others by nearly 50% when using nothing more than /O2. No SSE options, etc. were used.

I have not yet started profiling the code on both platforms to find out why this should be, but I am suspicious, because an improvement this large with Intel Composer seems too good to be true. I have some detail off the web that shows similar results.

http://www.open-mag.com/features/Vol_15/IntelC/intelc.htm

How does the Intel compiler build code that is so much quicker than others?


0 Kudos
14 Replies
A_T_Intel
Employee
769 Views
The Intel Compiler team optimizes well known numerical algorithms such as LINPACK and FFTs. It may be that the algorithm you are testing has been highly optimized for performance on Intel Architecture.

At optimization level 2 (O2) you can encounter "dead-code elimination". I would verify that the numerical algorithm of interest in your test was not eliminated by the compiler.
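
One way to make that check, sketched in C under stated assumptions: keep the benchmark result observable so the optimizer cannot treat the computation as dead. The names `dot_product` and `time_dot_product` are hypothetical stand-ins, not taken from the code being discussed.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel standing in for the numerical algorithm under test. */
double dot_product(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Runs the kernel `repeats` times and returns the elapsed CPU time in seconds.
 * Each result is added to a volatile sink, so the compiler must treat the
 * computed value as used and cannot drop the kernel entirely. */
double time_dot_product(const double *a, const double *b, size_t n,
                        int repeats, double *result_out)
{
    volatile double sink = 0.0;
    clock_t start = clock();
    for (int r = 0; r < repeats; ++r)
        sink += dot_product(a, b, n);
    clock_t stop = clock();
    *result_out = sink;  /* hand the value back so the caller can print it */
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Printing `*result_out` and comparing it across compilers is a cheap sanity check. A compiler may still hoist the repeated call out of the loop when the inputs never change, so looking at the generated assembly remains the more reliable confirmation.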
0 Kudos
TimP
Honored Contributor III
769 Views
The most frequent gain with Intel compilers at default options is due to auto-vectorization, but that is associated with default SSE2 code generation. Most other compilers which support auto-vectorization require an option to be set to turn it on. You quote a web site from the days before GNU C supported auto-vectorization.
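
As an illustration only, this is the kind of loop auto-vectorizers usually handle well. The function name `saxpy` is my own choice, and whether a given compiler actually vectorizes it depends on its version and options (with GCC, for example, `-ftree-vectorize` is the relevant switch), so check the compiler's vectorization report or the assembly rather than assuming.

```c
#include <stddef.h>

/* y[i] = alpha * x[i] + y[i]
 * `restrict` (C99) promises that x and y do not overlap, which removes an
 * aliasing obstacle that can otherwise stop the vectorizer. */
void saxpy(size_t n, float alpha, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}
```

Building the same source with vectorization enabled and disabled on each compiler gives a fairer comparison than relying on the differing defaults.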
0 Kudos
rom828b
Beginner
769 Views
The newer compilers also know about AVX if you are on a SNB system.
0 Kudos
SergeyKostrov
Valued Contributor II
769 Views
Quoting magicfoot
In terms of the "not so obvious" performance benefit gained when using an Intel compiler on an Intel platform, are there any suitable numerical benchmarks that can be used for a comparison of different compilers?


Did you try LINPACK?

0 Kudos
TimP
Honored Contributor III
769 Views
Intel relies on MKL to optimize LINPACK, so the compilers may be no better than others. Do you compile your BLAS libraries yourself? Or, do you mean the quality of the compiler's interface to MKL?
0 Kudos
SergeyKostrov
Valued Contributor II
769 Views
Quoting magicfoot
...I have some detail off the web that shows similar results.

http://www.open-mag.com/features/Vol_15/IntelC/intelc.htm

How does the Intel compiler build code that is so much quicker than others?


http://www.open-mag.com/features/Vol_15/IntelC/intelc.htm

I read the article and, unfortunately, it is full of inconsistencies. It is not clear when the article was
written. Taking into account that the testers used Pentium III computers, Visual C++ v6.0 and
Intel C++ compiler v5.0, the article is describing events that are 10 years old!

>>...
>>That becomes even more interesting when Microsoft Visual Studio.NET is brought into the equation.
>>While still a beta product, tests of the new complier has so far proven it to be about 25% faster than the
>>current MS Visual C++.
>>...

I'm not surprised, because Visual C++ v6.0 was introduced in 1998 and it couldn't compile a declaration
like:

typedef union _declspec( align( 16 ) )__m128
{
...
} __m128;

Support for SSE instructions was introduced in the first version of Visual Studio .NET in 2001.

0 Kudos
SergeyKostrov
Valued Contributor II
769 Views
Quoting TimP (Intel)
Intel relies on MKL to optimize linpack, so the compilers may be no better than others. Do you compile your BLAS libraries yourself?

[SergeyK] No.

Or, do you mean the quality of the compiler's interface to MKL?

[SergeyK] No.


Best regards,
Sergey

0 Kudos
TimP
Honored Contributor III
769 Views


typedef union _declspec( align( 16 ) )__m128
{
...
} __m128;

Support for SSE instructions was introduced in the first version of Visual Studio .NET in 2001.

As you seem to emphasize use of ancient versions of Visual Studio (requiring obsolete versions of Windows), I'll mention that there was an SSE header service pack for VS6 prior to the introduction of VS .net. Even now, a decade later, MSVC "support" for parallel SSE instructions doesn't include auto-vectorization, which all other currently maintained compilers offer.
The old article you quote likely didn't include optimization by intrinsics among the demonstrated MSVC performance improvements subsequent to VS6. As I recall, VS2005 was the first MSVC to offer scalar SSE2 code generation. Without support for vectorization, there wasn't sufficient incentive to support mixed SSE and x87 code for P-III and Athlon32. VS2003 didn't support X64, although it was still on support 5 years ago. Performance improvements included splitting all 64-bit data moves into 32-bit pairs to compensate for lack of alignment support, along with several other scalar code optimizations.
MSVC with /fp:fast remains reasonably competitive with respect to scalar code optimization. Note that /fp:fast in MSVC (more aggressive than default) is roughly equivalent to /fp:source in ICL (less aggressive than default).
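
To show why the floating-point model matters when comparing timings, here is a small sketch of the sort of reassociation a relaxed model permits. It is written out by hand, both function names are hypothetical, and it is not a claim about what any particular compiler does at a given /fp setting.

```c
#include <stddef.h>

/* Sums strictly left to right, in source order. */
float sum_in_order(const float *v, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += v[i];
    return sum;
}

/* Sums with two independent partial accumulators (even and odd elements),
 * then combines them. A compiler may reorder a reduction along these lines
 * when the fp model allows it, for example to unroll or vectorize. */
float sum_two_accumulators(const float *v, size_t n)
{
    float even = 0.0f;
    float odd = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even += v[i];
        odd += v[i + 1];
    }
    if (i < n)  /* leftover element when n is odd */
        even += v[i];
    return even + odd;
}
```

With IEEE single-precision evaluation, the input `{1e8f, 1.0f, -1e8f, 1.0f}` gives 1.0f from the first function and 2.0f from the second, so a speedup obtained under a relaxed model should be checked against the accuracy the application needs.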
0 Kudos
SergeyKostrov
Valued Contributor II
769 Views
Quoting TimP (Intel)


typedef union _declspec( align( 16 ) )__m128
{
...
} __m128;

Support for SSE instructions was introduced in the first version of Visual Studio .NET in 2001.

As you seem to emphasize use of ancient versions of Visual Studio (requiring obsolete versions of Windows),

[SergeyK] VS6 can be installed and it works on ALL 32-bit versions of Windows, that is,
95, 98, Millennium, NT 3.5/4.0, 2000, XP, etc., and I wouldn't use VS6 for
development unless a customer needed support for it. I said in another thread that some
companies are still using VS6.

I'll mention that there was an SSE header service pack for VS6 prior to introduction of VS .net.

[SergeyK] That's good to know and I'll check it.
...

0 Kudos
SergeyKostrov
Valued Contributor II
769 Views
Quoting TimP (Intel)
...
I'll mention that there was an SSE header service pack for VS6 prior to introduction of VS .net.
...


Tim,

Thank you for mentioning it! As far as I remember, it was called the Visual C++ 6.0 Processor Pack.

Best regards,
Sergey

0 Kudos
SergeyKostrov
Valued Contributor II
769 Views
Quoting TimP (Intel)
...
I'll mention that there was an SSE header service pack for VS6 prior to introduction of VS .net.
...


Tim,

Thank you for mentioning it! As far as I remember, it was called the Visual C++ 6.0 Processor Pack.

Best regards,
Sergey


Here is a link to that "magic" update:

http://msdn.microsoft.com/en-us/vstudio/aa718349.aspx

0 Kudos
gladiolus
Beginner
769 Views
The Intel Fortran compiler generates machine code to support these requirements in accordance with your selected /fpe option setting. In other words, the generated code must support the trapping mode.
0 Kudos
SergeyKostrov
Valued Contributor II
769 Views
...
I'm not surprised, because Visual C++ v6.0 was introduced in 1998 and it couldn't compile a declaration
like:

typedef union _declspec( align( 16 ) )__m128
{
...
} __m128;

Support for SSE instructions was introduced in the first version of Visual Studio .NET in 2001.


This is a follow up on the topic.

Once the Visual C++ 6.0 Processor Pack is installed, the compiler can compile the declaration
mentioned above, and there is support for some subset of the SSE intrinsic functions.
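
For readers who have not used these intrinsics, here is a minimal sketch of what SSE intrinsic code looks like in C. The function name `add_floats` and the `HAVE_SSE_INTRINSICS` macro are hypothetical. The intrinsics shown come from the standard `xmmintrin.h` header, but I have not verified which of them the Processor Pack itself exposes, so treat that as an assumption.

```c
#include <stddef.h>

/* Use the SSE path only when the compiler reports SSE support;
 * otherwise fall back to plain scalar code. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_INTRINSICS 1
#else
#define HAVE_SSE_INTRINSICS 0
#endif

/* Adds two float arrays element by element: out[i] = a[i] + b[i]. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if HAVE_SSE_INTRINSICS
    /* Four floats per iteration. Unaligned loads and stores are used so the
     * caller's buffers need no particular alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop handles the tail, or every element when SSE is unavailable. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The aligned variants (`_mm_load_ps`, `_mm_store_ps`) require 16-byte aligned data, which is where an alignment declaration like the one quoted above comes in.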

0 Kudos
TimP
Honored Contributor III
769 Views
...
I'm not surprised, because Visual C++ v6.0 was introduced in 1998 and it couldn't compile a declaration
like:

typedef union _declspec( align( 16 ) )__m128
{
...
} __m128;

Support for SSE instructions was introduced in the first version of Visual Studio .NET in 2001.


This is a follow up on the topic.

Once the Visual C++ 6.0 Processor Pack is installed, the compiler can compile the declaration
mentioned above, and there is support for some subset of the SSE intrinsic functions.

It supports the SSE instructions which were implemented on P-III and Athlon-32. It would not support SSE2 instructions, so you would not run into the lack of full SSE2 support on original Turion.
Intel compilers no longer support SSE for those early SSE CPUs.
0 Kudos