Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Linking objects for different processor targets

emmanuel_attia
Beginner
864 Views

Hi

I would like to know whether there is a way (or whether it could be considered as a feature) to tell the Intel linker to link object files in order of CPU capabilities.

For instance, if I have test_avx.cpp compiled with /QxAVX and test_sse41.cpp compiled with /QxSSE4.1, and I implement the dispatching manually, I need the code generated for classes shared across the object files (for which I don't need any vector optimization, like std::string) to be the most generic version (which would be the /QxSSE4.1 one in this example).

Currently I achieve that by manually ordering the files in my Visual Studio project file (which happens to determine the order of the .obj files on the linker command line), and that works.

Best regards

14 Replies
Feilong_H_Intel
Employee

Hi there,

I don't quite understand your question. Why is the order of the .obj files on the linker command line important to you? Uploading a simple project could make your question clear to everyone.

Thanks.

emmanuel_attia
Beginner

For instance, suppose I have this code:

test_avx2.cpp: compile with /QxAVX2

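#include <immintrin.h>
#include <cstdlib>
#include <string>
#include <iostream>

void test_avx2()
{
    // Broadcast rand() to 16-bit lanes, shift left by 5 (AVX2), print the low 32 bits
    std::cout << std::string("test_avx2: ") << _mm_cvtsi128_si32(_mm256_castsi256_si128(_mm256_slli_epi16(_mm256_set1_epi16(static_cast<short>(std::rand())), 5))) << std::endl;
}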

test_sse4.cpp: compile with /QxSSE4.1

#include <immintrin.h>
#include <cstdlib>
#include <string>
#include <iostream>

void test_sse4()
{
    // Same computation with 128-bit intrinsics only, so no AVX2 instruction is requested explicitly
    std::cout << std::string("test_sse4: ") << _mm_cvtsi128_si32(_mm_slli_epi16(_mm_set1_epi16(static_cast<short>(std::rand())), 5)) << std::endl;
}

test.cpp

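#include <intrin.h>

void test_avx2(); // defined in test_avx2.cpp
void test_sse4(); // defined in test_sse4.cpp

int main()
{
    int CPUInfo[4];
    __cpuid(CPUInfo, 1);

    // ECX bit 28 = AVX, bit 27 = OSXSAVE
    if ((CPUInfo[2] & ((1<<28)|(1<<27))) == ((1<<28)|(1<<27))) // Not exactly testing AVX2, but just for the sake of testing
        test_avx2();
    else
        test_sse4();
}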
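
As the comment in main() says, that check only looks at the AVX and OSXSAVE bits of CPUID leaf 1. A fuller AVX2 test could look roughly like the sketch below. The function name `avx2_usable` and its parameters are hypothetical (they are not part of the code above); it only decodes register values that the caller has already read, so the decoding logic can be checked on its own.

```cpp
#include <cstdint>

// Hypothetical helper: decides whether AVX2 code may be executed, given raw
// register values read elsewhere.
//   leaf1_ecx : ECX returned by CPUID leaf 1
//   leaf7_ebx : EBX returned by CPUID leaf 7, sub-leaf 0
//   xcr0      : XCR0 as returned by XGETBV (only meaningful if OSXSAVE is set)
bool avx2_usable(std::uint32_t leaf1_ecx, std::uint32_t leaf7_ebx, std::uint64_t xcr0)
{
    const std::uint32_t kOsxsave = 1u << 27;  // OS has enabled XSAVE/XGETBV
    const std::uint32_t kAvx     = 1u << 28;  // CPU supports AVX
    const std::uint32_t kAvx2    = 1u << 5;   // CPU supports AVX2
    const std::uint64_t kXmmYmm  = 0x6;       // OS saves XMM (bit 1) and YMM (bit 2) state

    if ((leaf1_ecx & (kOsxsave | kAvx)) != (kOsxsave | kAvx))
        return false;                         // no AVX, or XCR0 cannot be read
    if ((xcr0 & kXmmYmm) != kXmmYmm)
        return false;                         // OS does not preserve YMM registers
    return (leaf7_ebx & kAvx2) != 0;
}
```

With the Microsoft or Intel compiler on Windows, the inputs would presumably come from `__cpuid(info, 1)`, `__cpuidex(info, 7, 0)` and `_xgetbv(0)`, after checking that leaf 7 is available and that OSXSAVE is set before reading XCR0.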

If I link in this order: test.obj test_avx2.obj test_sse4.obj

On Nehalem: illegal instruction (because the generated code for std::string is taken from the first module that defines it, which is compiled with AVX2, even though no one gives a damn about AVX-ized memory initialization of a std::string)

On Haswell: "test AVX2 ..."

If I link in order of increasing processor extensions: test.obj test_sse4.obj test_avx2.obj

On Nehalem: "test SSE4 ..."

On Haswell: "test AVX2 ..."
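
One way to avoid depending on link order at all, sketched here under assumptions, might be to keep standard-library templates out of the files built with /Qx flags, so that no shared std::string code is generated there in the first place. The names `kernel_sse4`, `kernel_avx2` and `describe` are hypothetical, the three sections stand in for three separate source files, and plain arithmetic replaces the intrinsics so that the sketch stays portable.

```cpp
#include <string>

// --- would live in test_sse4.cpp (built with /QxSSE4.1) ------------------
// Plain integers in and out: no std::string or iostream code is instantiated
// in this file, so it cannot contribute a copy of those functions.
unsigned kernel_sse4(unsigned seed) { return (seed << 5) & 0xFFFFu; }

// --- would live in test_avx2.cpp (built with /QxAVX2) --------------------
// Same rule here: only the numeric kernel, no standard-library templates.
unsigned kernel_avx2(unsigned seed) { return (seed << 5) & 0xFFFFu; }

// --- would live in test.cpp (built with the baseline flags) --------------
// All std::string handling happens in the file compiled for the oldest CPU.
std::string describe(const char* name, unsigned value)
{
    return std::string(name) + ": " + std::to_string(value);
}
```

In this layout the dispatching code would call `describe("test_avx2", kernel_avx2(seed))` or `describe("test_sse4", kernel_sse4(seed))`. Headers that bring in inline or template code would have to stay out of the ISA-specific files for this to hold.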

Marián__VooDooMan__M
New Contributor II

Greetings,

IMO this approach is totally wrong. There is no guarantee that the order of the *.obj files is the same as the order of execution at application start-up. A future MSVC might choose a different ordering, so relying on it is an ugly hack, and the ordering of generated code in the final binary is undefined.

You should use Windows SEH (structured exception handling) to catch illegal-opcode exceptions.

emmanuel_attia
Beginner

Hi Marian

1) Using SEH would handle the exception, but it would not allow my binary to run on Nehalem. So there is no point in that solution.

2) Yes, it looks like an ugly hack, but it is there to counter the ugly Intel C++ behavior that does not let me emit vector instructions only where I want them (unless I don't use the /Qx flags, but in that case the SSE instructions in my AVX code path do not use the VEX encoding scheme, which penalizes performance a lot).

And speaking of the standard, having the same class in two translation units with different definitions is non-standard and breaks the One Definition Rule.

For my own classes this poses no problem (they are private, in an anonymous namespace), but STL classes cannot be "anonymized". That is very annoying, because Intel C++ puts SSE/AVX code in the STL classes even though I don't care about their performance (they are never in hot sections of the code).

All this is a special case for the Intel C++ compiler (when I compile with Microsoft's compiler, my software does not support dispatching yet and works on Haswell/AVX2 only). So I do have a less functional "standard C++" fallback.

Best Regards

emmanuel_attia
Beginner

A good feature that the Intel compiler team could add to make all of this standard-conforming would be an option to disable vector code entirely in non-private code.

Meaning vectorized code only:
* In static functions
* In anonymous namespaces

Marián__VooDooMan__M
New Contributor II

emmanuel.attia wrote:

Hi Marian

1) Using SEH would handle the exception, but it would not allow my binary to run on Nehalem. So there is no point in that solution.

2) Yes, it looks like an ugly hack, but it is there to counter the ugly Intel C++ behavior that does not let me emit vector instructions only where I want them (unless I don't use the /Qx flags, but in that case the SSE instructions in my AVX code path do not use the VEX encoding scheme, which penalizes performance a lot).

And speaking of the standard, having the same class in two translation units with different definitions is non-standard and breaks the One Definition Rule.

For my own classes this poses no problem (they are private, in an anonymous namespace), but STL classes cannot be "anonymized". That is very annoying, because Intel C++ puts SSE/AVX code in the STL classes even though I don't care about their performance (they are never in hot sections of the code).

All this is a special case for the Intel C++ compiler (when I compile with Microsoft's compiler, my software does not support dispatching yet and works on Haswell/AVX2 only). So I do have a less functional "standard C++" fallback.

Best Regards

Re 1): it will definitely solve it on Nehalem, because when the exception is thrown you can set your global "do not use this!" variable, the one you already use to dispatch code to CPUs that support the feature.

if(!(cpu_features & SUPPORTS_THIS_AND_THAT))
    call_function_which_does_not_support_this_and_that();

You were speaking of your own dispatching, weren't you? Or am I missing something?

This is how all libraries work: they gather the CPUID profile of the CPU and then dispatch. Native ICC dispatching works this way as well; two examples are the MKL library and the open-source FFTW library.

2) "Ugly hack" means that it might not work in future versions and, most importantly, that it might not be portable across the MANY C++ compilers on the market, whether free, open-source or commercial. Please pardon me, but I live in the GNU world, so portability is my first priority. You may disagree when you build a commercial product. Still, this logic might change in future compiler versions, and you will be in BIG trouble when the last compiler version on which your code worked reaches end-of-life, say 5 years from now. That is the definition of "ugly hack" I was using.

Marián__VooDooMan__M
New Contributor II

I recommend that you study the source of, for example, the FFTW library (and many others, or disassemble MKL from Intel) to see how they solved their own dispatching. They gather the CPUID profile of the CPU they are running on and then use the code above: "if(supports(...)) { a(); } else { b(); }". This kind of hand-written dispatching is used in almost every piece of software that has to run on various CPU types: the Windows kernel, libraries, applications, and so on. You spoke of your own alternate code paths, didn't you? So why not really use them?!
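
The "if(supports(...)) { a(); } else { b(); }" pattern is often written so that the decision is made once and stored in a function pointer. Here is a minimal sketch of that idea; the feature flags, `select_scale` and the two `scale_*` functions are hypothetical stand-ins and are not taken from FFTW or MKL.

```cpp
#include <cstdint>

// Hypothetical feature bits, filled in once at start-up from CPUID.
enum CpuFeature : std::uint32_t {
    kSse41 = 1u << 0,
    kAvx2  = 1u << 1
};

using ScaleFn = int (*)(int);

// Stand-ins for the two code paths; in a real project each would live in its
// own translation unit compiled with the matching instruction-set flags.
int scale_baseline(int x) { return x * 2; }
int scale_avx2(int x)     { return x * 2; }

// Pick the implementation once; callers then go through the returned pointer
// instead of re-testing the feature bits on every call.
ScaleFn select_scale(std::uint32_t cpu_features)
{
    return (cpu_features & kAvx2) ? scale_avx2 : scale_baseline;
}
```

A caller would typically store the result in a static pointer during initialization, for example `static const ScaleFn scale = select_scale(detected_features);`, where `detected_features` is whatever the CPUID probing produced.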

emmanuel_attia
Beginner

My problem is not about the CPU features I use explicitly. It is about the CPU features the compiler uses automatically, combined with the fact that code in several translation units defines the same class.

Try to reproduce my problem; maybe then you'll see what I am talking about.

emmanuel_attia
Beginner

"ugly hack means, it might not work in future versions, and what is most important, it might be not portable across MANY c++ compilers on market, whether free or open-source or for commercial..."

I don't know of any C++ linker implementation that would not take the first definition of a class when there are several. It is not standardized, but there is no reason to do otherwise.

Feilong_H_Intel
Employee

As Marián "VooDooMan" Meravý said, you need cpu dispatch.  Please refer to https://software.intel.com/en-us/node/512787 for manual cpu dispatch.

Thanks.

emmanuel_attia
Beginner

Feilong H (Intel) wrote:

As Marián "VooDooMan" Meravý said, you need cpu dispatch.  Please refer to https://software.intel.com/en-us/node/512787 for manual cpu dispatch.

Thanks.

If I had wanted to use that functionality, I would have, but it is far too Intel-specific.

Vladimir_Sedach
New Contributor I

Emmanuel,

I'd suggest:
1. Make a separate exe file for each SSE/AVX level and choose one of them at install time.
No need for a CPUID check at run time.

2. Force Intel to let us create our own vector instruction sets and load them into the CPU.
Otherwise this problem with ugly solutions will remain forever -- Intel adds new instructions with every new CPU generation.

emmanuel_attia
Beginner

Vladimir Sedach wrote:

Emmanuel,

I'd suggest:
1. Make a separate exe file for each SSE/AVX level and choose one of them at install time.
No need for a CPUID check at run time.

2. Force Intel to let us create our own vector instruction sets and load them into the CPU.
Otherwise this problem with ugly solutions will remain forever -- Intel adds new instructions with every new CPU generation.

Well, there would be a very clean solution.

Intel could provide a compiler flag that allows the use of AVX (meaning SSE intrinsics would be translated to VEX-encoded SSE instructions instead of legacy ones) without the compiler taking the initiative of inserting AVX code on its own ("automatic / often useless" vectorization). Unfortunately, that simple feature is not available and Intel is not interested in it.

Vladimir_Sedach
New Contributor I

Emmanuel,

Don't worry too much about the Intel compiler! It generated much slower code than GCC (I'm using MinGW) in all my experiments.
Besides that, in GCC you can set the instruction set for individual functions with e.g. __attribute__ ((target ("avx2"))): https://gcc.gnu.org/wiki/FunctionMultiVersioning
GCC allows you, say, to use the -mavx2 -mno-avx2 options. That means AVX2 is allowed if and only if you explicitly use its intrinsics.
It's also less expensive on Windows.
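
For illustration, here is a rough sketch of the per-function attribute in use, assuming GCC or Clang on x86. The function names are hypothetical, and the scalar fallback is what gets used on other compilers or architectures. Only `sum_avx2` is compiled with AVX2 enabled; the rest of the file uses the baseline instruction set.

```cpp
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

// AVX2 is enabled for this one function only, via the target attribute.
__attribute__((target("avx2")))
static int sum_avx2(const int* data, std::size_t n)
{
    __m256i acc = _mm256_setzero_si256();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)  // eight 32-bit lanes per iteration
        acc = _mm256_add_epi32(acc, _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i)));

    int lanes[8];
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(lanes), acc);
    int total = 0;
    for (int lane : lanes) total += lane;
    for (; i < n; ++i) total += data[i];  // leftover elements
    return total;
}
#endif

// Baseline path, compiled with the file's default instruction set.
static int sum_scalar(const int* data, std::size_t n)
{
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Run-time dispatch: take the AVX2 path only when the CPU reports support.
int sum(const int* data, std::size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(data, n);
#endif
    return sum_scalar(data, n);
}
```

Because the attribute is attached to a single function, shared inline code such as std::string members in the same file should still be generated for the baseline target.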
