Dear All,
The IPP library provides these functions:
ippGetCpuFeatures
ippSetCpuFeatures
ippGetEnabledCpuFeatures
Does the C++ compiler provide similar functions for run-time CPU code path dispatch? The objective is, of course, to allow detection of AMD CPUs.
Currently, when running on AMD, the automatic code dispatch seems to be locked to the SSE3 instruction set only.
Thanks!
Atmapuri
Hi, the Intel compiler supports the following intrinsics for selecting CPU features in manual dispatch:
extern int _may_i_use_cpu_feature(unsigned __int64);
extern void _allow_cpu_features(unsigned __int64);
For more information, search for "feature" in the Intel Intrinsics Guide:
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=feature
Dear Sir,
Thank you for this pointer. So I tried this:
void allow_cpu_features_dspd(const unsigned __int64 a)
{
_allow_cpu_features(a);
}
However, the compiler reports the following errors:
Error Intrinsic parameter must be an immediate value
Error Problem during multi-file optimization compilation
Hi, can you be more specific about the environment in which you got this error, e.g. the command used to compile the code, the compiler version, the gcc/g++ or Visual Studio version, and whether you are on Windows or Linux?
Compiler: icc.exe (Intel C++ 19.2)
Compiling from dll project on VS2022 on Windows 10
Command line:
/permissive /GS /Qoffload:none /TC /W3 /Gy /Zc:wchar_t /I"C:\Program Files (x86)\Intel\oneAPI\compiler\latest\windows\compiler\include\icc" /I"C:\Program Files (x86)\Intel\oneAPI\ipp\latest\include\ipp" /O2 /Fd"x64\Release\vc143.pdb" /Qopenmp-offload- /Zc:inline /D "_WINDLL" /D "_MBCS" /Qipo /Zc:forScope /arch:SSE3 /Oi /MT /FC /Fa"x64\Release\" /nologo /Fo"x64\Release\" /Qstd=c99 /Fp"x64\Release\MtxVec.Dspd.6.3.pch" /QaxCORE-AVX512,CORE-AVX2,CORE-AVX-I,AVX,SSE4.2,SSE4.1
The project compiles if I comment out this line:
void allow_cpu_features_dspd(const unsigned long long a)
{
// _allow_cpu_features(a);
}
I tried different things, such as local variables and different argument types, but the behavior did not change.
If I try icx.exe (Intel C++ 2024), I get this error:
Error call to undeclared function '_allow_cpu_features'; ISO C99 and later do not support implicit function declarations...
If I declare:
extern void _allow_cpu_features(unsigned __int64 a);
I get:
1>lld-link: : error : undefined symbol: _allow_cpu_features
Hi, I can reproduce the error (Intrinsic parameter must be an immediate value), and I think it makes sense. The intrinsic tells the compiler to generate code optimized for the selected features, so the compiler needs to know the requested features at compile time. In your code the argument is a run-time value, so the compiler cannot know them. If you call the intrinsic with a specific constant argument, e.g.
void dspd_avx(...)
{
_allow_cpu_features(_FEATURE_AVX);
...
}
it will compile, and the code will be compiled targeting the AVX ISA. Here are some more examples: _allow_cpu_features
Dear Sir,
This does not work because your answer does not address the question asked. We are using the
/QaxCORE-AVX512,CORE-AVX2,CORE-AVX-I,AVX,SSE4.2,SSE4.1 /arch:SSSE3
compiler switches.
So the compiler already does the run-time dispatch, or let's say, it should. However, this does not happen for AMD CPUs. On AMD CPUs, the code path depends on the /arch parameter; in the case above, that is SSSE3.
If we don't specify /arch, the default of /arch:SSE2 is used. If we specify /arch:AVX2, the code is fast but does not run on older AMD CPUs.
What can be done to make the behaviour provided by /Qax work on AMD as well?
Thanks!
Atmapuri
Hi, to the best of my knowledge, automatic CPU dispatch (-ax on Linux, /Qax on Windows) does not work for non-Intel processors. The best option here may be manual CPU dispatch. You can use the other intrinsic, _may_i_use_cpu_feature, which doesn't perform a vendor check.
if (_may_i_use_cpu_feature(_FEATURE_AVX)) {
    /* use AVX intrinsics; works on modern CPUs */
} else if (_may_i_use_cpu_feature(_FEATURE_SSE2)) {
    /* use SSE2 intrinsics; works on older CPUs */
} else {
    /* generic code */
}
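The ladder above relies on Intel-specific intrinsics. As a rough, vendor-neutral sketch of the same idea, the block below uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports` in place of `_may_i_use_cpu_feature`. That substitution is an assumption, not something proposed in this thread, and it has not been checked with icx on Windows. The names `simd_level`, `detect_simd_level` and `simd_level_name` are hypothetical.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
typedef enum { SIMD_GENERIC = 0, SIMD_SSE2, SIMD_AVX, SIMD_AVX2, SIMD_AVX512 } simd_level;

/* Returns the widest instruction set reported by the running CPU. */
static simd_level detect_simd_level(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* populate the CPU data queried below */
    /* Test the widest ISA first so the best available path wins. */
    if (__builtin_cpu_supports("avx512f")) return SIMD_AVX512;
    if (__builtin_cpu_supports("avx2"))    return SIMD_AVX2;
    if (__builtin_cpu_supports("avx"))     return SIMD_AVX;
    if (__builtin_cpu_supports("sse2"))    return SIMD_SSE2;
#endif
    /* Other compilers or architectures: fall back to generic code. */
    return SIMD_GENERIC;
}

/* Human-readable name for logging which path was selected. */
static const char *simd_level_name(simd_level level)
{
    static const char *const names[] = { "generic", "SSE2", "AVX", "AVX2", "AVX-512" };
    assert((int)level >= 0 && (int)level <= (int)SIMD_AVX512);
    return names[level];
}
```

Calling `detect_simd_level()` once at start-up and caching the result keeps the check out of hot loops.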
Let's suppose we have a function like this:
void ippSimilarAdd_64f(double *Src1, double *Src2, double *Dst, int Len)
{
    for (int i = 0; i < Len; i++) Dst[i] = Src1[i] + Src2[i];
}
What would be the most elegant approach to manual dispatch? In this case we are not using any CPU-specific intrinsics for the actual computation in the body of the function, but would still like to see SSE4.2, AVX, AVX2 and AVX512 dispatch depending on the hardware.
Hi, if that's the case, _may_i_use_cpu_feature is probably not a good choice. I learned that the Intel compiler offers another option, -mauto-arch. Can you try it out?
So, I have tried:
/Qauto-arch:CORE-AVX512 /arch:SSE3
On Intel AVX512-capable hardware I still get good performance, but on an AVX2-capable AMD CPU the /arch:SSE3 code path continues to be used.
This is neither desired nor expected. I have also tried specifying:
/Qauto-arch:CORE-AVX512, CORE-AVX2
but I get a warning from the compiler:
warning #10121: overriding '/Qauto-archCORE-AVX512' with '/Qauto-archCORE-AVX2'
This again is not desired. We would like code that uses AVX2 on an AMD CPU with AVX2 and AVX512 on an AMD CPU with AVX512, while the base remains SSE3.
How can we do that?
Thanks!
Atmapuri
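One possible direction, offered only as a sketch and not as an answer confirmed in this thread: compile the same plain loop several times, once per instruction set, and choose a copy at run time through a function pointer. The block below assumes a GCC- or Clang-compatible compiler and uses `__attribute__((target(...)))` together with `__builtin_cpu_supports`. Whether icx on Windows accepts and links this has not been verified here, and whether each copy is actually vectorized depends on the optimization settings. All names except `ippSimilarAdd_64f` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* All names except ippSimilarAdd_64f are hypothetical, for illustration only. */
typedef void (*add_fn)(const double *, const double *, double *, int);

/* Baseline copy: built with whatever the command line selects (e.g. /arch:SSE3). */
static void add_baseline(const double *a, const double *b, double *dst, int len)
{
    for (int i = 0; i < len; i++) dst[i] = a[i] + b[i];
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
/* Stamp out the same loop once per ISA; the target attribute lets the
   compiler use that ISA inside this one function only. */
#define DEFINE_ADD_KERNEL(name, isa)                                   \
    static __attribute__((target(isa)))                                \
    void name(const double *a, const double *b, double *dst, int len)  \
    {                                                                  \
        for (int i = 0; i < len; i++) dst[i] = a[i] + b[i];            \
    }

DEFINE_ADD_KERNEL(add_sse42,  "sse4.2")
DEFINE_ADD_KERNEL(add_avx,    "avx")
DEFINE_ADD_KERNEL(add_avx2,   "avx2")
DEFINE_ADD_KERNEL(add_avx512, "avx512f")

/* Pick the widest kernel the running CPU reports, without a vendor check. */
static add_fn resolve_add(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return add_avx512;
    if (__builtin_cpu_supports("avx2"))    return add_avx2;
    if (__builtin_cpu_supports("avx"))     return add_avx;
    if (__builtin_cpu_supports("sse4.2"))  return add_sse42;
    return add_baseline;
}
#else
/* Other compilers or architectures: only the baseline copy exists. */
static add_fn resolve_add(void) { return add_baseline; }
#endif

void ippSimilarAdd_64f(double *Src1, double *Src2, double *Dst, int Len)
{
    /* Resolved on first call. Not synchronized: in multithreaded code,
       resolve once during start-up instead. */
    static add_fn impl = NULL;
    assert(Len >= 0);
    if (impl == NULL) impl = resolve_add();
    impl(Src1, Src2, Dst, Len);
}
```

The function-pointer indirection costs one indirect call per invocation, which is usually negligible for array-sized work; for very short arrays it may be cheaper to dispatch at a coarser level.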
