I am writing AVX code inside asm blocks (I don't want to use AVX intrinsics).
My assembly uses a lot of general-purpose registers, and they collide with the registers the compiler allocates for its own generated code, which quickly breaks the program's behavior.
Is there an automatic or manual way to avoid these register overlaps?
Any link to documentation would be great.
I would also like to use asm blocks in Fortran with ifort, but I haven't found a way yet.
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
I want to compare the performance of compiler-generated code against hand-coded assembly from a FLOPS perspective. I just want to get an idea of the relative performance of both implementations for one function.
The function is now implemented in AVX, but this register 'overlap' issue makes it unusable. I have heard of clobbered registers, but I'm not sure what they do or whether they would help me.
The clobbered register list lets you tell the compiler which registers your inline assembly modifies, so that it neither keeps live values in those registers nor assumes their contents persist across the asm block. This applies to GNU-style inline assembly. For more information, see Agner Fog, http://www.agner.org/optimize/optimizing_assembly.pdf, section 6.2. You should also look near the end of that document for information on how to measure code performance.
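As a rough illustration of the clobber list, here is a minimal sketch in C using GNU-style extended asm. It assumes GCC or Clang on x86-64 with the default AT&T syntax; the function name `sum_u64` and the choice of `rax` and `rcx` as scratch registers are made up for the example, and plain integer instructions are used rather than AVX to keep it short. The point is the last line of the asm statement: every register the block writes is named there, so the compiler will not place its own values or the operands in them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums n 64-bit integers with hand-written assembly. */
static uint64_t sum_u64(const uint64_t *data, uint64_t n)
{
    uint64_t total = 0;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile(
        "xor  %%rax, %%rax\n\t"            /* rax = running sum          */
        "xor  %%rcx, %%rcx\n\t"            /* rcx = element index        */
        "1:\n\t"
        "cmp  %[n], %%rcx\n\t"             /* index >= n ?               */
        "jae  2f\n\t"                      /* yes: leave the loop        */
        "add  (%[p], %%rcx, 8), %%rax\n\t" /* sum += data[index]         */
        "inc  %%rcx\n\t"
        "jmp  1b\n\t"
        "2:\n\t"
        "mov  %%rax, %[out]\n\t"           /* hand the result to C       */
        : [out] "=r"(total)                /* outputs                    */
        : [p] "r"(data), [n] "r"(n)        /* inputs, compiler picks regs */
        : "rax", "rcx", "cc", "memory");   /* clobbers: scratch regs, flags, memory */
#else
    /* Portable fallback so the sketch still builds on other targets. */
    for (size_t i = 0; i < n; ++i)
        total += data[i];
#endif
    return total;
}
```

Calling `sum_u64` on an array holding 1, 2, 3, 4 should return 10. Letting the compiler choose the operand registers through the `"r"` constraints, and listing only the registers you hard-code, is usually the least fragile way to mix hand-written and generated code.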