In a few weeks, we will have another generation of Intel HPC systems. We will then have systems that support SSE4.2 (Nehalem, Westmere), AVX (SandyBridge, IvyBridge), and CORE-AVX2 (Haswell) optimizations. Since the compile nodes are being upgraded to Haswell as well, I want to tell the users to specify something other than -xHost when using Intel Fortran, so that binaries stay backwards compatible and run on any of the clusters. I plan to tell the users to use -xSSE4.2 -axCORE-AVX2,AVX.
My questions are:
1) Is this the best way to compile codes so they can use different vectorization units depending on which system they run on?
2) How do I confirm that the correct code path is being used on a given system? I have looked at the assembly and see that the instructions are there, but I want to confirm at run time that the code is actually being executed.
Thanks,
Craig
1) Yes, that's good advice.
2) I don't think there's any supported method to determine that. You can certainly query the processor to see which instruction sets it supports, though the method of doing that is rather arcane. You could run the program under VTune on selected systems and look at instruction counts for specific regions.
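For the "query the processor" part, here is a minimal sketch, assuming a Linux system where `/proc/cpuinfo` lists the feature flags `sse4_2`, `avx`, and `avx2`. The helper names `cpu_flags` and `best_tier` and the tier table are hypothetical, made up for illustration. It only reports what the CPU advertises; it does not show which code path a binary actually took.

```python
from pathlib import Path

# Assumed mapping from compiler target names to Linux /proc/cpuinfo flag
# names, ordered from newest to oldest instruction set.
TIERS = [("CORE-AVX2", "avx2"), ("AVX", "avx"), ("SSE4.2", "sse4_2")]


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line as a set."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def best_tier(flags):
    """Return the newest tier whose flag is present, or None."""
    for name, flag in TIERS:
        if flag in flags:
            return name
    return None


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # Linux only; skipped elsewhere
        print(best_tier(cpu_flags(cpuinfo.read_text())))
```

Running this on one node of each cluster would at least tell you which of the three paths the hardware could take.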
Steve,
Querying the processor isn't enough. I need to know that the CORE-AVX2 specific instructions are being used, not the SSE4.2 instructions, when run on Haswell.
I will fire up VTune and see what it tells me.
Thanks, Craig
Just keep in mind that the compiler decides if it is worthwhile generating a path with the AVX2 instructions. Depending on your code, it may choose not to do so.
Another thing you can do to test is run the program under gdb, set a breakpoint at the routine, and step through the instructions to see what it executes. This can be difficult with high optimization levels and inlining, though.
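If you collect the instructions you step through, a rough way to tell the paths apart is to label each one by mnemonic and register names. The sketch below is a heuristic under stated assumptions, not a supported method: it expects bare `mnemonic operands` strings in AT&T syntax (no address prefix), the function names `classify` and `summarise` are hypothetical, and the short list of AVX2/FMA mnemonic prefixes is illustrative rather than complete.

```python
from collections import Counter

# Assumed tell-tale prefixes: FMA3 and a few AVX2-only mnemonics.
# On the three targets discussed here these should only show up on Haswell.
AVX2_OR_FMA_PREFIXES = (
    "vfmadd", "vfmsub", "vfnmadd", "vfnmsub",
    "vpbroadcast", "vperm2i128", "vinserti128", "vextracti128",
    "vgather", "vpgather",
)


def classify(instruction):
    """Heuristically label one instruction, e.g. 'vaddpd %ymm1,%ymm0,%ymm0'."""
    parts = instruction.strip().split(None, 1)
    if not parts:
        return "other"
    mnemonic = parts[0].lower()
    operands = parts[1].lower() if len(parts) > 1 else ""
    uses_vector_regs = "xmm" in operands or "ymm" in operands
    if mnemonic.startswith(AVX2_OR_FMA_PREFIXES):
        return "avx2-or-fma"
    if mnemonic.startswith("v") and uses_vector_regs:
        return "avx"  # VEX-encoded form
    if uses_vector_regs:
        return "sse"  # legacy SSE encoding on xmm registers
    return "other"


def summarise(instructions):
    """Count how many stepped instructions fall into each label."""
    return Counter(classify(i) for i in instructions)
```

A trace dominated by `avx2-or-fma` and `avx` labels on a Haswell node would suggest the CORE-AVX2 path was taken; a trace with only `sse` labels would suggest the baseline path.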