Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

PDEP/PEXT in Fortran

Steven_V_
Beginner
1,679 Views

Hi,

is there any extension available to use the PDEP/PEXT bit operations in Fortran? Is there any plan to add bit manipulations like that to the Fortran standard?

greetings,

Steven

0 Kudos
1 Solution
Steven_L_Intel1
Employee
1,679 Views

The only reason to use a C routine is if you want to use the Haswell instruction intrinsics on a Haswell processor. If you aren't running on a processor with that instruction set, you'd just get a runtime error trying to use these intrinsics.

One could write a C routine that used "manual CPU dispatch", executing the Haswell instruction(s) if running on a capable processor or generic code if not. If you built this using Intel C++ and the /Qipo option, the optimizer might inline the call. It IS possible, through an undocumented feature, to use many (but not all) instruction intrinsics from Intel Fortran, but I have not studied these bit intrinsics to see if that's possible. You'd still have to detect the instruction set and execute generic code (which you would have to write.) Given that the randombit.net article provided generic C code, doing this bit (!) in C would make more sense to me.

View solution in original post

0 Kudos
6 Replies
Steven_L_Intel1
Employee
1,679 Views

The Fortran standard has lots of intrinsics for bit manipulation.I haven't heard of PDEP/PEXT - what are these?

0 Kudos
mecej4
Honored Contributor III
1,679 Views

Steve Lionel (Intel) wrote:

The Fortran standard has lots of intrinsics for bit manipulation.I haven't heard of PDEP/PEXT - what are these?

Google yielded this article about these bit-level scatter/gather instructions, made available on Haswell CPUs: http://www.randombit.net/bitbashing/2012/06/22/haswell_bit_permutations.html

Perhaps one should expect the Intel C compiler or the IPP library to support these instructions rather than the Fortran compiler? 

0 Kudos
Steven_V_
Beginner
1,679 Views

I found those pdep/pext instructions when looking for ways to scatter bits. Apparently there are available on certain Intel architectures (https://software.intel.com/en-us/node/514045). However, I did not find any equivalent intrinsic that does this bit operation for Fortran.

I'm using bit scattering operations inside an inner loop of a performance-critical section in my Fortran code, and I was wondering if using these instructions might speed up that section.

0 Kudos
Steven_L_Intel1
Employee
1,679 Views

Are you running the program on a Haswell processor? Fortran doesn't have bit gather/scatter intrinsics. You could call out to an Intel C++ routine and use its instruction intrinsics (again for supported processors only.) 

0 Kudos
Steven_V_
Beginner
1,679 Views

I'm not running the program on a Haswell processor myself, but large calculations are usually run on computer clusters, so I should take into account that it might be the case, and then the possible speed-up is definitely important.

So, if I understand correctly, I should create a Fortran function which calls a C function (using bind), then write a C function and compile that with the Intel C compiler and then link. I'm only wondering if in that case the function will be inlined?

0 Kudos
Steven_L_Intel1
Employee
1,680 Views

The only reason to use a C routine is if you want to use the Haswell instruction intrinsics on a Haswell processor. If you aren't running on a processor with that instruction set, you'd just get a runtime error trying to use these intrinsics.

One could write a C routine that used "manual CPU dispatch", executing the Haswell instruction(s) if running on a capable processor or generic code if not. If you built this using Intel C++ and the /Qipo option, the optimizer might inline the call. It IS possible, through an undocumented feature, to use many (but not all) instruction intrinsics from Intel Fortran, but I have not studied these bit intrinsics to see if that's possible. You'd still have to detect the instruction set and execute generic code (which you would have to write.) Given that the randombit.net article provided generic C code, doing this bit (!) in C would make more sense to me.

0 Kudos
Reply