
__builtin_ctz - it could be hardware optimized but it isn't

Altera_Forum

Hi, 

 

Looking inside gcc, I see that it has a builtin function, __builtin_ctz, which counts the trailing zeros of a word (the zero bits below the least significant set bit). I also notice that the soft floating point implementation that comes with gcc makes some use of __builtin_ctz, and I was hoping that the Nios2 soft floating point could run faster if this were optimized. My implementation of the internal interrupt unit's dispatch-level prioritisation also uses __builtin_ctz.
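To make the use case concrete, here is a minimal sketch of what __builtin_ctz computes and how a dispatcher might use it. The names soft_ctz32 and next_irq are hypothetical, and the rule "lowest pending bit wins" is an assumption for illustration, not a description of my actual dispatcher. The loop version is roughly the kind of work a compiler has to fall back on when no hardware support is available.

```c
#include <assert.h>
#include <stdint.h>

/* Portable reference: counts zero bits starting at the least significant
 * end of a nonzero 32-bit word. Hypothetical helper name. */
static inline unsigned soft_ctz32(uint32_t x)
{
    unsigned n = 0;

    assert(x != 0);              /* like __builtin_ctz, undefined for 0 */
    while ((x & 1u) == 0) {      /* shift right until bit 0 is set */
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical dispatch helper: assumes the lowest-numbered pending IRQ
 * has the highest priority. Returns -1 when nothing is pending. */
static inline int next_irq(uint32_t pending)
{
    if (pending == 0)
        return -1;
#if defined(__GNUC__)
    return __builtin_ctz(pending);   /* GCC builtin, takes unsigned int */
#else
    return (int)soft_ctz32(pending);
#endif
}
```

With pending = 0x28 (bits 3 and 5 set), next_irq returns 3.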

 

Ok, so the obvious idea is to just implement a custom instruction for CLZ, and there appear to be some good examples in the Altera documentation. So far, it looks promising. 
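One point worth noting: a single CLZ instruction could probably serve both needs, because a trailing-zero count can be derived from a leading-zero count. The sketch below uses a software stand-in (clz32_model, a hypothetical name) in place of the custom instruction, and assumes the instruction would return 32 for an input of 0.

```c
#include <stdint.h>

/* Software stand-in for a hypothetical CLZ custom instruction.
 * Assumption: the hardware would return 32 for an input of 0. */
static inline unsigned clz32_model(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {  /* shift left until bit 31 is set */
        x <<= 1;
        n++;
    }
    return n;
}

/* Trailing-zero count built from CLZ. x & -x isolates the lowest set
 * bit, and that bit's index is 31 minus its leading-zero count.
 * The result is meaningless for x == 0, as with __builtin_ctz. */
static inline unsigned ctz32_from_clz(uint32_t x)
{
    uint32_t lowest = x & (0u - x);

    return 31u - clz32_model(lowest);
}
```

The extra AND, negate and subtract are cheap next to a bit-by-bit loop, so one custom instruction might be enough.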

 

However, how does one make the gcc soft float libraries use a CLZ custom instruction? I see that a file in the gcc source code, "gcc-4.1/gcc/longlong.h", has custom implementations of count_leading_zeros for many of the supported architectures, with the Nios2 architecture a conspicuous exception. So I am considering adding a clause to this file for the nios2 that redefines the count_leading_zeros macro to call an external function, which might be implemented, for example, in the HAL in different ways depending on which custom instructions are present. That approach wouldn't use an inline function, though, so there could be some additional call overhead. Maybe someone can suggest a better idea? We do have command line switches in gcc for a limited number of Nios2 configurations, but we can't add them for every possible Nios2 permutation because there are too many of them.
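For discussion, here is a rough sketch of an alternative that stays inline: a longlong.h-style clause selected by a preprocessor symbol. NIOS2_CLZ_CI_N is a hypothetical macro that the build would have to define to the custom instruction's opcode index; nothing in gcc or the HAL defines it today as far as I know. The sketch assumes the Nios2 gcc port provides the __builtin_custom_ini(n, a) builtin for an int-in, int-out custom instruction, and falls back to a generic loop (generic_clz32, also a hypothetical name) when the symbol is absent. I have not tried this inside a libgcc build.

```c
#include <stdint.h>

#if defined(__nios2__) && defined(NIOS2_CLZ_CI_N)
/* Custom instruction path: expands inline, no function-call overhead.
 * NIOS2_CLZ_CI_N is a hypothetical build-time define holding the
 * opcode index assigned to the CLZ custom instruction. */
#define count_leading_zeros(count, x) \
    ((count) = __builtin_custom_ini(NIOS2_CLZ_CI_N, (int)(x)))
#else
/* Generic fallback used when no CLZ custom instruction is configured. */
static inline unsigned generic_clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {  /* shift left until bit 31 is set */
        x <<= 1;
        n++;
    }
    return n;
}
#define count_leading_zeros(count, x) ((count) = generic_clz32(x))
#endif
```

The catch is that libgcc would have to be rebuilt per system with the matching define, which is the same configuration problem in a different place.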

 

In any case, I haven't done anything with this yet, and was hoping someone out there has some free code I could steal, or that Altera has future plans to optimise this type of thing. Other similar areas of concern are the checksum calculations in the IP kernel and network byte swapping; however, those are easier to auto-configure at build time since gcc isn't involved.

 

Jeff