Hi,
I am worried about the performance of the sign-extend operation. For example, given

char sc(unsigned char uc) { return (signed char) uc; }

Nios II GCC generates the following code:

slli ra,rb,24
srai rb,ra,24

On Stratix that is reasonable. On Cyclone it takes over 50 clocks!? That's awful...
Nios I GCC has "-muser-opecode-extv(extzv)=" to use a custom instruction for the sign-extend operation, but Nios II GCC does not support that kind of option. Any ideas? I think the "-muser-opecode-xxx=" option was very useful. I hope this feature is revived for Nios II.
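For readers following along, here is a minimal C sketch of what the slli/srai pair in the post computes. The function name is made up for illustration, and it assumes that right-shifting a negative int32_t is an arithmetic shift, which is how GCC behaves.

```c
#include <stdint.h>

/* Models the slli/srai pair: move bit 7 up to bit 31, then shift back
   arithmetically so the sign bit is copied into bits 8..31.
   Assumption: >> on a negative int32_t is an arithmetic shift (true for GCC). */
static int32_t sign_extend8_shift(uint32_t x)
{
    int32_t shifted = (int32_t)(x << 24);  /* slli: bit 7 becomes bit 31 */
    return shifted >> 24;                  /* srai: replicate the sign bit */
}
```

Both steps are 24-bit shifts, which is why the cost of this sequence depends entirely on how the core implements shifting.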
6 Replies
On Stratix they use the hardware multiplier to do the shifting. Cyclone uses "soft" DSP blocks, so it uses memory to perform something that Stratix can do in very few cycles (probably 1).
So in short, I think you're SOL for doing this fast without making your own hardware to do it. It's possible to prevent the Nios from using the hardware altogether by turning off the parameter in the ptf file for your generated core. I don't know if that'll help you though (I've never compiled Nios II for Cyclone). Sorry for the bad news but... well, you know my name and all :(
Oops, I thought you were doing shifting, not sign extension.
To do sign extension you could look at the sign bit and OR the upper bits with a high mask if it is set (might be a bit faster). If that doesn't work let me know and I'll figure out what would slow those instructions down.
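The mask idea above could be sketched in C roughly as follows. The function name is hypothetical, and whether this actually beats the shift pair on a given core would need to be checked in the generated assembly.

```c
#include <stdint.h>

/* Sign-extend an 8-bit value by testing bit 7 and ORing in a high mask.
   No shift instruction is needed at the source level. */
static int32_t sign_extend8_mask(uint32_t x)
{
    uint32_t low = x & 0xFFu;       /* keep only the 8-bit value */
    if (low & 0x80u) {              /* sign bit (bit 7) set? */
        low |= 0xFFFFFF00u;         /* fill bits 8..31 with ones */
    }
    return (int32_t)low;            /* reinterpret as signed (wraps on GCC) */
}
```

Note that this version contains a branch, so its cost also depends on how the core handles branches.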
Hi BadOmen.
Thanks for the reply. I know how to speed this up manually, but this code was generated by Nios II GCC. Manual optimization requires modifying the source, and I don't want to touch the source code... Does "soft" DSP mean instantiating a multiplier with LEs and memory? Could you tell me which option does that? Or can I find it in the backlog? It sounds like a BIG resource requirement and a SLOW fMax, though... I need shift performance; I don't need a multiplier. Is there any way to embed a user custom instruction as the shift (or sign-extension) operation? Nios I can do so. Thanks in advance.
Ya, just do a custom instruction. Depending on how fast you want the shift to be, there are various ways of doing it. If you don't care about speed you could use a barrel shifter and increase the clock speed so that it shifts faster (but really it's up to you how you do it).
I call them soft DSP blocks because they use memory to do the operations instead of dedicated hardware (the DSP blocks in Stratix are dedicated hardware and do not take up LEs because they are already present in the FPGA).
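As a rough sketch of the custom-instruction route from the C side: Nios II GCC provides built-ins of the __builtin_custom_* family, and the wrapper below uses __builtin_custom_ini (int result, one int operand). The opcode index SEXT8_CI_N, the wrapper name sext8, and the existence of sign-extend logic behind that opcode are all assumptions for illustration; the hardware itself is not shown.

```c
#include <stdint.h>

/* Hypothetical opcode index (0..255) assigned to the custom instruction
   when the core is generated; the value here is only a placeholder. */
#define SEXT8_CI_N 0

static inline int32_t sext8(uint32_t x)
{
#if defined(__nios2__)
    /* GCC built-in for a Nios II custom instruction with one int operand.
       Assumes custom logic that sign-extends bits 7..0 sits at SEXT8_CI_N. */
    return __builtin_custom_ini(SEXT8_CI_N, (int)x);
#else
    /* Software fallback so the sketch also builds with a host compiler. */
    return (int32_t)(int8_t)(uint8_t)x;
#endif
}
```

This still means calling sext8() explicitly wherever the sign extension happens, so it does require touching the source.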
Hi,
I know a custom instruction (CI) is useful for this. But the Nios II compiler does not support emitting a CI for the corresponding operation; using a CI requires source modification. With Nios I there was no need to modify anything. That's my frustration. Also, shift performance of Nios II on Cyclone is at its worst, and the number of clocks an operation takes is not fixed. In an embedded system that is sometimes a critical issue. Is there any way to speed up the shift (or sign-extension) operation without source modification, like Nios I could? Regards,
Without modifying anything you will not be able to boost the performance.
The quickest way to do this is to OR the upper half with the sign bit (that shouldn't take too many more cycles). When you just use the shifting functionality you are using the DSP blocks (hardware multiplier) to do it. So without modifying anything you will be stuck using these (just like you can't make a P4 go any faster without changing your code). The Cyclone is a low-cost version of the Stratix, so you get what you pay for. If it could perform like a Stratix there would be no need for a Stratix, and the Cyclone would end up costing the same as a Stratix :) So, long story short, if you want performance you're going to have to do some work to get it from that FPGA.
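If a source change is acceptable, one shift-free and branch-free alternative is the XOR-and-subtract trick. To be clear, this is a different technique from the OR-with-mask suggestion in the post, swapped in here because it avoids the conditional. The function name is made up, and the disassembly should be checked to confirm the compiler really emits shift-free code.

```c
#include <stdint.h>

/* Branchless 8-bit sign extension: flip bit 7, then subtract 128.
   Values 0x00..0x7F map to 0..127, values 0x80..0xFF map to -128..-1. */
static int32_t sign_extend8_xor(uint32_t x)
{
    int32_t low = (int32_t)(x & 0xFFu);  /* 0..255, always fits in int32_t */
    return (low ^ 0x80) - 0x80;
}
```

For example, 0xFF becomes 0x7F after the XOR, and 0x7F - 0x80 is -1.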