With Nios I, I could select one of three implementations for multiplication (sw, mul_step, mul). In Nios II this feature is completely missing. Is there any way to tell SOPC Builder for Nios II to integrate a hardware multiplier with only a few cycles of latency (1-3)?
I can do it with a custom instruction, but how can I tell the compiler to use it for multiplications? Or, better, how can I instruct SOPC Builder to use my multiply custom instruction for the mul instruction? I know that Cyclone II (and Stratix) will have such predefined blocks, but right now I have to do it with a Cyclone, and I'm willing to spend some LEs on this feature.
Another question: what happened to the predefined "divide" custom instruction available in Nios I? There is a subdirectory for it in the "components" directory, but in SOPC Builder it is not visible under custom instructions.
Thanks for any help,
Chris
Nios II release 1.0 doesn't support hardware multiply on Cyclone. We have received many requests for this, so expect to see it in a future release.
As for the divide custom instruction, Nios II/f has optional support for a built-in divide instruction that the compiler will use if the option is enabled. This replaces the divide custom instruction for Nios II. Also, we found a couple of corner-case bugs in the divide custom instruction, so we decided to turn it off. The corner cases are 0x0/0x80000000 and 0x80000000/0x80000000, both of which give the wrong result.
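The two corner cases named above make handy regression vectors for any replacement divider. The following is only a sketch, assuming 32-bit operands; the names `div_case`, `DIV_CORNER_CASES`, `check_divider` and `reference_divide` are made up for illustration and are not part of any Altera library. Both cases have the same expected quotient whether the bit patterns are read as signed or unsigned.

```c
#include <stdint.h>
#include <stddef.h>

/* One divider test vector: operands plus the quotient a correct divider returns. */
struct div_case {
    uint32_t dividend;
    uint32_t divisor;
    uint32_t quotient;
};

/* The two corner cases reported for the old divide custom instruction. */
const struct div_case DIV_CORNER_CASES[] = {
    { 0x00000000u, 0x80000000u, 0u },  /* 0 / 0x80000000 == 0 */
    { 0x80000000u, 0x80000000u, 1u },  /* 0x80000000 / 0x80000000 == 1 */
};

/* Returns 1 if divide_fn reproduces every expected quotient, else 0.
 * divide_fn stands in for whatever wrapper drives the divider under test. */
int check_divider(uint32_t (*divide_fn)(uint32_t, uint32_t))
{
    size_t i;
    for (i = 0; i < sizeof DIV_CORNER_CASES / sizeof DIV_CORNER_CASES[0]; i++) {
        const struct div_case *c = &DIV_CORNER_CASES[i];
        if (divide_fn(c->dividend, c->divisor) != c->quotient)
            return 0;
    }
    return 1;
}

/* Plain C division, used here as the reference implementation. */
uint32_t reference_divide(uint32_t n, uint32_t d)
{
    return n / d;
}
```

Passing a wrapper around the hardware divider to `check_divider` in place of `reference_divide` would flag either corner case.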
I got a tip to select a device with hardware multiplier blocks (i.e. Cyclone II) in SOPC Builder, generate the processor, and, before compiling in Quartus, change an option in the ???_mult_cell.vhd file. Has anyone done this successfully to get a working hardware multiplier on a Cyclone device?
Fast multiplication (and division) is very important for us because we use the Cyclone with Nios in a hard real-time system for highly dynamic industrial motion control.
James:
--- Quote Start --- As for the divide custom instruction, Nios II/f has optional support for a built-in divide instruction that the compiler will use if the option is enabled. This replaces the divide custom instruction for Nios II. --- Quote End ---
Where can I find this option?
Thank you for your help,
Chris
I don't know if Cyclone II would have it in the GUI, but it shows up as a check box under the "f" core. Right now only that core has it available. If you need one really badly, you can always reinvent the wheel and build one (a binary divider), since I'm 99% sure that's what Altera does, and add it as a custom instruction.
Chances are it will be available for all cores in the next release, since it's not DSP dependent anyway (so it should work on Cyclone or Stratix and on all three cores).
--- Quote Start --- I don't know if Cyclone II would have it in the GUI but it shows up as a check box under the "f" core. --- Quote End ---
This is true if I select a Stratix device under "Target Device Family". For Cyclone/Cyclone II devices, however, the checkbox is hidden, at least in my SOPC Builder 4.1 Build 207.
We decided to use a custom instruction for multiplication and division until this option is available in a future release. Is there any way to tell the compiler to use such an instruction instead of the standard mul?
I don't think you can. I think you are stuck using the standard inst_name(A,B); format.
Try editing the PTF file of the Nios to see if you can get it working there. (James showed me a while back how to disable the hardware multiplier in that file, so maybe you'll be able to enable it, assuming the hardware implementation for Cyclone is already present.) I really doubt it will work, though.
If you generate for a Stratix device and then actually compile for a Cyclone device, you will get a hardware multiplier. However, note that since Cyclone has no DSP multiplier cells like Stratix does, Quartus will construct the multiplier using LEs. This increases the size of the Nios II core by 2X-3X and cuts the frequency by 2X-3X. That is why it wasn't included as an option in Nios II release 1.0: we did not think it would be an attractive option for most customers.
Elan, would multiplies and divides in about 35 clock cycles be fast enough for you? If so, you could construct these in hardware and use them as custom instructions (multiplication is a series of additions, and division is a series of comparisons and subtractions).
For both, I'm guessing you are looking at around 300-350 LEs to implement them for the Nios. Hope that gives you some hope :)
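To make the "series of additions" and "series of comparisons and subtractions" concrete, here is a software model of the bit-serial approach. It is only a sketch: each loop iteration stands in for one clock cycle of such a unit, the operands are treated as unsigned, and the function names `mul_shift_add` and `div_restoring` are mine, not from any Altera component.

```c
#include <stdint.h>

/* Shift-and-add multiply: at most one add per multiplier bit, 32 steps.
 * Returns the low 32 bits of a * b. */
uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    int step;
    for (step = 0; step < 32; step++) {
        if (b & 1u)
            product += a;   /* add the shifted multiplicand for a set bit */
        a <<= 1;            /* next bit weighs twice as much */
        b >>= 1;
    }
    return product;
}

/* Restoring division: one compare and at most one subtract per quotient bit.
 * The caller must ensure divisor != 0. */
uint32_t div_restoring(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    uint32_t quotient = 0;
    uint64_t rem = 0;       /* needs 33 bits: the shifted value can exceed 32 bits */
    int step;
    for (step = 31; step >= 0; step--) {
        /* shift in the next dividend bit, most significant first */
        rem = (rem << 1) | ((dividend >> step) & 1u);
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= (uint32_t)1 << step;
        }
    }
    *remainder = (uint32_t)rem;
    return quotient;
}
```

A hardware version would hold `product`, `rem` and `quotient` in registers and do one iteration per clock, which is where a latency in the region of 32 cycles plus overhead comes from.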
As for the divide checkbox not being available under Cyclone, I'll look into this. I believe it is just a packaging choice that you can only get divide if you get multiply. Since there is no multiply on Cyclone, there is no divide. However, I designed divide to be independent of multiply, so it should work fine. What's required is some PTF changes. How about you email me your Nios II class.ptf and your system.ptf, and I'll see about making the changes? My email is jball@altera.com.
As for multiply, you could use a custom instruction, but there is no way to get the compiler to use it when you use the * operator. You'll have to change all the source code that should use the fast multiply so that it calls the multiply custom instruction explicitly. SOPC Builder defines macros for you that call a custom instruction from C code or assembly.
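As a rough illustration of what "calling the multiply instruction explicitly" looks like in C, here is a sketch. It assumes a Nios II GCC toolchain that provides the `__builtin_custom_inii` built-in; the index `MUL_CI_N`, the macro `FAST_MUL` and the function `dot_product` are hypothetical names for this example. In a real system the generated macro and index from system.h would be used instead. The `#else` branch is a host fallback so the same source can be built and tested off-target.

```c
#include <stdint.h>

/* Hypothetical custom instruction index; the real value comes from the
 * system.h that SOPC Builder generates for your system. */
#define MUL_CI_N 0

#if defined(__nios2__) || defined(__NIOS2__)
/* On Nios II GCC: issue the custom instruction with two int operands
 * and an int result. */
#define FAST_MUL(a, b) __builtin_custom_inii(MUL_CI_N, (a), (b))
#else
/* Host fallback: ordinary 32-bit multiply on the bit patterns. */
#define FAST_MUL(a, b) ((int32_t)((uint32_t)(a) * (uint32_t)(b)))
#endif

/* Multiply-accumulate over two arrays, calling the multiplier explicitly
 * instead of relying on the * operator. */
int32_t dot_product(const int32_t *x, const int32_t *y, int n)
{
    int32_t acc = 0;
    int i;
    for (i = 0; i < n; i++)
        acc += FAST_MUL(x[i], y[i]);
    return acc;
}
```

Wrapping the call in a single macro keeps the change to existing source small: every `a * b` that should use the fast path becomes `FAST_MUL(a, b)`.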
Hi,
My 2c: if you want a multiply, you can get one that runs at 85+ MHz but takes 3-4 cycles, and a divide can run at speed but takes multiple cycles. This lets the Nios run everything else fast and spend more cycles on mult and div. See my single-precision FP stuff if you want: 3 cycles for add and 4 cycles for mult. Oh yeah, and if you want divide, I have one too (still refining it for double precision at the moment).
Regards,
Steve
I would be happy with a hardware multiply taking 3-4 cycles. For division, 35 cycles are OK.
I don't see the point in James's post: why would a multi-cycle hardware multiplier cut the frequency? We will also take a look at Steve's FP stuff.
Thanks,
Chris
What he is trying to say is that if you put a big, massive multiplier in there, things will slow down (multipliers implemented in LEs are pretty big).
Currently I've been finding that even though the Stratix has multipliers implemented in DSP blocks, the multiplier is the main factor in your fmax with Nios II. Without DSP blocks, a hardware multiplier forces a compromise somewhere. That's why I was suggesting a bitwise one (around 35 clock cycles). It would be as slow as the division in cycles, but it would be small and could still run at a pretty fast clock.
Look for a hardware multiplier option for Cyclone and Cyclone II devices as part of the next full release of Nios II. This will release near the end of the year, and will be both shipped as an update and available for download.
--- Quote Start --- Look for a hardware multiplier option for Cyclone and Cyclone II devices as part of the next full release of Nios II. This will release near the end of the year, and will be both shipped as an update and available for download. --- Quote End ---
Sounds good, although I need the multiplier immediately. We built our own CI multiplier, and it works perfectly. But when data structures are accessed, the processor still calls the "__mulsi3" routine in the lib2-mul.c file. I assume these are pointer operations.
The other file is "alt_exception.S", where some multiply and division routines can be found (in assembler). My idea is to add my custom instruction to these files to get fast multiplication. Can someone point me to information on when the routines in these files are called and what the difference between them is?
Thanks a lot,
Chris
Hi Chris,
__mulsi3 is the standard GCC single-width integer multiply routine. It is called when the compiler knows that the Nios II processor lacks integer multiply instructions. It's a great place to put your multiply custom instruction. BTW, if you do any 64-bit math, you might see calls to __muldi3.
alt_exceptions.S contains the multiply and divide emulation routines, among other things. You can ignore these as long as the software for your system has been generated for Cyclone. The emulation routines are called only when a Nios II processor that lacks a hardware multiply attempts to execute a multiply instruction.
Kerry
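A replacement `__mulsi3` built around a multiply custom instruction might look roughly like the sketch below. It assumes a Nios II GCC toolchain with the `__builtin_custom_inii` built-in, and `MUL_CI_N` is a hypothetical index standing in for the value from the generated system.h. Whether a definition in the project takes precedence over the copy in libgcc depends on how the project is linked, so treat that as something to verify in the link map. The `#else` branch is a host stand-in for the hardware so the routine can be exercised off-target.

```c
#define MUL_CI_N 0  /* hypothetical custom instruction index, see system.h */

/* Replacement for libgcc's 32-bit multiply helper. The compiler emits calls
 * to __mulsi3 for the * operator when it targets a core without a hardware
 * multiply, so this body must not use * itself. */
int __mulsi3(int a, int b)
{
#if defined(__nios2__) || defined(__NIOS2__)
    /* Hand both operands to the multiply custom instruction. */
    return __builtin_custom_inii(MUL_CI_N, a, b);
#else
    /* Host stand-in for the hardware: shift-and-add on the bit patterns. */
    unsigned int ua = (unsigned int)a;
    unsigned int ub = (unsigned int)b;
    unsigned int p = 0;
    while (ub != 0) {
        if (ub & 1u)
            p += ua;
        ua <<= 1;
        ub >>= 1;
    }
    return (int)p;
#endif
}
```

With a routine like this in place, existing code keeps using `*` and the compiler-generated calls land on the hardware. The cost is a function call per multiply on top of the instruction latency.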
If you need something now, then you will have to sacrifice something.
- Binary multiplier: small hardware size, "long" latency (about 35 cycles), need to map hardware (can't use "*" directly).
- Parallel multiplier: large hardware size, short latency (probably 1-3 cycles depending on how it's implemented), need to map hardware (can't use "*" directly), and as James mentioned can impact Fmax greatly.
- Software multiplier (like Kerry suggested): no extra hardware, long latency, no need to map hardware for multiplication (you use "*" in your code).
If any of those sound good to you, let us know and we can provide more info on how to go about using any of these options.
Hi Kerry,
--- Quote Start --- __mulsi3 is the standard GCC single-width integer multiply routine. It is called when the compiler knows that the Nios II processor lacks integer multiply instructions. It's a great place to put your multiply custom instruction. --- Quote End ---
We now have the problem that we don't know how to get the compiler to recompile "lib2-mul.c" with the call to our custom multiply instruction. It has something to do with libgcc. Are there compiler flags we can set? Perhaps a trivial question, but we know more about hardware than about compilers...
Finally we have our hardware-based multiplier.
The trick to get the compiler to use our custom multiplier was to put a "__mulsi3()" routine somewhere in the project. It seems to override the built-in library function. This method only works if we use alt_main, which is OK for us because we don't use the HAL. Perhaps someone with deeper knowledge of the compiler can clarify this.
The hardware is a peripheral (not a custom instruction) implementing an asynchronous multiplier. It takes 3 clock cycles at 60 MHz to get the result on a Cyclone with speed grade 8. The multiplier unit is defined as multi-cycle in Quartus, hence no impact on Fmax. A nice side effect of this implementation is that we can get a 64-bit result with the same hardware in the same time. We know that normally everything should be designed strictly synchronously, but sometimes other solutions are necessary.
BTW: with the same method we implemented a 32/32 divider that takes fewer than 12 cycles, and we get the quotient and remainder at the same time.
Chris
Yeah, there's no problem with implementing the multiplier asynchronously. I'm not too familiar with the DSP blocks in Cyclone, but in Stratix you get input/output registers within the block, so they don't require extra LEs if you want to make the block synchronous.
I'm assuming the divider you used is the MegaWizard one as well, with divisor and dividend in, and quotient and remainder out. I was using that in a design in the meantime, before I made my own synchronous divider (I wanted something around 1/5th the size in LEs). Watch out for the answers coming out of that block. It has two modes. I'd tell you what they are, but I still haven't installed Quartus because I've been busy around the house. Basically, one mode gets you a proper answer (remainder always positive), and the other gives you sign-dependent remainders (and can give you magnitudes that you don't normally expect). If you were unaware of this, I would put the block into a simulation by itself and try different values to make sure you don't run into bugs later down the road (because that divider wouldn't be the first place to look, I'd imagine). Or check the documentation for lpm_divide or lpm_divider (whatever it's called) in Quartus; near the bottom it shows what results to expect from the different modes (keep an eye on the remainder or you might miss the difference between the two modes).
Cheers.
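For anyone building the simulation suggested above, a small reference model of the two remainder conventions can help when comparing results. This is only a sketch of the general conventions. I have not checked it against the megafunction, so which mode maps to which convention should be confirmed in the lpm_divide documentation. The type `divmod` and both function names are made up for illustration, and the caller must avoid a zero divisor and the INT32_MIN / -1 overflow case.

```c
#include <stdint.h>

/* Quotient and remainder returned together, like the divider's two outputs. */
struct divmod {
    int32_t quotient;
    int32_t remainder;
};

/* Convention 1: truncating division, as C's / and % behave.
 * The remainder takes the sign of the dividend. */
struct divmod divmod_truncated(int32_t n, int32_t d)
{
    struct divmod r;
    r.quotient = n / d;
    r.remainder = n % d;
    return r;
}

/* Convention 2: the remainder is never negative (0 <= remainder < |d|).
 * The quotient is adjusted so that n == quotient * d + remainder still holds. */
struct divmod divmod_positive_rem(int32_t n, int32_t d)
{
    struct divmod r = divmod_truncated(n, d);
    if (r.remainder < 0) {
        if (d > 0) {
            r.quotient -= 1;
            r.remainder += d;
        } else {
            r.quotient += 1;
            r.remainder -= d;
        }
    }
    return r;
}
```

The two models only disagree when the dividend is negative and the division is inexact. For -7 / 2, for example, the first gives quotient -3 with remainder -1 and the second gives quotient -4 with remainder 1, so test vectors with negative dividends are the ones that expose a mode mismatch.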