Nios® V/II Embedded Design Suite (EDS)

Multiplying a register by four

Altera_Forum

Hi all,

I have just started using Nios II, and I have a small question: what is the best way to multiply a register by four?

So far I have found the following alternatives (assume r16 contains the value to be multiplied):

1) Generated by gcc when accessing an array of 32-bit integers. Does the compiler also use this instruction on the small and economy versions of Nios II?

muli r16, r16, 4  

 

2) 

addi r16, r16, r16 

addi r16, r16, r16 

 

3) 

slli r16, r16, 2
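
For readers following along in C, the three options above can be sketched as below. This is only an illustration of why they are arithmetically equivalent for unsigned 32-bit values; the function names are hypothetical, and which instructions gcc actually emits depends on the core and compiler options.

```c
#include <stdint.h>

/* Option 1: plain multiply (a compiler may turn this into muli). */
uint32_t times4_mul(uint32_t x) {
    return x * 4u;
}

/* Option 2: two dependent additions, x + x = 2x, then 2x + 2x = 4x. */
uint32_t times4_add(uint32_t x) {
    uint32_t twice = x + x;
    return twice + twice;
}

/* Option 3: logical shift left by two bit positions. */
uint32_t times4_shift(uint32_t x) {
    return x << 2;
}
```

All three wrap modulo 2^32 in the same way, so they agree for every unsigned input, including values large enough to overflow.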

 

 

Best Regards, 

 

Paolo
9 Replies
Altera_Forum

Ehm... of course it should have been:

 

2) 

add r16,r16,r16 

add r16,r16,r16 

 

Paolo
Altera_Forum

On the Economy version you don't get multiplier hardware.

On the other two versions you have the multiplier, which also performs shift operations (if you shift left by 2, i.e. multiply by 4, I wouldn't doubt that you end up just multiplying anyway).

Your solutions 1 and 3 probably take the exact same amount of time, whereas number 2 would take longer, I would assume, since I don't see the adds being executed in parallel (maybe they can be).

Either way, you're probably talking about 1 clock cycle versus 2 or 3 cycles. Hopefully you don't need better performance than that.
Altera_Forum

OK, I think I will use option 3, which is able to shift in 2 cycles even on the economy version of the Nios II...

 

Thanks!!! 

 

Paolo
Altera_Forum

Be careful when shifting signed numbers (you don't want to modify the sign bit).
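
To make the warning concrete, here is a minimal C sketch, assuming 32-bit two's complement integers and a gcc-style compiler. The function names are hypothetical. In standard C (C99/C11), left-shifting a negative signed value is undefined behavior, so the sketch shifts the bit pattern as unsigned and checks separately whether the result still fits.

```c
#include <stdint.h>

/* Multiply by four by shifting the bit pattern as unsigned.
   Converting an out-of-range result back to int32_t is
   implementation-defined; gcc wraps modulo 2^32. */
int32_t times4_signed_shift(int32_t x) {
    return (int32_t)((uint32_t)x << 2);
}

/* Returns 1 if x * 4 is representable in int32_t, meaning the
   shift cannot push a value bit into (or out of) the sign bit. */
int times4_fits(int32_t x) {
    return x >= INT32_MIN / 4 && x <= INT32_MAX / 4;
}
```

For small array indexes the check always passes. A value such as 0x20000000, however, has its top value bit shifted into the sign position and comes out negative.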

Altera_Forum

The best option depends on how soon the result of the multiply is used by other instructions, which FPGA family you are using, and which Nios II core you are using. In general, option 2 is the best since it has a throughput of 0.5 results per cycle and a latency of 2 cycles in all combinations.

BTW, a throughput of 0.5 results per cycle means you get a multiply result every 1/0.5 = 2 cycles, and a latency of 2 cycles means the result isn't ready for 2 cycles.

 

Let me explain more. On Stratix I and Stratix II devices, the Nios II/s and Nios II/f use the hardware multipliers to perform multiplies. The throughput is one multiply per cycle, but with a 3-cycle latency. If you try to use the result of a multiply within one or two cycles, the dependent instruction is stalled, which results in a throughput of 0.33 results per cycle and a latency of 3 cycles.

For example, this code:

muli r16, r16, 4 

xor r4, r5, r16 

will take 4 cycles to execute because the xor is stalled for 2 cycles since it uses the muli result. 

However, this code: 

muli r16, r16, 4 

muli r17, r17, 4 

muli r18, r18, 4 

xor r4, r5, r16 

will also take 4 cycles to execute because the non-dependent muli instructions to r17 and r18 (or any other non-dependent instructions) don't stall, and the xor that uses r16 is far enough away from the muli to r16 to not stall. So, this code achieves a multiply throughput of one result per cycle, and the 3-cycle latency is hidden by the non-dependent instructions.
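
The cycle counts above can be reproduced with a small C model. This is only a sketch under simplifying assumptions (one instruction issued per cycle, in order, stalling only when a source register is not ready yet); it is not a model of the real Nios II pipeline, and the `Insn` type and `count_cycles` function are hypothetical names.

```c
#include <stddef.h>

enum { NUM_REGS = 32 };

/* One instruction: destination and source registers (0..31, or -1 if
   unused) and the number of cycles until its result can be consumed. */
typedef struct {
    int dest;
    int src1;
    int src2;
    int latency;
} Insn;

/* Returns the number of cycles needed to issue the whole sequence. */
int count_cycles(const Insn *code, size_t n) {
    int ready[NUM_REGS] = {0}; /* earliest cycle each register is usable */
    int next_issue = 0;        /* earliest cycle for the next instruction */

    for (size_t i = 0; i < n; i++) {
        int issue = next_issue;
        /* Stall until both source operands are available. */
        if (code[i].src1 >= 0 && ready[code[i].src1] > issue)
            issue = ready[code[i].src1];
        if (code[i].src2 >= 0 && ready[code[i].src2] > issue)
            issue = ready[code[i].src2];
        if (code[i].dest >= 0)
            ready[code[i].dest] = issue + code[i].latency;
        next_issue = issue + 1;
    }
    return next_issue;
}
```

With a latency of 3 for muli and 1 for xor and add, the model gives 4 cycles for the dependent muli/xor pair, 4 cycles for the three muli instructions followed by the xor, and 2 cycles for the two add instructions of option 2.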

 

Option 3 (using a shift) has the same performance as the multiply on Nios II/f and Nios II/s on Stratix I and Stratix II because we actually use the hardware multiplier to perform shifts and rotates.
Altera_Forum

Hi, 

 

Thanks a lot to everyone for the good information you gave me...

Regarding the warning about signed integers: I'm currently using these instructions in assembly to index some small arrays of integers, so I expect the indexes to be small positive numbers :-)

 

Thanks again, 

 

Paolo
Altera_Forum

Another question related to this topic... 

 

Since pipeline stalls affect the performance of the code I'm writing, is it possible to find out, for a given Nios II hardware configuration, whether and where a sequence of assembly instructions stalls due to register dependencies?

Thanks again for everything,

 

Paolo
Altera_Forum

If you can run your design in ModelSim, use the w command in ModelSim to display waves.

Then you can see the exact timing of your instructions.
Altera_Forum

Somewhere in the big Nios II doc, they give you the timing for the assembly instructions, but like James said, there can be exceptions in many cases.

It sounds like you need every clock cycle you can get, so ModelSim should be a lot of help to you (I've never used it, but it looked like it could give you a lot of info).