Optimal ALU

Altera_Forum
Honored Contributor II

Dear all, 

I'm creating an ALU for a homemade NIOS II compatible processor with Quartus Prime, targeting the Cyclone V. I discovered that the ALU's maximum frequency (fmax) depends on the order of its operations. For example, for an ALU supporting four operations (addition, subtraction, logical OR, and logical AND), the fastest design is the one that implements them in the following order:

 

 

Index   Operation
00      ADD
01      SUB
10      OR
11      AND

 

 

I've tested all combinations, which can't reasonably be done for bigger designs. Is there a way to find the optimal operation ordering without having to try every possible combination?

Thanks in advance!

8 Replies

Altera_Forum

What do you mean by 'operations ordering'?

Altera_Forum

The ALU's operation is selected by a control signal. In the example above, the ALU performs addition when that signal equals 0, subtraction when it equals 1, and so on. By "operation ordering" I mean the mapping between the ALU's control signal and its operations: which operation the ALU performs for each value of the control signal.

Altera_Forum

Ok, but I fail to see how the exact encoding value for each operation matters, given the way FPGAs work. 

Are you using a prioritized case statement to generate the logic, or is it a purely logical operation? 

Without seeing the code you write it is hard to make any substantial comment on the result you see. 

 

Examples (in Verilog):

reg [7:0] a;
reg [7:0] b;
reg [7:0] s;
reg [1:0] f;

// unordered:

s = ({8{f==0}} & (a+b)) | ({8{f==1}} & (a-b)) | ({8{f==2}} & (a|b)) | ({8{f==3}} & (a&b));

case (f)
0: s = a+b;
1: s = a-b;
2: s = a|b;
3: s = a&b;
endcase

// prioritized/ordered:

s = (f == 0) ? (a+b) : ((f == 1) ? (a-b) : (((f == 2) ? (a|b) : (a&b))));

if (f == 0) s = a+b;
else if (f == 1) s = a-b;
else if (f == 2) s = a|b;
else s = a&b;

Altera_Forum

Unordered!

Altera_Forum

TO_BE_DONE

Altera_Forum

 

--- Quote Start ---
Unordered!
--- Quote End ---

 

 

Then this does not make sense to me. If indeed your implementation is written as 'unordered' (as my example shows above) then the exact encoding of which operation is selected by code 0,1,2,3 should make no difference, as the LAB logic can select any of the bit patterns equally easily. 

 

So you need to provide a lot more detail on what you are implementing (example code of yours) and what the fitting results are, as what you have reported so far is too general.

Altera_Forum

The following SystemVerilog code has been compiled with Quartus Prime 16.0.0, targeting Cyclone V 5CGXFC5C6F27C7. Inputs are synchronous. 

module Alu(input logic operand1, operand2,
           input logic control,
           input logic clock,
           output logic result);
    always_ff @(posedge clock) begin
        case (control)
            'h0: result <= operand1 + operand2;
            'h1: result <= operand1 - operand2;
            'h2: result <= operand1 & operand2;
            'h3: result <= operand1 | operand2;
        endcase
    end
endmodule

For this ALU, the fastest and the slowest designs differ by only a few MHz around an average fmax of 800 MHz. However, for a bigger ALU implementing all of NIOS II's arithmetic (except multiply and divide) and logical operations, the fastest design I found runs at 200 MHz while the slowest runs at 150 MHz… That's huge, and just from changing the order of the operations!
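
For reference, the module as posted declares no vector ranges, so every port is a single bit and the 'h2 and 'h3 branches can never be selected by a 1-bit control. The ranges may simply have been lost when the post was formatted. A minimal sketch of what was probably intended, assuming 32-bit operands and a 2-bit control (both are assumptions, as are the module name AluSketch and the input registers):

```systemverilog
// Sketch only: WIDTH = 32, the 2-bit control and the input registers
// are assumptions for illustration; they are not taken from the post.
module AluSketch #(
    parameter int WIDTH = 32
) (
    input  logic             clock,
    input  logic [WIDTH-1:0] operand1, operand2,
    input  logic [1:0]       control,
    output logic [WIDTH-1:0] result
);
    // Registered copies of the inputs, so that there is a
    // register-to-register path through the ALU logic.
    logic [WIDTH-1:0] a, b;
    logic [1:0]       op;

    always_ff @(posedge clock) begin
        a  <= operand1;
        b  <= operand2;
        op <= control;
    end

    // Same operation encoding as the posted module.
    always_ff @(posedge clock) begin
        case (op)
            2'h0: result <= a + b;
            2'h1: result <= a - b;
            2'h2: result <= a & b;
            2'h3: result <= a | b;
        endcase
    end
endmodule
```

With the inputs registered inside the module, the reported fmax reflects the path through the adder/subtractor and the output multiplexer rather than anything outside the ALU.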

Altera_Forum

Did you read Joysyb's document? 

If you use timing requirements to specify the frequency at which your system should work (and if you don't, you should definitely start with that) then all combinations should be able to reach the same target frequency. 

If you just look at the fmax then even a tiny change in your code will give different results. With its default settings Quartus will just try to reach the target frequency and will stop optimizing once it reaches it.
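
As a minimal sketch of such a timing requirement, a Synopsys Design Constraints (.sdc) file added to the project could contain the lines below. The port name clock comes from the module posted earlier in the thread; the 200 MHz target (5 ns period) is only an assumed example value.

```tcl
# Assumed example target: 200 MHz, i.e. a 5.000 ns period,
# on the top-level input port named "clock".
create_clock -name clock -period 5.000 [get_ports clock]

# Let the Timing Analyzer apply clock uncertainty for the target device.
derive_clock_uncertainty
```

With a constraint like this in place, the slack the Timing Analyzer reports for each encoding can be compared against the same target instead of comparing raw fmax figures.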