Dear all,
I'm creating an ALU for a homemade NIOS II compatible processor with Quartus Prime, targeting Cyclone V. I discovered that the ALU's maximum frequency depends on the operation ordering. For example, for an ALU supporting four operations (addition, subtraction, logical OR, and logical AND), the fastest design is the one that uses the following encoding:

Index  Operation
00     ADD
01     SUB
10     OR
11     AND

I've tested all 24 possible orderings, which can't reasonably be done for bigger designs because the number of orderings grows factorially with the number of operations. Is there a way to find the optimal operation ordering without trying every possible combination?

Thanks in advance!
What do you mean by 'operations ordering'?
The ALU's operation is selected by a control signal. In the example above, the ALU performs an addition when that signal equals 0, a subtraction when it equals 1, and so on. By "operation ordering" I mean the mapping between the ALU's control signal and its operations: which operation the ALU performs for each value of the control signal.
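As an illustration of that mapping, here is the encoding from the table in the question written as a SystemVerilog enum. The package and identifiers (`alu_pkg`, `alu_op_t`, `OP_*`) are hypothetical; with this style, trying a different ordering means editing only these four constants.

```systemverilog
// Hypothetical package: one possible way to name the control-signal encoding.
package alu_pkg;
  // Each enumerator is one entry of the "operation ordering":
  // control value -> operation, following the table in the question.
  typedef enum logic [1:0] {
    OP_ADD = 2'b00,
    OP_SUB = 2'b01,
    OP_OR  = 2'b10,
    OP_AND = 2'b11
  } alu_op_t;
endpackage
```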
Ok, but I fail to see how the exact encoding value for each operation matters, given the way FPGAs work.
Are you generating the logic with a prioritized structure, or with a case statement or purely logical expression? Without seeing your code it is hard to comment meaningfully on the results you see. Examples (in Verilog):

reg [7:0] a;
reg [7:0] b;
reg [7:0] s;
reg [1:0] f;

// Each of the following is an alternative; place it inside an always @* block.

// unordered:
s = ({8{f==0}} & (a+b)) | ({8{f==1}} & (a-b)) | ({8{f==2}} & (a|b)) | ({8{f==3}} & (a&b));

case (f)
  0: s = a+b;
  1: s = a-b;
  2: s = a|b;
  3: s = a&b;
endcase

// prioritized/ordered:
s = (f == 0) ? (a+b) : (f == 1) ? (a-b) : (f == 2) ? (a|b) : (a&b);

if (f == 0) s = a+b;
else if (f == 1) s = a-b;
else if (f == 2) s = a|b;
else s = a&b;
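For comparison, a SystemVerilog sketch of the unordered style as a complete combinational module. `alu_unordered` is a hypothetical name and the 8-bit widths follow the Verilog examples above. `unique case` declares the case items mutually exclusive and complete, so no priority between them is implied; for a fully decoded 2-bit selector like this one a plain `case` already synthesizes the same way, and `unique` mainly makes the intent explicit (and adds a simulation-time check).

```systemverilog
// Hypothetical module: unordered 4-operation ALU, purely combinational.
module alu_unordered (
  input  logic [7:0] a,
  input  logic [7:0] b,
  input  logic [1:0] f,   // operation select
  output logic [7:0] s
);
  always_comb begin
    // All four 2-bit codes are covered and none overlap,
    // so the items are mutually exclusive and no priority chain is needed.
    unique case (f)
      2'd0: s = a + b;
      2'd1: s = a - b;
      2'd2: s = a | b;
      2'd3: s = a & b;
    endcase
  end
endmodule
```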
Unordered!
--- Quote Start ---
Unordered!
--- Quote End ---
Then this does not make sense to me. If your implementation is indeed written as 'unordered' (as in my examples above), the exact encoding of which operation is selected by code 0, 1, 2 or 3 should make no difference, because the LAB logic can decode any of the bit patterns equally easily. So you need to provide a lot more detail on what you are implementing (example code) and what the fitting results are; what you have reported so far is too general.
The following SystemVerilog code has been compiled with Quartus Prime 16.0.0, targeting Cyclone V 5CGXFC5C6F27C7. Inputs are synchronous.
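// WIDTH value is an assumption (e.g. 32 for a NIOS II data path).
module Alu #(parameter int WIDTH = 32) (
  input  logic [WIDTH-1:0] operand1, operand2,
  input  logic [1:0]       control,  // 2 bits are needed to select among 4 operations
  input  logic             clock,
  output logic [WIDTH-1:0] result
);
  always_ff @(posedge clock) begin
    case (control)
      'h0: result <= operand1 + operand2;
      'h1: result <= operand1 - operand2;
      'h2: result <= operand1 & operand2;
      'h3: result <= operand1 | operand2;
    endcase
  end
endmodule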
For this ALU, the fastest and slowest orderings differ by only a few MHz, around an average Fmax of 800 MHz. However, for a bigger ALU implementing all of NIOS II's arithmetic (except multiply and divide) and logical operations, the fastest ordering I found is clocked at 200 MHz while the slowest runs at 150 MHz. That's a huge gap, just from changing the operation ordering!
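One way to compare orderings without editing the RTL for each run is to lift the opcode assignment into parameters. This is a minimal sketch, not the poster's code: `alu_param`, `WIDTH` and the `*_CODE` parameters are hypothetical, the defaults reproduce the encoding of the ALU above (ADD = 0, SUB = 1, AND = 2, OR = 3), and the four codes are assumed to be distinct.

```systemverilog
// Hypothetical module: same registered ALU, with the operation ordering
// supplied as parameters so each ordering is just a different instantiation.
module alu_param #(
  parameter int         WIDTH    = 32,    // assumed data width
  parameter logic [1:0] ADD_CODE = 2'h0,
  parameter logic [1:0] SUB_CODE = 2'h1,
  parameter logic [1:0] AND_CODE = 2'h2,
  parameter logic [1:0] OR_CODE  = 2'h3
) (
  input  logic             clock,
  input  logic [1:0]       control,
  input  logic [WIDTH-1:0] operand1,
  input  logic [WIDTH-1:0] operand2,
  output logic [WIDTH-1:0] result
);
  always_ff @(posedge clock) begin
    // Case items are the parameters; they must be four distinct codes.
    case (control)
      ADD_CODE: result <= operand1 + operand2;
      SUB_CODE: result <= operand1 - operand2;
      AND_CODE: result <= operand1 & operand2;
      OR_CODE:  result <= operand1 | operand2;
    endcase
  end
endmodule
```

A different ordering is then selected by overriding the parameters in the instantiation, for example `.ADD_CODE(2'h3), .SUB_CODE(2'h2), .AND_CODE(2'h1), .OR_CODE(2'h0)`. Keep in mind the point made below about timing requirements: Fmax differences between single compilations can come from fitter variation rather than from the encoding itself.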
Did you read Joysyb's document?
If you use timing requirements (an SDC clock constraint) to specify the frequency at which your system should work, and if you don't, you should definitely start with that, then all combinations should be able to reach the same target frequency. If you only look at the Fmax, even a tiny change in your code will give different results. With its default settings, Quartus only tries to reach the target frequency and stops optimizing once it reaches it.