I'm rather new to practical FPGA design, though I have a lot of theoretical background. I'm working with (starting/trying to would be more accurate) the Arria 10 development kit, which has an Arria 10 10AS066N3F40E2SG in case it matters, but this is a more general question. I've been up and down the information highway, forums and documentation for half a day, but I can't find the answer.

How do I determine what kind of speeds this chip is capable of? I realise that the actual design will affect what can be achieved, but the design will be influenced by the speeds that can be achieved; kind of chicken and egg, eh?

Here is my current problem. I need to perform 10 bit x 10 bit => 20 bit (signed) multiplication and accumulate the result at 730 MHz. How can I find out whether this is at all realisable? If not, I need to divide the data stream into two or more parallel processing paths.

I'm looking for concrete and practical guidance. I expect I will need to write some synthesisable Verilog and time it on the simulator, right? From here: https://people.ece.cornell.edu/land/courses/ece5760/de1_soc/hdl_style_qts_qii51007.pdf I found (just an example):

```verilog
module signed_mult (out, clk, a, b);
    output [15:0] out;
    input clk;
    input signed [7:0] a;
    input signed [7:0] b;

    reg signed [7:0] a_reg;        // input registers
    reg signed [7:0] b_reg;
    reg signed [15:0] out;         // output register
    wire signed [15:0] mult_out;   // combinational product

    assign mult_out = a_reg * b_reg;

    always @ (posedge clk)
    begin
        a_reg <= a;
        b_reg <= b;
        out <= mult_out;
    end
endmodule
```

This is a one-stage pipeline, I think, but how could it be pipelined further? AFAIU it can't; hopefully it will be realised with a hardware multiplier block in the Arria 10, and that sets the ultimate limit (routing delays etc. notwithstanding), right? But what is this limit? It must be documented somewhere.

Is there a ready-to-compile synthesisable Verilog example somewhere that I could start to play with? Or something that would be a good starting point? Surely I'm not the first person to need something like that...
I'm so out of my depth here at the moment... Help would be greatly appreciated.
wbr Kusti
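A minimal sketch of one way to pipeline the 10 x 10 => 20 bit signed multiply-accumulate further: register the inputs, register the product, then register the running sum, so each register-to-register path contains either the multiply or the accumulate add, but not both. The module name `mac10_pipelined`, the `clear` port and the 32-bit accumulator width `ACC_W` are assumptions for illustration, not something from the thread. 32 bits leaves 12 bits of headroom above the 20-bit product, enough to sum 4096 worst-case products without overflow. Whether Quartus maps these registers into the Arria 10 DSP block's internal input, pipeline and output registers is up to the synthesis tool, so check the fitter and timing reports rather than assuming it.

```verilog
// Hypothetical sketch: pipelined signed 10 x 10 => 20 bit multiply-accumulate.
// Stage 1 registers the inputs, stage 2 registers the product and the final
// stage registers the running sum, so no path contains both the multiply and
// the accumulate add.
module mac10_pipelined #(
    parameter ACC_W = 32                  // assumption: 12 bits of headroom over the 20-bit product
) (
    input  wire                    clk,
    input  wire                    clear, // assumption: restart the sum with this sample
    input  wire signed [9:0]       a,
    input  wire signed [9:0]       b,
    output reg  signed [ACC_W-1:0] acc    // no reset: undefined until the first clear
);
    reg signed [9:0]  a_reg, b_reg;       // stage 1: input registers
    reg signed [19:0] prod_reg;           // stage 2: product register
    reg               clear_d1, clear_d2; // clear delayed to stay aligned with prod_reg

    always @(posedge clk) begin
        a_reg    <= a;
        b_reg    <= b;
        clear_d1 <= clear;

        prod_reg <= a_reg * b_reg;        // signed * signed, 20-bit result
        clear_d2 <= clear_d1;

        if (clear_d2)
            acc <= prod_reg;              // first product of a new sum (sign-extended)
        else
            acc <= acc + prod_reg;        // running sum
    end
endmodule
```

Assert `clear` together with the first sample of a new sum; that sample's product appears on `acc` three clock edges later, and each following sample is added one edge after the previous one. If timing still fails, another register after `prod_reg` is the usual next step, with `clear` delayed by one more stage to match.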
730 MHz is rather ambitious, even for Arria 10. I think you'll have to go down the multiple parallel paths route. If you want to test the maximum frequency, create a design and run it through TimeQuest. Set all pins to virtual and it should give you an idea of the Fmax for your design: with no clock constraint it defaults to trying to get 1000 MHz out of your design (which will fail), but it should at least give you an idea. Realistically, expect something closer to 300 MHz per multiplier.
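A sketch of the parallel-paths idea, assuming the 730 MS/s stream can be delivered to the fabric two samples per clock (an even pair `a0`/`b0` and an odd pair `a1`/`b1`), so the logic only has to run at 365 MHz. How the stream gets demultiplexed is outside this sketch, and the module and port names are hypothetical. The two lane products are added in their own pipeline stage before the accumulator, which keeps a three-input add off the critical path.

```verilog
// Hypothetical sketch: two-lane MAC for a stream delivered two samples per clock.
// At 730 MS/s the fabric clock would be 365 MHz; lane 0 takes the even samples,
// lane 1 the odd samples.
module mac10_2lane #(
    parameter ACC_W = 32                     // assumption: accumulator width
) (
    input  wire                    clk,
    input  wire                    clear,    // restart the sum with this sample pair
    input  wire signed [9:0]       a0, b0,   // even sample
    input  wire signed [9:0]       a1, b1,   // odd sample
    output reg  signed [ACC_W-1:0] acc       // no reset: undefined until the first clear
);
    reg signed [9:0]  a0_r, b0_r, a1_r, b1_r; // stage 1: input registers
    reg signed [19:0] p0_r, p1_r;             // stage 2: one product per lane
    reg signed [20:0] sum_r;                  // stage 3: lane sum, one bit of growth
    reg               clr1, clr2, clr3;       // clear delayed to match the data path

    always @(posedge clk) begin
        a0_r <= a0;  b0_r <= b0;
        a1_r <= a1;  b1_r <= b1;
        clr1 <= clear;

        p0_r <= a0_r * b0_r;
        p1_r <= a1_r * b1_r;
        clr2 <= clr1;

        sum_r <= p0_r + p1_r;                 // two-input add, kept out of the accumulator path
        clr3  <= clr2;

        if (clr3)
            acc <= sum_r;                     // first pair of a new sum
        else
            acc <= acc + sum_r;               // running sum
    end
endmodule
```

Three or more lanes follow the same pattern, with a registered adder tree in place of `sum_r` and the `clear` delay line lengthened to match.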
OK, thanks! I managed to find out the theoretical maximum multiplier speed (635 MHz for the fastest speed grade of Arria 10); it was in the data sheet, doh! So 730 MHz is definitely out. However, by rearranging a few things I think I could live with 575 MHz, so at least theoretically this is not impossible. I guess there is nothing to do but write some Verilog and see. wbr Kusti
635 MHz looks unrealistic. Maybe it is defined from the multiplier block's perspective, with no logic across the fabric and pins. Remember that blocks such as multipliers are in effect small islands of ASIC connected to the fabric: the ASIC blocks can be very fast, but the fabric cannot keep up with them.
The datasheet figures are always the theoretical maximum of the silicon internal to the DSP block. What will kill the timing is getting data into and out of the DSP. Always take the datasheet maximum with a pinch of salt: I only expect about 75% of it in a user design with extreme pipelining and timing control, and 50% is an easier figure to achieve.
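Applying these rules of thumb to the 635 MHz datasheet figure gives a rough idea of the achievable clock and of how many parallel lanes a 730 MS/s stream would need, assuming one multiply-accumulate per sample per lane. This is back-of-the-envelope arithmetic, not a timing result:

```latex
\begin{aligned}
f_{75\%} &= 0.75 \times 635\,\text{MHz} \approx 476\,\text{MHz}, &
N_{75\%} &= \left\lceil \frac{730}{476} \right\rceil = 2 \quad (365\,\text{MHz per lane}) \\
f_{50\%} &= 0.50 \times 635\,\text{MHz} \approx 318\,\text{MHz}, &
N_{50\%} &= \left\lceil \frac{730}{318} \right\rceil = 3 \quad (\approx 243\,\text{MHz per lane})
\end{aligned}
```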
Yeah, I realize this, thanks. 575/635 = 90% of the theoretical maximum speed... not promising. However, the processing block that is going to be implemented will be the same whether there are N or 2*N of them, so I will start by implementing and timing that. Thanks!
While I have your attention: the exact device on my evaluation board is 10AS066N3F40E2SG, which I'm having a hard time decoding and relating to the datasheet.

10A = Arria 10
S = ???
066 = 660K logic elements
N = 48 transceivers
3 = transceiver speed grade 1..5
F = FineLine BGA
40 = 40 mm x 40 mm package
E = extended temperature range
2 = FPGA fabric speed grade 1..3
S = standard power
G = RoHS6

But when the datasheet talks about speed (in MHz) for multipliers/memory blocks, it uses the title 'performance' in the tables and lists frequencies for variants like –E1S, –I1S, –E2L, –E2S, –E2V, –I2L, –I2S, –I2V, –E3S, –E3V, –I3S, –I3V, –A3S. So which one do I have?

cheers Kusti