
Arria 10 multiplier speed

Altera_Forum
Honored Contributor II
1,716 Views

I'm rather new to practical FPGA design though I have a lot of theoretical background.  

 

I'm working with (or rather, starting to work with) the Arria 10 development kit, which has an Arria 10 10AS066N3F40E2SG, in case it matters, but this is a more general question. 

 

I've been up and down the information highway, forums, and documentation for half a day, but I can't find the answer. 

 

How do I determine what kind of speeds this chip is capable of? 

 

I realise that the actual design will affect what can be achieved, but the design will in turn be influenced by the speeds that can be achieved. Kind of chicken and egg, eh? 

 

Here is my current problem. 

 

I need to perform a signed 10-bit x 10-bit => 20-bit multiplication and accumulate the result at 730 MHz. 

 

How can I find out if this is at all realisable?  

 

If not, I need to divide the data stream into two or more parallel processing paths. 

 

I'm looking for concrete and practical guidance.  

 

I expect I will need to write some synthesisable Verilog and time that on the simulator, right? 

 

From here: 

 

https://people.ece.cornell.edu/land/courses/ece5760/de1_soc/hdl_style_qts_qii51007.pdf 

 

I found (just an example): 

 

 

module signed_mult (out, clk, a, b);
    output [15:0] out;
    input clk;
    input signed [7:0] a;
    input signed [7:0] b;

    reg signed [7:0] a_reg;
    reg signed [7:0] b_reg;
    reg signed [15:0] out;
    wire signed [15:0] mult_out;

    assign mult_out = a_reg * b_reg;

    // Register the inputs and the product on each rising clock edge.
    always @ (posedge clk) begin
        a_reg <= a;
        b_reg <= b;
        out <= mult_out;
    end
endmodule

 

 

 

This is a 1-stage pipeline, I think, but how could it be pipelined further? 

 

AFAIU it can't. Hopefully it will be realised with a hardware multiplier block in the Arria 10, and that sets the ultimate limit (routing delays etc. notwithstanding), right? 

 

But what is this limit, it must be documented somewhere? 

 

Is there a ready-to-compile, synthesisable Verilog example somewhere that I could start to play with? Or something that would be a good starting point? Surely I'm not the first person to need something like that... 

 

I'm so out of my depth atm here.... 

 

Help would be greatly appreciated. 

 

wbr Kusti
6 Replies
Altera_Forum
Honored Contributor II
475 Views

730 MHz is rather ambitious, even for Arria 10. I think you'll have to go down the multiple parallel paths route. 

If you want to test the maximum frequency, create a design and run it through TimeQuest. Set all pins to virtual and it should give you an idea of the fmax for your design. By default it tries to get 1000 MHz out of your design, which will fail, but the report should at least give you an idea of what is achievable. 

Realistically, 300 MHz per multiplier is probably more achievable.
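
As a starting point for such a test, here is a minimal sketch of a registered signed 10 x 10 multiply-accumulate. The module name, the clr port, and the 32-bit accumulator width are assumptions made for illustration, not something taken from the datasheet. Whether the tools map the multiplier, the registers, and the accumulator into one DSP block depends on the device and the settings, so check the fitter report.

```verilog
// Hypothetical test module for an fmax experiment.
// Assumptions: module/port names and the 32-bit accumulator width.
module mac10x10 #(
    parameter ACC_W = 32                  // assumed accumulator width
) (
    input  wire                    clk,
    input  wire                    clr,   // synchronous clear of the accumulator
    input  wire signed [9:0]       a,
    input  wire signed [9:0]       b,
    output reg  signed [ACC_W-1:0] acc
);
    reg signed [9:0]  a_reg, b_reg;       // stage 1: input registers
    reg signed [19:0] prod_reg;           // stage 2: registered 20-bit product

    always @(posedge clk) begin
        a_reg    <= a;
        b_reg    <= b;
        prod_reg <= a_reg * b_reg;

        // Stage 3: accumulate. prod_reg is sign-extended to ACC_W bits
        // because both operands of the addition are signed.
        // Note: clr does not flush products already in the pipeline.
        if (clr)
            acc <= {ACC_W{1'b0}};
        else
            acc <= acc + prod_reg;
    end
endmodule
```

Compile this with all pins set to virtual and look at the reported fmax for clk. That gives a figure for the multiplier plus the routing around it, rather than for the multiplier alone.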
Altera_Forum
Honored Contributor II
475 Views

Ok, thanks! 

 

I managed to find the theoretical maximum multiplier speed (635 MHz for the fastest speed grade of Arria 10). It was in the datasheet, doh! 

 

So definitely 730 MHz is out, however, rearranging a few things I think I could live with 575 MHz so at least theoretically this is not impossible. 

 

I guess there is nothing to do but write some Verilog and see. 

 

wbr Kusti
Altera_Forum
Honored Contributor II
475 Views

 


 

 

635 MHz looks unrealistic. Maybe it is specified from the multiplier block's perspective, with no logic across the fabric and pins. 

Remember that blocks such as multipliers are in fact small islands of ASIC that have to be connected to the fabric. The ASIC blocks can be very fast, but the fabric cannot keep up with them.
Altera_Forum
Honored Contributor II
475 Views

The datasheet figures are always the theoretical maximum of the silicon internal to the DSP. What will kill the timing is getting data in and out of the DSP. 

Always take the datasheet maximum with a pinch of salt. I would expect only 75% of it in a user design, and that with extreme pipelining and timing control. 50% is an easier figure to achieve.
Altera_Forum
Honored Contributor II
475 Views

Yeah, I realize this, thanks.  

 

575/635 = 90% of the theoretical maximum speed ... not promising. However, the processing block that is going to be implemented will be the same whether there are N or 2*N of them, so I will start by implementing and timing that. 
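
For the 2*N case, a rough sketch of what I have in mind is below. It assumes the incoming stream has already been split into even and odd samples that arrive together on a clock running at half the sample rate. All names and the 32-bit accumulator width are my own placeholders.

```verilog
// Hypothetical two-lane multiply-accumulate running at half the sample rate.
// Assumptions: even/odd samples are presented in parallel on each clock,
// and a 32-bit accumulator is wide enough for the application.
module mac10x10_x2 #(
    parameter ACC_W = 32                      // assumed accumulator width
) (
    input  wire                    clk,       // half the sample rate
    input  wire                    clr,       // synchronous clear of both lanes
    input  wire signed [9:0]       a0, b0,    // even samples
    input  wire signed [9:0]       a1, b1,    // odd samples
    output reg  signed [ACC_W-1:0] acc_sum    // combined result of both lanes
);
    reg signed [9:0]       a0_reg, b0_reg, a1_reg, b1_reg;
    reg signed [19:0]      prod0_reg, prod1_reg;
    reg signed [ACC_W-1:0] acc0, acc1;

    always @(posedge clk) begin
        // Stage 1: input registers for both lanes
        a0_reg <= a0;  b0_reg <= b0;
        a1_reg <= a1;  b1_reg <= b1;

        // Stage 2: one registered product per lane
        prod0_reg <= a0_reg * b0_reg;
        prod1_reg <= a1_reg * b1_reg;

        // Stage 3: independent accumulators, one per lane
        if (clr) begin
            acc0 <= {ACC_W{1'b0}};
            acc1 <= {ACC_W{1'b0}};
        end else begin
            acc0 <= acc0 + prod0_reg;
            acc1 <= acc1 + prod1_reg;
        end

        // Stage 4: registered sum of the two lane accumulators
        acc_sum <= acc0 + acc1;
    end
endmodule
```

Each lane is the same block as in the single-path case, so timing one lane should say most of what there is to know about N versus 2*N. The extra cost is the final adder and whatever logic splits the stream.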

 

Thanks!
Altera_Forum
Honored Contributor II
475 Views

While I have your attention: 

 

The exact device I have on my evaluation board is 

10AS066N3F40E2SG  

which I have a hard time decoding and relating to the datasheet. 

 

10A = Arria 10 

S = ???  

066 = 660K logic elements 

N = 48 transceivers 

3 = transceiver speed 1..5 

F = fineline BGA 

40 = 40 mm x 40 mm package 

E = extended temperature range 

2 = FPGA fabric speed 1..3 

S = standard power 

G = RoHS6 

 

But when the datasheet talks about speed in connection with multipliers/memory blocks and MHz, it uses the title 'performance' in the tables and lists frequencies for variants like 

–E1S, –I1S, –E2L, –E2S, –E2V, –I2L, –I2S, –I2V, –E3S, –E3V, –I3S, –I3V, –A3S 

 

So which one do I have? 

 

cheers Kusti 

 

 
