Re: Is this already maximally pipelined ... anything I can do to speed this up?

Altera_Forum · ‎01-19-2018

According to the .sta.rpt, Fmax for this is 307 MHz, which is less than 50% of the maximum speed listed in the datasheet...

wbr Kusti


always @(posedge sys_clk) begin
    if (($signed({1'd0, e}) | (counter >= $signed({1'd0, 1'd0})))) begin
        product <= (a * b);
        accum_product <= (accum_product + product);
        accum_in_a <= (accum_in_a + a);
        accum_in_b <= (accum_in_b + b);
        product_accum_in_a_and_b <= (accum_in_a * accum_in_b);
    end
end

Altera_Forum · ‎01-19-2018

First of all, I will comment that your code is not very clear. Why are you comparing the counter to a signed version of 0? Why is the counter signed? If it is signed, all you need to check is the MSB of counter.

Also, it's not obvious that you are checking whether e is greater than 0. Be explicit to make it clearer.

How big is e? If it is more than 3 bits, you could do the compare to 0 (or OR all of the bits in the bus) in a previous cycle, then OR that 1-bit value with the counter compare result.
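As a rough sketch of that precompute idea: the module name, port widths and the `enable` and `e_nonzero` names below are assumptions for illustration, not taken from the original design. It also assumes counter really is declared signed, so "counter >= 0" reduces to "MSB is 0". Note that registering the reduction of e delays its effect by one clock, which may or may not be acceptable.

```verilog
// Sketch only: E_W, C_W, enable and e_nonzero are hypothetical names/widths.
module enable_precompute #(
    parameter E_W = 8,   // assumed width of e
    parameter C_W = 16   // assumed width of counter
) (
    input  wire                  sys_clk,
    input  wire [E_W-1:0]        e,
    input  wire signed [C_W-1:0] counter,
    output wire                  enable
);
    // OR-reduce e one cycle early, so only a single bit feeds the enable logic.
    reg e_nonzero = 1'b0;
    always @(posedge sys_clk)
        e_nonzero <= |e;

    // For a signed counter, (counter >= 0) is simply "sign bit is 0".
    assign enable = e_nonzero | ~counter[C_W-1];
endmodule
```

The `enable` output would then replace the whole condition in the original `if`.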

Other than that, it is about as efficient as you can get.

Remember that the datasheet figure is an idealised maximum speed, which assumes zero or minimal delay to the logic inputs. In reality, this will never happen. I don't know what chip you are using, but 300 MHz would be fairly reasonable.

Bear in mind that if this is your top-level design, then it will be routing to the logic from device pins, which will likely add quite a delay, as the DSPs are not near the pins. Try setting the logic pins to virtual to remove this delay.

Altera_Forum · ‎01-19-2018

There is one possible improvement: register 'a' and 'b':


 rega <= a;
 regb <= b;
 product <= rega * regb;

Perhaps also do the same for the second multiplication.

Depending on the size of the vectors, the additions may turn out to be slow, so you may want to write some code to pipeline the additions too. The ultimate speed is obtained when there is only one LUT between each register.
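One way to pipeline an accumulator, sketched under assumptions: the operands are treated as unsigned, the width W is a guess, and the module and signal names (`pipelined_accum`, `lo`, `hi`, `carry` and so on) are made up for illustration. The idea is to split the adder into a low half and a high half, register the carry between them, and delay the low half by one cycle so that both halves line up again at the output.

```verilog
// Sketch only: a two-stage accumulator with a registered carry between halves.
// All names and the width W are hypothetical; operands are assumed unsigned.
module pipelined_accum #(
    parameter W = 32
) (
    input  wire         clk,
    input  wire         en,
    input  wire [W-1:0] a,
    output wire [W-1:0] sum   // lags the input by two enabled clocks
);
    localparam H = W / 2;     // width of the low half

    reg [H-1:0]   lo     = 0; // low-half running sum
    reg [H-1:0]   lo_d   = 0; // low half delayed to line up with hi
    reg           carry  = 0; // carry out of the low-half addition
    reg [W-H-1:0] a_hi_d = 0; // high bits of a, delayed one stage
    reg [W-H-1:0] hi     = 0; // high-half running sum

    always @(posedge clk) begin
        if (en) begin
            // Stage 1: add the low halves and capture the carry out.
            {carry, lo} <= lo + a[H-1:0];
            a_hi_d      <= a[W-1:H];
            // Stage 2: add the delayed high half plus the registered carry.
            hi          <= hi + a_hi_d + carry;
            lo_d        <= lo;
        end
    end

    assign sum = {hi, lo_d};
endmodule
```

Each adder is now only about half as wide, at the cost of one extra cycle of latency. Whether this actually helps depends on the vector sizes and on how the fitter maps the adders.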

Altera_Forum · ‎01-19-2018

Hi, thanks for answering.

The code is generated by Migen, which (I guess) adds the signed stuff there just in case. Anyway, this is not real production code as such; I am just experimenting to get some guidelines for speed as to which way to go.

OK, thanks for confirming that this cannot be much improved. The chip is an Arria 10 (mid speed grade; I could not really relate the speed class of the chip to the datasheet speeds).

Yes, this was a top-level test. How should I set the pins to virtual? This is not clear to me.

thanks again, Kusti

Altera_Forum · ‎01-19-2018

In the Assignment Editor, one of the options is "virtual_pins". Assign this to *. This means it will not connect the interface of your design to any pins. It is only for testing resource usage and Fmax estimates. Don't use it in final builds.
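If you are working from the project's .qsf text file rather than the GUI, the equivalent entry would look something like the line below. This is a sketch that assumes the wildcard target is accepted in your Quartus version; it is worth checking the fitter report to confirm which ports actually became virtual.

```tcl
# Sketch of a .qsf entry: mark every top-level port as a virtual pin.
# For Fmax and resource experiments only; remove it for real builds.
set_instance_assignment -name VIRTUAL_PIN ON -to *
```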

Altera_Forum · ‎01-19-2018

OK, thanks. As I'm only using the command line, I need to figure out how to do that from the text file, but I expect I can use the Assignment Editor to get a clue. Thanks!