I'm having difficulty reaching an FMAX target on a Cyclone V device, where the same design compiled for a MAX 10 and a Cyclone IV E device achieves the target with no problem.
In fact, the Cyclone V reaches an utterly horrible 53% of the speed of the older Cyclone IV E and MAX 10 devices. Here are the compile results for the PLL clock output:
MAX 10 10M50DAF484C6GES, 116MHz, 5:30min compile time.
Cyclone IV E EP4CE30F23C6, 115MHz, 3:46min compile time.
Cyclone V 5CEFA4F23C6, 62MHz, 10:54min compile time.
The Cyclone V is nearly 50% slower (62MHz against 115-116MHz). The minimum target is supposed to be 100MHz. The lemon nets in question are between shift registers which are set up to emulate a FWFT (first-word fall-through) FIFO.
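To put the numbers above in perspective, here is a quick Python sketch of the arithmetic. It uses only the FMAX figures and the 100MHz target quoted above. The helper names (`period_ns`, `summarize`) are made up for illustration and have nothing to do with Quartus itself.

```python
# FMAX figures reported above (MHz); 100 MHz is the stated minimum target.
results_mhz = {
    "MAX 10 10M50DAF484C6GES": 116.0,
    "Cyclone IV E EP4CE30F23C6": 115.0,
    "Cyclone V 5CEFA4F23C6": 62.0,
}
TARGET_MHZ = 100.0


def period_ns(fmax_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / fmax_mhz


def summarize(results: dict, target_mhz: float) -> dict:
    """Per device: FMAX relative to the best result, and period margin vs. target."""
    best = max(results.values())
    summary = {}
    for device, fmax in results.items():
        summary[device] = {
            "pct_of_best": 100.0 * fmax / best,
            # Positive margin: the critical path fits inside the target period.
            "margin_ns": period_ns(target_mhz) - period_ns(fmax),
        }
    return summary


if __name__ == "__main__":
    for device, row in summarize(results_mhz, TARGET_MHZ).items():
        print(f"{device}: {row['pct_of_best']:.1f}% of best, "
              f"margin {row['margin_ns']:+.2f} ns")
```

On these figures the Cyclone V critical path takes about 16.1 ns against a 10 ns budget, so roughly 6.1 ns would have to be recovered, while the other two devices have a little over 1 ns to spare.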
I've attached a .zip file containing the 3 Quartus projects in question. The results I've listed above were obtained with Quartus Prime 20.1. Each of the following 3 folders is a full project which just needs compilation to rebuild all the missing files and obtain the FMAX results I listed.
BrianHG_DDR3_DECA_GFX_DEMO - This project folder contains a functional 10M50DAF484C6GES build for Arrow DECA MAX 10 FPGA development board.
BrianHG_DDR3_CIV_GFX_FMAX_Test - This project folder contains an EP4CE30F23C6 build of the same project.
BrianHG_DDR3_CV_GFX_FMAX_Fail - This folder contains the LEMON Cyclone V 5CEFA4F23C6 build which cannot get an FMAX anywhere close to the other 2 builds.
Folders: 'BrianHG_DDR3' and 'BrianHG_DDR3_GFX_source' contain the additional shared source code required for the project to work.
Is there anything I've done wrong? Even the compile time is twice as long. The PLL output clock in question is really deep in the core of the FPGA design and has no real ties to IO timing, unlike the first 3 PLL output clocks, which reach their desired 400MHz. Is there some sort of compiler setting to get the Cyclone V to achieve at least 90% of the older Cyclones' performance, if not the same? Or is the Cyclone V truly a half-speed FPGA?
FMAX clock requirements:
PLL CLK 0,1,2 - 400 MHz each.
PLL CLK 3 - 200 MHz.
PLL CLK 4 - 100 MHz.
Is there anything I can do?
The portion of my code in question, the middle of my 'BrianHG_DDR3_COMMANDER.sv' running on PLL CLK 4 (100 MHz), boils down to a 4:1 selector mux, ~168 bits wide, with the selection based on comparing the 4 inputs, using 4x2 compares, against a stored 28-bit address. This easily surpasses the required 100MHz on Cyclone III, Cyclone IV and MAX 10, not to mention the faster fabrics, but for some reason it really dies on a -6 Cyclone V.
My 200MHz and 400MHz sections are nothing more than block-shifting serializers, so the Cyclone V appears to be able to cope with those. But why is it such a massive downgrade from all the older FPGAs?
Even worse, if I leave the compile options on the default 'Balanced', the FMAX is only ~41MHz, while the other Cyclones will still pass the required 100MHz. There isn't much I can do to simplify the 4:1 selection mux which feeds the next source code section of my design. A multistage pipeline would destroy the code's ability to correctly select which of the 4 inputs should be prioritized to run next.
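As a rough behavioral illustration of why this selection is hard to pipeline, here is a small Python model. It is not the SystemVerilog from 'BrianHG_DDR3_COMMANDER.sv'. The `Request` layout, the `ROW_SHIFT` split, the meaning of the two compares per input, the priority order, and the idea that the stored address is updated from the chosen input are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ADDR_MASK = (1 << 28) - 1   # 28-bit stored address, as described above
ROW_SHIFT = 10              # ASSUMPTION: bits above this form a bank/row field


@dataclass(frozen=True)
class Request:
    valid: bool
    addr: int       # request address (only the low 28 bits are used)
    payload: int    # stands in for the rest of the ~168-bit word


def select(requests: List[Request], stored_addr: int) -> Optional[int]:
    """Pick one of the inputs in a single step (hypothetical priority rule).

    Each input gets two compares against the stored address (4x2 in total):
    an exact match and a same-row match. Exact matches win, then same-row
    matches, then the lowest-numbered valid input.
    """
    stored = stored_addr & ADDR_MASK
    addrs = [r.addr & ADDR_MASK for r in requests]
    exact = [r.valid and a == stored for r, a in zip(requests, addrs)]
    same_row = [r.valid and (a >> ROW_SHIFT) == (stored >> ROW_SHIFT)
                for r, a in zip(requests, addrs)]
    any_valid = [r.valid for r in requests]
    for flags in (exact, same_row, any_valid):
        for index, hit in enumerate(flags):
            if hit:
                return index
    return None


def step(requests: List[Request], stored_addr: int) -> Tuple[Optional[Request], int]:
    """One clock: choose a request and update the stored address from it."""
    index = select(requests, stored_addr)
    if index is None:
        return None, stored_addr & ADDR_MASK
    chosen = requests[index]
    return chosen, chosen.addr & ADDR_MASK
```

In this model the stored address that feeds the next `select` is produced by the current `step`. If the compare and the mux were split over several clock cycles, each decision would be made against a stale stored address, which is the kind of mis-prioritization described above.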
Is there something in the Cyclone V fitter's settings which has somehow decided to massively impact the core's performance?
Or, once again, should I consider the Cyclone V to be truly a half speed FPGA and scratch it from my list of potential devices I can use?