I'm designing an image-processing accelerator that uses many DSP blocks, and I've run into problems with automatic register packing. I have a control module running at 125 MHz which acts as a register map accessible through an Avalon interconnect. This module outputs registered control signals (image width, image height, parameters, ...) to a higher-speed processing core running at 200 MHz. A single-bit signal called "req" is set to '1' and synchronized from 125 MHz to 200 MHz with a three-register synchronizer. The processing core waits for a rising edge on this "req" signal and latches the control signals into its own registers; after that, the processing core uses these latched signals to run the operation. A synchronized "ack" signal tells the control module that the processing core has finished processing.
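To make the scheme above concrete, here is a minimal sketch of the 200 MHz side of the handshake. It assumes VHDL, and every name in it (`ctrl_capture`, `clk_fast`, `req_slow`, `ctrl_slow`, `ctrl_fast`, `start`, the `WIDTH` generic) is hypothetical rather than taken from the actual design. It also assumes that the control bus is held stable in the 125 MHz domain for as long as "req" is high.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Destination-side (200 MHz) half of the req handshake.
-- All names and the default width are hypothetical.
entity ctrl_capture is
  generic (WIDTH : positive := 16);
  port (
    clk_fast  : in  std_logic;                           -- 200 MHz core clock
    req_slow  : in  std_logic;                           -- registered in the 125 MHz domain
    ctrl_slow : in  std_logic_vector(WIDTH-1 downto 0);  -- assumed stable while req is high
    ctrl_fast : out std_logic_vector(WIDTH-1 downto 0);  -- local copy in the 200 MHz domain
    start     : out std_logic                            -- one-cycle pulse when the copy is taken
  );
end entity ctrl_capture;

architecture rtl of ctrl_capture is
  signal req_sync : std_logic_vector(2 downto 0) := (others => '0');
  signal req_prev : std_logic := '0';
begin
  process (clk_fast)
  begin
    if rising_edge(clk_fast) then
      -- Three-register synchronizer for the single-bit req.
      req_sync <= req_sync(1 downto 0) & req_slow;
      -- Delayed copy of the synchronized req, used for edge detection.
      req_prev <= req_sync(2);

      start <= '0';
      -- Rising edge of the synchronized req: latch the multi-bit control bus.
      if req_sync(2) = '1' and req_prev = '0' then
        ctrl_fast <= ctrl_slow;
        start     <= '1';
      end if;
    end if;
  end process;
end architecture rtl;
```

In this sketch only `ctrl_fast` is a register clocked at 200 MHz; `ctrl_slow` is still the cross-domain path that carries the false-path constraint.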
All control signals are constrained as false paths, and combined with the delay of the three-register synchronizer, the design works perfectly fine. My current problem is that I'm using some control signals directly in arithmetic operations which are mapped to DSPs. Unfortunately, Quartus doesn't pack the pipeline registers into the DSPs, even when I explicitly implement two further pipeline registers.
Are there any best practices or examples for correctly synchronizing multi-bit data on Intel/Altera FPGAs? I was looking into some papers (see http://webee.technion.ac.il/~ran/papers/Sync_Errors_Feb03.pdf), and I think my synchronizer looks a bit like the Push synchronizer logic (see Figure 3).
To better understand the problem you are facing, may I know which device you are using, which DSP operation mode you have selected, and which control signals you are trying to feed into the DSP block? It would be great if you could attach a simple design here that shows Quartus not packing the pipeline registers into the DSP block.