I am trying to use altshift_taps in an application where I would like to reduce the number of logic resources used in my design. Note that I am experimenting with altshift_taps using tap distances of 3 or 4 (as small as possible), a clock enable, and an asynchronous reset. However, after looking at the compilation Fitter report for various altshift_taps configurations (different tap distances and different input data widths), the number of registers used varies significantly. Sometimes I get large savings as intended, whereas other times the register count equals the number of registers I am actually trying to replace.

If each type of RAM registers its inputs to keep the data synchronized, then each RAM that altshift_taps uses should consume at least data_width + address_width registers (I know there are more, but those two should be the dominant sources), right? However, I cannot figure out why I am getting such variance in the number of registers used when I try different altshift_taps configurations. If everything else in my design stays constant and I only change the altshift_taps parameters, shouldn't the number of registers used stay fairly constant?

I believe it has to do with how efficiently my design is mapping to the RAMs. For example, choosing a tap distance of 3 and a data width of 23 means that I end up using only a fraction of the available capacity in the RAM block, whereas a different combination of tap distance and data width would use more of it. How can I verify whether this is the case?
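To make my utilization reasoning concrete, here is the back-of-the-envelope model I have in mind (just a sketch, assuming a 4608-bit M4K-style block and that register overhead is dominated by the data-width input registers plus an address counter; `estimate_altshift_taps` is my own hypothetical helper, not an Altera API, and I don't know that the Fitter actually maps things this way):

```python
import math

def estimate_altshift_taps(width, tap_distance, number_of_taps,
                           ram_block_bits=4608):
    """Rough model of a RAM-based shift register's resource usage.

    Assumptions (not verified against Quartus):
      - total shift length = number_of_taps * tap_distance words
      - an address counter of ceil(log2(tap_distance)) bits walks the RAM
      - register overhead ~= width (data input regs) + address counter bits
      - ram_block_bits defaults to a 4608-bit M4K-style block
    """
    total_bits = width * tap_distance * number_of_taps
    addr_width = max(1, math.ceil(math.log2(tap_distance)))
    est_registers = width + addr_width            # dominant terms only
    blocks = math.ceil(total_bits / ram_block_bits)
    utilization = total_bits / (blocks * ram_block_bits)
    return est_registers, blocks, utilization

# Compare a shallow tap distance against a deeper one at the same width.
for dist in (3, 32):
    regs, blocks, util = estimate_altshift_taps(width=23, tap_distance=dist,
                                                number_of_taps=4)
    print(f"distance={dist}: ~{regs} regs, {blocks} RAM block(s), "
          f"{util:.0%} of block bits used")
```

Under this model a tap distance of 3 at width 23 fills only a few percent of one block, while a distance of 32 fills most of it, yet the estimated register overhead barely changes. That is exactly what I would expect if the savings depended on capacity utilization, which is why the large swings in actual register count confuse me.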