I'm implementing a phase correlation algorithm in an FPGA. Two images come in; each is windowed with a 2D Hamming window, then 2D FFT'd. The two spectra are complex-multiplied, with one conjugated first. The magnitude of that product is then used as a normalizing divisor for the product itself. The normalized result is 2D IFFT'd, its magnitude taken, and a peak search performed; the 2D location of the peak gives the amount of shift between the two images.

I've been trying to use fixed-point math for the cores. The FFTs are block floating point (BFP), but the magnitude step's square-root core and the divider core are fixed point. I'm also trying to trim the output widths at various stages so I'm not carrying all those bits around. I've been comparing Verilog sim runs to Matlab sim runs (using the FFT core's Matlab model and matching the trimmed output widths in Matlab). The comparison is good (and operation correct) until I put the square-root and divider cores into the Verilog sim; things go south at that point.

Has anybody out there implemented this kind of algorithm? Am I wasting my time using fixed-point cores and trimming widths — am I just going to lose too much accuracy? I hate the thought of using floating-point cores and the resultant size increase, as there are a lot of other functions in the FPGA (an entire color camera path).
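For reference, the whole pipeline can be sanity-checked in a few lines of software before worrying about word widths. Below is a minimal pure-Python sketch (naive DFT, no windowing, conjugating the first spectrum — the thread doesn't say which one is conjugated, so that choice is an assumption) that recovers a known circular shift:

```python
import cmath, random

def dft2(img, inverse=False):
    # naive O(N^4) 2D DFT -- only usable at tiny demo sizes
    n, m = len(img), len(img[0])
    s = 1 if inverse else -1
    out = [[0j] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            acc = 0j
            for x in range(n):
                for y in range(m):
                    acc += img[x][y] * cmath.exp(s * 2j * cmath.pi * (u * x / n + v * y / m))
            out[u][v] = acc / (n * m) if inverse else acc
    return out

def phase_correlate(g, h):
    # conj(G)*H, normalized by its magnitude, then IFFT and peak search
    G, H = dft2(g), dft2(h)
    n, m = len(g), len(g[0])
    R = [[0j] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            p = G[u][v].conjugate() * H[u][v]
            R[u][v] = p / abs(p) if abs(p) > 1e-9 else 0j
    r = dft2(R, inverse=True)
    # location of the magnitude peak = circular shift of h relative to g
    _, loc = max((abs(r[x][y]), (x, y)) for x in range(n) for y in range(m))
    return loc

# demo: 8x8 random image, circularly shifted by (2, 3)
random.seed(1)
N = 8
g = [[random.random() for _ in range(N)] for _ in range(N)]
h = [[g[(x - 2) % N][(y - 3) % N] for y in range(N)] for x in range(N)]
print(phase_correlate(g, h))  # -> (2, 3)
```

A floating-point model like this is also a useful golden reference when deciding how aggressively intermediate widths can be trimmed.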
Did you convert the floating-point numbers from the FFT into fixed-point numbers?

Be careful using division and square root with fixed-point numbers. For example, 0.25 represented using 8 bits with 7 fractional bits is 0.0100000 -> 32, and 0.57 is 0.1001001 -> 73. Now 0.25/0.57 = 0.438596, but if you use a conventional integer divider you will obtain Q=2 and R=14, so Q represents 0.015625, not 0.438596. The calculation works if the numerator is first shifted 7 bits to the left: Q = 01000000000000/01001001 = 00111000 -> 56, and Q represents 0.4375 in this case. Further, a rounding circuit could improve precision. Similar considerations apply to the square root.
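The pre-shift trick is quick to sanity-check in software. A minimal Python sketch with the same Q1.7 codes and the 0.25/0.57 example (note a plain divide of the raw codes gives Q=0 here, as mmeyers points out below):

```python
FRAC = 7                                # Q1.7: value = code / 2**7
a = round(0.25 * (1 << FRAC))           # 32
b = round(0.57 * (1 << FRAC))           # 73
q_naive = a // b                        # plain integer divide drops the fraction
q = (a << FRAC) // b                    # pre-shift the numerator: 4096 // 73 = 56
q_round = ((a << FRAC) + b // 2) // b   # same, with a simple rounding term
print(q_naive, q, q / (1 << FRAC))      # -> 0 56 0.4375
```

In hardware the pre-shift just means feeding the divider a numerator widened by FRAC bits; the rounding term costs one extra adder before the divide.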
parrado,

Thanks for the reply. Yes, I implemented the scaling logic to shift the BFP FFT outputs by their exponents so that all results are in the correct relationship to each other.

I am a little confused by your integer divider example. I understand the representation of fractional bits via the multiplication by 2^7. However, when you say 32/73 would return Q=2, Rem=14, I would think the divider would return Q=0, Rem=32. I'll play around with my simulations.

I also had a major conceptual error in the algorithm. The output of the final divider (normalizing the correlation value) produces numbers that range from 1.0 to -1.0, so the quotient out of the divider is not the value I'm really interested in: the remainder contains the fractional info that supplies the phase shift between the two images. I'll have to use the quotient to know when the output is exactly 1.0 or -1.0, but all other values are in the remainder.

mmeyers
Yeah, that's my error from a previous example.

The normalization divisor is p = H2(0,0), where H2(u,v) = G(u,v)*conj(H(u,v)), and h2(x,y) = ifft(H2(u,v))/p. You must select an adequate word size and number of fractional bits, because h2 values could reach height*width*max(g(x,y))*max(h(x,y)). Assuming your image pixels take values ranging from 0 to 1, you must increase the integer part of the fixed-point numbers by log2(height*width), and keep my previous explanation for division and square root in mind. I think you could omit the square-root step; the peak search can be done without it.
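Both points above — the log2(height*width) integer headroom and skipping the square root — can be sketched quickly (the 256x256 size is just an assumed example here):

```python
import math

# integer-bit headroom needed for h2 = ifft(G*conj(H)) with pixels in [0, 1]
height, width = 256, 256
extra_int_bits = math.ceil(math.log2(height * width))
print(extra_int_bits)  # -> 16 for a 256x256 image

# the peak search only needs comparisons, so |z|^2 works as well as |z|
# and needs no square-root core:
vals = [3 + 4j, 1 - 2j, -5 + 1j]
peak_idx = max(range(len(vals)), key=lambda i: vals[i].real ** 2 + vals[i].imag ** 2)
print(peak_idx)  # -> 2, since |-5+1j|^2 = 26 is the largest
```

Since x -> x^2 is monotonic for non-negative magnitudes, the argmax of the squared magnitude is the same pixel as the argmax of the magnitude.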
parrado,

Thanks for the clarification. What I'm ultimately up against is large bit growth through the path. I take 8b pixels in and shift them down to make them signed, as the FFT core wants signed values. That gives me an S7.0 format (-128 to +127). I then multiply them by the window coeffs, which are in U0.8 format; the result is S7.8, for 16 bits into the first FFT core, which handles the image rows.

Looking at Altera's chart for the amount of shift possible out of a 256-point core (11 shifts), I need a 27b-wide scaled result register. I've verified that the output at this point carries 8 fractional bits compared to the FFTW algorithm that Matlab uses. The second FFT core would then have to be 27 bits on the input; assuming the same worst-case shift out of that, my output would need 27+11 bits for output scaling. Now I'm up to 38 bits. The complex multiply of the two images will yield 76 bits. As you can see, the widths are getting crazy wide.

I keep thinking I should be able to drop fractional bits at various points without losing enough accuracy to throw my final result off. I'll keep running sims with various scenarios to see if it's possible to trim widths along the way. Thanks for any thoughts you may have on keeping those intermediate values at reasonable widths.
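The bit growth described above can be tracked with a few lines of arithmetic (the 11-bit worst-case BFP shift is taken from the post; check it against your core's datasheet):

```python
pixel_bits  = 8    # S7.0 after offsetting unsigned pixels to signed
window_frac = 8    # U0.8 Hamming window coefficients
bfp_shift   = 11   # worst-case exponent range of a 256-point BFP FFT (per the chart)

after_window = pixel_bits + window_frac   # S7.8 into the row FFT
after_fft1   = after_window + bfp_shift   # scaled row-FFT result
after_fft2   = after_fft1 + bfp_shift     # scaled column-FFT result
after_cmult  = 2 * after_fft2             # full-precision complex-product terms
print(after_window, after_fft1, after_fft2, after_cmult)  # -> 16 27 38 76
```

Writing the growth down this way makes it easy to experiment with where truncation/rounding stages would pull the widths back down.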
Your final goal is just finding a peak in the square magnitude of the FFTs' product, so intermediate precision can be traded away. For example, the FFT data output could be scaled down by discarding the 16 least significant bits (log2 of height*width), so your result is 22 bits; the complex product is then 44 bits, and it could be scaled down again to 23 bits (allowing one bit of growth for the addition) using a rounder circuit. The same could be done for the square-magnitude calculation. You must run several simulations in order to determine the number of bits to discard.

The division operation can be replaced by a multiplier if you instead search for a peak p = G(xm,ym) that satisfies p >= threshold_percent*G(0,0), where threshold_percent could be 0.8 or 0.9, depending on SNR.
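The divider-free threshold test reduces to a single integer multiply when threshold_percent is held as a fixed-point constant. A sketch, assuming a U0.8 threshold of roughly 0.9 (the magnitudes 95 and 100 are just illustrative):

```python
THRESH_Q8 = 230   # ~0.9 in U0.8 fixed point (230/256 = 0.8984...)

def passes_threshold(p, g00):
    # p >= threshold_percent * g00, with no divider:
    # compare p * 2^8 against THRESH_Q8 * g00 in plain integers
    return (p << 8) >= THRESH_Q8 * g00

print(passes_threshold(95, 100), passes_threshold(80, 100))  # -> True False
```

In hardware the left-hand shift is free wiring and the right-hand side is one multiplier with a constant operand, which is far cheaper than a pipelined divider core.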