FPGA Intellectual Property
PCI Express*, Networking and Connectivity, Memory Interfaces, DSP IP, and Video IP

Understanding Polyphase Scaler Taps

Altera_Forum
Honored Contributor II

Hi, 

 

I am trying to understand the significance and effect of the number of taps in the polyphase filter. What does increasing the number of taps do to the scaling result? What is the difference between using a 4-tap and an 8-tap filter?

 

If I use the default Lanczos-2 filter, the impulse response of the filter is defined over the range [-2,2]. A 4-tap filter then basically samples this function, centered on tap 1 for phase 0. Changing to an 8-tap filter takes the same Lanczos-2 function, spreads it out over the interval [-4,4] and centers it on tap 3. But this filter now has a low-pass cut-off frequency that is half that of the 4-tap filter. Doesn't spreading out the function change the low-pass cut-off frequency of the filter? So adding more taps results in a scaler with more filtering (i.e. more image blurring).
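The cut-off question can be checked numerically. Below is a minimal Python sketch that samples Lanczos-2 at phase 0 for a 4-tap filter and for the spread 8-tap version described above, normalizes each set to unit DC gain (my assumption; the Scaler's fixed-point rounding is ignored) and evaluates the magnitude response. The helper names are hypothetical, not part of any Altera tool.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi*x)/(pi*x); equal to sin(y)/y evaluated at y = pi*x."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def lanczos2(x):
    """Lanczos-2 kernel, zero outside (-2, 2)."""
    if abs(x) >= 2.0:
        return 0.0
    return sinc(x) * sinc(x / 2.0)


def phase0_taps(num_taps):
    """Phase-0 coefficients with the kernel spread so that [-2, 2] spans num_taps taps."""
    stretch = num_taps / 4.0          # 1.0 for 4 taps, 2.0 for 8 taps
    center = (num_taps - 1) // 2      # tap 1 for 4 taps, tap 3 for 8 taps
    raw = [lanczos2((t - center) / stretch) for t in range(num_taps)]
    total = sum(raw)
    return [c / total for c in raw]   # normalize to unit DC gain


def gain(coeffs, f):
    """|H(f)| at f cycles per input pixel (0 = DC, 0.5 = Nyquist)."""
    re = sum(c * math.cos(2 * math.pi * f * t) for t, c in enumerate(coeffs))
    im = sum(c * math.sin(2 * math.pi * f * t) for t, c in enumerate(coeffs))
    return math.hypot(re, im)


for taps in (4, 8):
    c = phase0_taps(taps)
    print(taps, [round(v, 3) for v in c],
          "half-Nyquist:", round(gain(c, 0.25), 3),
          "Nyquist:", round(gain(c, 0.5), 3))
```

The 4-tap set is numerically [0, 1, 0, 0] and passes every frequency unchanged, while the 8-tap set has a gain of roughly 0.5 at half-Nyquist and close to zero at Nyquist, so spreading the kernel does halve the cut-off.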

 

A trivial example: if I set up the scaler with a 1:1 scaling ratio, a 4-tap filter produces an output exactly equal to the input, while an 8-tap filter produces a slightly blurred version of the input. Is this desirable?
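A small 1-D sketch of the 1:1 case: at 1:1 every output pixel uses phase 0, so one coefficient set is applied to every pixel of the line. The helper names are hypothetical, the coefficients are normalized to unit DC gain, and clamping at the line ends is an assumption (I don't know how the Scaler handles borders).

```python
import math


def sinc(x):
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def lanczos2(x):
    if abs(x) >= 2.0:
        return 0.0
    return sinc(x) * sinc(x / 2.0)


def phase0_taps(num_taps):
    """Phase-0 Lanczos-2 taps spread over num_taps, normalized to unit DC gain."""
    stretch = num_taps / 4.0
    center = (num_taps - 1) // 2
    raw = [lanczos2((t - center) / stretch) for t in range(num_taps)]
    total = sum(raw)
    return [c / total for c in raw]


def filter_line_1to1(line, coeffs):
    """Apply one phase-0 coefficient set to every pixel of a line (1:1 scaling)."""
    center = (len(coeffs) - 1) // 2
    n = len(line)
    out = []
    for i in range(n):
        acc = 0.0
        for t, c in enumerate(coeffs):
            j = min(max(i + t - center, 0), n - 1)  # clamp at the line ends
            acc += c * line[j]
        out.append(acc)
    return out


line = [0, 0, 0, 0, 100, 0, 0, 0, 0]  # a single bright pixel
print([round(v, 1) for v in filter_line_1to1(line, phase0_taps(4))])
print([round(v, 1) for v in filter_line_1to1(line, phase0_taps(8))])
```

The bright pixel passes through the 4-tap filter unchanged, while the 8-tap filter spreads it over its neighbors (with small negative lobes three pixels out), which is exactly the blur described above.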

 

Is there any advantage to using the Lanczos-4 function (defined over [-4,4]) without spreading in an 8-tap filter?  

 

If I have a scaler that has to scale up, scale down or do no scaling (SD->HD, HD->SD, HD->HD), should I set up different coefficients for each type of scaling?

 

Any feedback is welcome! 

Regards, 

Niki
Altera_Forum
Honored Contributor II

Hi, 

 

The VIP user guide does give some guidelines for the number of taps to use for various scaling ratios. For any upscale ratio, bicubic or 4-tap polyphase with Lanczos-2 is recommended. For downscaling it recommends polyphase with 2*lanczos_order*input_size/output_size taps, again using Lanczos-2. I think the choice of Lanczos order is less critical than getting the number of taps right; there may be some debate about whether Lanczos-2 or Lanczos-3 is better (http://en.wikipedia.org/wiki/lanczos_resampling).
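The tap-count rule above as a small Python function. Rounding up to a whole tap count is my assumption, and the sketch ignores any maximum tap count or other restriction the Scaler itself may impose (check the user guide for those). The function name is hypothetical.

```python
def recommended_taps(input_size, output_size, lanczos_order=2):
    """Tap count following the user-guide rule quoted above.

    Upscale or 1:1: 2 * lanczos_order taps (4 for Lanczos-2).
    Downscale: 2 * lanczos_order * input_size / output_size, rounded up
    (the rounding direction is an assumption).
    """
    if output_size >= input_size:
        return 2 * lanczos_order
    # integer ceiling division avoids floating-point surprises
    return (2 * lanczos_order * input_size + output_size - 1) // output_size


for src, dst in [(720, 1920), (1920, 1920), (1920, 1280), (1920, 960), (1920, 720)]:
    print(f"{src} -> {dst}: {recommended_taps(src, dst)} taps")
```

For example, SD->HD (720 -> 1920) gets 4 taps, a 2x downscale gets 8, and HD->SD (1920 -> 720) works out to 11.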

 

You are correct that increasing the number of taps will generally increase the blur in the output. This is only useful when downscaling, where it removes frequencies that you no longer have the resolution to represent (provided you get the number of taps right): think of a zone plate turning grey when you downscale, rather than aliasing. The Altera Scaler is not yet edge adaptive, so you will get some blur when upscaling because it takes a weighted sum of 4 taps, but for SD->HD this should not be too noticeable.

 

If you are changing the scaling ratio at runtime you should definitely change the coefficients with each change in input/output resolution, either by using multiple banks or by running code on something like a Nios to calculate and load new coefficients on the fly. The UDX4 reference design (current version UDX4.2.1) uses Nios code to do just this: it calculates the recommended number of taps for each scaling ratio, then generates and loads the correct coefficients. You can either use this code directly if your system has a processor, or use it as a reference to generate coefficients to load into multiple banks at startup if you are using an HDL state machine to configure things.
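For anyone without the UDX4 code to hand, here is a generic polyphase sketch of the "calculate coefficients per ratio" step. It is not the UDX4 code. It assumes each phase index maps directly to the fractional input position (phase/phases), widens the Lanczos kernel by input/output when downscaling, rounds the tap count up, and normalizes each phase to sum to 1. The tap ordering and phase direction are assumptions, and quantizing to the core's coefficient format and loading the bank into hardware are not shown. All names are hypothetical.

```python
import math


def sinc(x):
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def lanczos(x, order=2):
    """Lanczos kernel of the given order, zero outside (-order, order)."""
    if abs(x) >= order:
        return 0.0
    return sinc(x) * sinc(x / order)


def coefficient_bank(input_size, output_size, phases=16, order=2):
    """Return bank[phase][tap] for one scaling ratio; each phase sums to 1."""
    if output_size >= input_size:
        stretch, taps = 1.0, 2 * order       # upscale or 1:1: plain kernel
    else:
        stretch = input_size / output_size   # widen the kernel to lower its cut-off
        taps = (2 * order * input_size + output_size - 1) // output_size
    center = (taps - 1) // 2                 # tap under the integer input position
    bank = []
    for phase in range(phases):
        frac = phase / phases                # assumed fractional input position
        raw = [lanczos((t - center - frac) / stretch, order) for t in range(taps)]
        total = sum(raw)
        bank.append([c / total for c in raw])
    return bank


for src, dst in ((720, 1920), (1920, 960), (1920, 720)):
    bank = coefficient_bank(src, dst)
    print(f"{src} -> {dst}: {len(bank)} phases x {len(bank[0])} taps")
```

Precomputing one such bank per supported mode (SD->HD, HD->SD, HD->HD) at startup corresponds to the multiple-bank option; calling the same function whenever the resolution changes corresponds to the on-the-fly option.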

 

Hope this helps. 

Regards, 

Kieron
Altera_Forum
Honored Contributor II

Hi Kieron, 

 

Thanks for the reply! I have read the VIP user manual's recommendation, but wanted to know why, and what was happening "behind the scenes". I have been playing around with an 8-tap filter and various coefficient sets and I think I understand the process better now. For downscaling, where you want the anti-aliasing filter, the Lanczos-2 coefficients seem to provide good quality. But I do not agree 100% with the way Altera (and the UDX4 software) calculates the coefficients for the 8-tap filter.

Altera does it as follows: 

 

C(k) = lanczos2(k/2+p), where lanczos2(x) = sinc(pi*x)*sinc(pi*x/2) with sinc(y) = sin(y)/y, k is the tap index [-3..3] and p is the phase fraction [0..15/16].

 

The effect is a Lanczos-2 filter spread across the interval [-3..3], but which is shifted across two taps by the phase increment. See the attached figure Altera_Lanczos2_8tap.jpg for a picture of phase 0 and phase 15.  

 

As far as I understand, the phase increments should shift the coefficients between a single pair of taps (pixels). Maybe for downscaling this does not matter, since the Lanczos-2 kernel already filters the image, but if you use these coefficients for upscaling the results are terrible. See the attached file Altera_upscale_result.jpg. Note the extreme "blockiness" (also note that the picture is a camera photo of the LCD screen, hence the noisy background where you can see the individual screen pixels).

 

I have compiled my own set of coefficients with the following function: 

 

C(k) = lanczos2((k+p)/2) 

 

See the attached file Custom_Lanczos2_8tap.jpg. The phase is shifted by one tap and not by two taps. For downscaling I cannot see a difference between my coefficients and the Altera coefficients. For upscaling, my coefficients produce a much better result (filtered, as expected). See Custom_upscale_result.jpg. 
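The difference between the two formulas can be seen by evaluating them directly. In this sketch the eight taps are indexed k = -3..4 (the post lists [-3..3]; the extra tap is my assumption), 16 phases are used, and no normalization or quantization is applied. The kernel peak sits where the Lanczos argument is zero: at k = -2p for k/2 + p, and at k = -p for (k + p)/2.

```python
import math


def sinc(x):
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def lanczos2(x):
    if abs(x) >= 2.0:
        return 0.0
    return sinc(x) * sinc(x / 2.0)


PHASES = 16
TAPS = list(range(-3, 5))  # assumption: eight taps indexed -3..4


def altera_coeffs(phase):
    """C(k) = lanczos2(k/2 + p): the kernel peak sits at k = -2p."""
    p = phase / PHASES
    return [lanczos2(k / 2 + p) for k in TAPS]


def custom_coeffs(phase):
    """C(k) = lanczos2((k + p)/2): the kernel peak sits at k = -p."""
    p = phase / PHASES
    return [lanczos2((k + p) / 2) for k in TAPS]


def peak_tap(coeffs):
    """Tap index k holding the largest coefficient."""
    return TAPS[max(range(len(coeffs)), key=lambda i: coeffs[i])]


for phase in (0, 15):
    print(phase, "altera peak at k =", peak_tap(altera_coeffs(phase)),
          "custom peak at k =", peak_tap(custom_coeffs(phase)))
```

The midpoint phase is the clearest case: at phase 8 (p = 1/2) the Altera set is already centered exactly on the neighboring tap, whereas the custom set splits its weight evenly between the two taps, which is what an interpolating filter should do halfway between pixels.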

 

For equal scaling and upscaling I am using 8-tap Lanczos-4 coefficients (since I already have an 8-tap filter, I might as well use it). See Lanczos4_8tap.jpg. Its phase 0 coefficient set has only one non-zero tap, so if the input and output resolutions are the same there is no change to the video, and the upscaled video is sharper than with the 8-tap Lanczos-2. I do not have bicubic coefficients (does anybody have a set of bicubic coefficients I could try?). Would you expect a bicubic filter to give better results than the 8-tap Lanczos-4 filter?
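As a starting point for bicubic coefficients, here is a sketch that builds a 4-tap, 16-phase table from the Keys cubic-convolution kernel. The choice a = -0.5 and the phase/tap ordering are assumptions; I don't know which cubic kernel or parameter the VIP Scaler's own bicubic mode uses, and the values would still need quantizing to the core's coefficient format.

```python
def keys_cubic(x, a=-0.5):
    """Keys cubic-convolution kernel; a = -0.5 is a common choice (assumed here)."""
    x = abs(x)
    if x < 1.0:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2.0:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def bicubic_bank(phases=16, a=-0.5):
    """bank[phase] = 4 coefficients for input pixels at offsets -1, 0, 1, 2."""
    bank = []
    for phase in range(phases):
        p = phase / phases  # the output sample lies p pixels right of offset 0
        bank.append([keys_cubic(k - p, a) for k in (-1, 0, 1, 2)])
    return bank


bank = bicubic_bank()
print(bank[0])  # [0.0, 1.0, 0.0, 0.0]
print(bank[8])  # [-0.0625, 0.5625, 0.5625, -0.0625]
```

Like the Lanczos-4 set, phase 0 has a single non-zero tap, so a 1:1 pass leaves the video unchanged, and every phase sums to 1.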

 

Currently my system automatically switches between the two sets of coefficients (my custom Lanczos-2 for downscaling, Lanczos-4 for equal scaling and upscaling).

 

Sorry for the long posting! 

Regards, 

Niki
Altera_Forum
Honored Contributor II

Hi Niki, 

 

Which version of the Scaler were you using for this work? If it was Scaler I or Scaler II 10.1 (or Scaler II 11.0 with the old software code), then I might have an idea why the Altera coefficients looked so horrible for the 8-tap upscale, and why phase 15 was shifted by 2 taps instead of 1 tap. It comes down to how Scaler I and the 10.1 Scaler II calculate the required phase for each output pixel for a given scaling ratio. For these versions the equation (for horizontal phase) is:

 

phase = (((output_pixel_index * input_width) % output_width) *total_phases)/ max(input_width,output_width) 

 

The integer arithmetic used in the implementation adds an implicit floor function to the result. The numerator varies in the range 0 to (output_width - 1)*total_phases. If output_width is greater than input_width, the final result ranges between 0 and total_phases-1, as one would expect. If, however, input_width is larger, the range is reduced: if we are downscaling by a factor of 2, for example, we only use half the available phases. The larger the downscale, the smaller the usable phase range becomes.
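The reduced range is easy to reproduce by evaluating the equation above in integer arithmetic. This sketch assumes 16 phases; the widths 500 and 1001 are arbitrary, chosen to be coprime so that every fractional position actually occurs.

```python
def old_phase(n, input_width, output_width, total_phases=16):
    """Phase for output pixel n per the Scaler I / Scaler II 10.1 equation (// floors)."""
    return ((n * input_width) % output_width) * total_phases // max(input_width, output_width)


def phases_used(input_width, output_width, total_phases=16):
    """Sorted set of phase indices produced over one output line."""
    return sorted({old_phase(n, input_width, output_width, total_phases)
                   for n in range(output_width)})


print("~2x upscale:  ", phases_used(500, 1001))
print("~2x downscale:", phases_used(1001, 500))
```

The upscale uses phases 0 to 15, while the roughly 2x downscale only reaches phases 0 to 7.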

 

Because Altera was aware of this issue (my understanding is that it was a deliberate optimisation to work around the HLS tool used to write the Scaler I, rather than a bug), the coefficient generation code they provide compensates for the reduced phase range. However, it assumes that the number of taps you are using indicates the scaling ratio, i.e. if you use 8 taps for Lanczos-2 it assumes you are doing a 2x downscale. In this case it compresses the useful phases (up to a shift of one tap) into the first half of the phase range. It also generates values for the other phases, but these have shifts greater than one tap, as you saw. This is fine if you actually do a 2x downscale (as your results showed, you just get a bit less granularity in the phase shift). However, if you upscale, the phase equation can return values beyond the first half of the phase range, you get all sorts of wrong phase shifts, and the output looks horrible.
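The compensation can be checked with a little arithmetic. Inverting the old equation, phase = frac * output_width * total_phases / max(input_width, output_width), gives the input-pixel shift that a phase index actually stands for, while the 8-tap formula C(k) = lanczos2(k/2 + p) from earlier in the thread shifts its kernel by 2p taps. The function names are hypothetical, 16 phases are assumed, and the floor is ignored.

```python
def shift_needed(phase, ratio, total_phases=16):
    """Input-pixel shift a phase index stands for under the old equation.

    ratio = input_width / output_width; frac = phase * max(ratio, 1) / total_phases.
    """
    return phase * max(ratio, 1.0) / total_phases


def shift_applied_altera_8tap(phase, total_phases=16):
    """C(k) = lanczos2(k/2 + p), p = phase/total_phases, peaks at k = -2p: a 2p-tap shift."""
    return 2 * phase / total_phases


for label, ratio in (("2x downscale", 2.0), ("SD->HD upscale", 720 / 1920)):
    for phase in (4, 8, 12):
        print(label, phase, shift_needed(phase, ratio), shift_applied_altera_8tap(phase))
```

For the 2x downscale the two shifts agree, so the coefficients are correct; for the upscale the Altera set moves twice as far as it should, approaching two taps at the top of the phase range.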

 

The first release of the Scaler II was a quick port of the existing algorithm to HDL from the internal HLS language and so it carried this optimisation over. However, for the 11.0 release there was time to look at this issue and fix it. If you check the 11.0 user guide it should give the following equation for phase for the Scaler II: 

 

phase = (((output_pixel_index * input_width) % output_width) *total_phases)/ output_width 

 

Now the full phase range is always accessible, no matter the scaling ratio. The coefficient generation code has also been altered to take this into account, so (hopefully) the really horrible upscale you saw for 8-taps Lanczos 2 should not happen now. 
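A quick comparison of the two equations, assuming 16 phases and integer division as in the hardware. The widths are arbitrary examples.

```python
def old_phase(n, input_width, output_width, total_phases=16):
    """Scaler I / Scaler II 10.1: divides by max(input_width, output_width)."""
    return ((n * input_width) % output_width) * total_phases // max(input_width, output_width)


def new_phase(n, input_width, output_width, total_phases=16):
    """Scaler II 11.0: divides by output_width, so phase/total_phases ~ fractional position."""
    return ((n * input_width) % output_width) * total_phases // output_width


def max_phase(fn, input_width, output_width, total_phases=16):
    return max(fn(n, input_width, output_width, total_phases) for n in range(output_width))


for w_in, w_out in [(500, 1001), (1001, 500), (1920, 500)]:
    print(f"{w_in} -> {w_out}: old max {max_phase(old_phase, w_in, w_out)}, "
          f"new max {max_phase(new_phase, w_in, w_out)}")
```

With the new equation the phase index means the same fractional position at every ratio, so a coefficient generator can simply use p = phase/total_phases instead of guessing the ratio from the tap count.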

 

Sorry for the long reply. Hopefully this was the cause of the strange things you saw (if you were using Scaler II 11.0 and the latest software code then I don't know what caused it). 

Regards, 

Kieron
Altera_Forum
Honored Contributor II

Hi Kieron, 

 

I was using Scaler II from Quartus 10.1, and since Scaler II has no coefficient generation option, I used Scaler I just to generate the coefficients. I correlated these with the equation I found in the UDX4 example code (which was probably also based on the Quartus 10.1 Scaler II operation). What you say makes sense; it is a pity that none of this is properly documented or revealed by Altera. Some of this information could have saved me quite a number of hours of debugging. On the upside, I understand the coefficient generation process much better now ;-)

 

I have yet to download version 11, and maybe now is a good time. 

 

I was wondering, how does Altera generate these cores? Do they use the Matlab->HDL tools, or an existing HDL (Verilog/VHDL/SystemC) or some proprietary language? 

 

Regards, 

Niki
Altera_Forum
Honored Contributor II

Hi Niki, 

 

I think the lack of detail in the docs is a recurring theme for the VIP cores and I hear there may be some effort over the next two or three releases to try and improve on this. 

 

With regard to the design flow for the VIP cores, as I understand it there are 3 'classes' of core. Most of the cores were written using an internal high-level synthesis language called CUSP. I think the language is similar to C / SystemC, and the compiler still ships as part of Quartus and runs as part of the Analysis and Synthesis stage. This is the first class of IP core. I think this tool has been deprecated for a while now, so the CUSP cores are being converted to a new component-based HDL (Verilog) approach.

 

The second class of IP core is those that have already been converted to the component-based HDL approach: currently only the Scaler II and Deinterlacer II. Over the next few years all of the current cores will be converted to this approach, and a 'II' version of each will appear when it is ready. Eventually the old CUSP cores will be deprecated completely, but the 'I' and 'II' versions of each will co-exist for a while, as with the Scaler I and Scaler II at the moment.

 

The third and final class of IP core uses HDL (Verilog), but does not use the component-based approach. These cores are the CVI, CVO, Frame Reader and Control Sync. The CVI and CVO will probably not be converted to the component-based approach, but the Frame Reader and Control Sync probably will, so there will eventually be a 'II' version of them too. It is my understanding that the Frame Reader and Control Sync were written quickly after CUSP was deprecated but before the component-based approach was ready to come out of the oven, just to fill some gaps that users were requesting.

 

The new component-based approach is (I think) quite interesting: the VIP cores are composed internally of smaller cores that operate on a line-by-line basis rather than frame by frame. Internally the frame packets are broken into smaller packets of one line each, allowing components to be shared or time-division multiplexed in full system designs. At the moment the base components are hidden, but I think the plan is to eventually offer an 'advanced user' license that will make them visible to users to build systems in Qsys. There is a 4K upscale reference design available (or available soon) that uses these components, if you are interested in more info on this.

 

Hope this helps. 

Regards, 

Kieron