Re: Scaler II vs Scaler problems

Altera_Forum · ‎02-15-2011

Hi,

I have a working video processing design that includes the Scaler (v 9.1sp2). In the design I need to convert from 4:2:2 to 4:4:4, scale the video, and then go back to 4:2:2. With the Scaler II in Quartus 10.1, as I understand it, I can scale the 4:2:2 video directly. I have replaced the Scaler with the Scaler II, removed the chroma resamplers and enabled the 4:2:2 option in the Scaler II. However, now the design stops working and I get FIFO overflow in my CVI.

I have set both scalers to scale 576p50 to 720p50, using bi-linear method. I have also noted (and compensated for) the change in register map for Scaler II. I have increased the CVI FIFO to over 20000 pixels, but still the same result. If I put the original scaler back (with the chroma resamplers) everything works again.

Does anybody have any experience with the Scaler II using 4:2:2 mode? Why would the Scaler II cause the FIFO overruns?

I also do not know what the option "No blanking in video" means. The description says that you should turn it on if there is no vertical blanking in the video. Does this refer to the video input to the CVI or the output of the CVI? The output of the CVI never contains vertical blanking since it is stripped by the CVI. And why would the scaler be concerned with whether the input video contains vertical blanking or not - except that it means there is a "pause" between valid video frames?

Lastly, the documentation seems to indicate that in some (all?) cases you need to load the coefficients (the note below table 3-21 in the VIP 10.1 UG). Is this required for bi-linear method or only for polyphase?

Regards,

Niki

Altera_Forum · ‎02-15-2011

Hi Niki,

If the Scaler works in a given design (even with chroma resamplers either side) then it should be possible to just replace the old Scaler (and resamplers) with the Scaler II. Can you give some more information about your datapath and the Scaler II parameter set you are using as I can't see what the problem might be? The loading of coefficients refers only to the polyphase mode and the bilinear mode does not require the user to load coefficients.

With regards the "No blanking in video" option, this is perhaps not as well explained in the docs as it could be. This does refer to the input to the CVI - i.e. you should only turn it on when the input to the CVI has no blanking. When input video has blanking the CVI will have chunks of time where there is no valid output (assuming the input to the next block is ready most of the time). At the end of the input frame the Scaler II will still have a number of output lines left to produce as the lines stored in the internal line buffer flush through. As the last lines of an input frame flush through the line buffer the Scaler II has two choices for what it can do at its input.

1. As the lines of the previous frame flush through the buffer it can wait for lines of the next frame to replace the old lines, thus always keeping the line buffer full with valid data, but only moving data through the buffer and generating valid output when new data is available.

2. Keep the input ready low until the flush has finished, replacing all the input lines with blank data and continuing to generate output data at 'full speed'.

Option 1 is useful if there is no blanking in the input video as there will be no pauses in the valid stream of video reaching the Scaler II. We can wait for new data and consume it immediately without stalling the output. However, the first option will cause underflow at the output if there is blanking (pauses in valid data) at the input between frames as no output data will be produced while we wait for new input data.

Option 2 is sufficient if there is vertical blanking in the input video as this causes a large pause (~40 lines worth) in the input video where there is no valid data. Hence dropping the ready while the lines in the buffer flush is not an issue as it will not backpressure anything.

Ideally we might want to use a hybrid third option where we can consume input data from the new frame if and when it arrives, but without ever waiting for it to arrive and stalling the output. Unfortunately the way that the Scaler II is implemented would make this difficult and costly to do. Hence the two options are included separately for the user to choose between at compile time. Incidentally, the old Scaler used option 2 all the time.

Sorry for the long (and probably poorly worded) answer.

Regards,

Kieron

Altera_Forum · ‎02-15-2011

Hi Kieron,

Thanks for the thorough explanation of the blanking option! Some clarification in the documentation would indeed be welcome. I have still not been able to solve my problem. My datapath is as follows:

CVI -> DeInterlacer(MA) -> CRS -> Scaler -> CRS -> VideoOut

(The system is much more complex, but I have taken out all the non-essential blocks to debug this). VideoOut is a custom IP block that does the same as CVO, but includes a triple frame FIFO buffer and a few other enhancements I required.

The above data path works fine with the original scaler. I have tried two versions with the Scaler II:

CVI -> DI -> Scaler II (422 mode) -> VideoOut

and

CVI -> DI -> CRS -> Scaler II (444 mode) -> CRS -> VideoOut

In both cases I get CVI FIFO overflows and almost no video output. I have set the options of the Scaler II to exactly the same options as I have for the Scaler. The options are as follows:

Bits per symbol: 8

Symbols in parallel: 2/3

Symbols in sequence: 1

Enable Run Time control of input/output frame size

Maximum Input Frame Width: 1920

Maximum Input Frame Height: 1080

Maximum Output Frame width: 1920

Maximum Output Frame Height: 1080

4:2:2 data (in one case selected, in the other not)

Algorithm: Bilinear

I have not added any extra pipelining registers.

I need the 1920x1080 in the final system, but for testing now the input stream is 576p50 and the output is set to 1280 x 720. I can also set the output to 720 x 576 (in other words - no scaling required) and in both cases I get the same problem with Scaler II. The original scaler works 100% in all cases.

The only non-Altera block in this chain is my custom VideoOut block, but the system has been working flawlessly for a while now using the original scaler, so I have a fair amount of confidence in my custom VideoOut block. Also, I can put in a mode where it ignores all incoming data and applies no backpressure (it always asserts ready) and even then I see the CVI FIFO overflow.

My next step will be to bring out the ST input and output interface flow control signals of the Scaler to debug pins so that I can see what is going on.

BTW, I have Quartus 10.1, build 197 with SP1 (Full Version).

Regards,

Niki

Altera_Forum · ‎02-16-2011

Hi Niki,

Everything you have described about your system looks normal so it is a bit worrying that it does not work. I think that checking the output of the Scaler II is the way to go (using SignalTap?). Check the values of the control packets to make sure the Scaler II is getting the correct resolution, and look for any large gaps in the Avalon-ST valid.

What are you using to update the control information for the VIP cores (through the Av-MM slave ports)? This may be a completely stupid question but it is always good to rule these things out (just in case), but when you update the control information for the Scaler II through the Av-MM slave are you definitely only writing to addresses 0, 3 and 4? The only reason I ask is that the old Scaler may have been more sympathetic to writes to coefficient registers that do not exist in bilinear mode, while the Scaler II has a very minimalist control slave that will probably do very bad things if you write to any registers that are beyond address 4. Sorry if that insults your intelligence, but it is always good to rule these things out.
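As a quick sanity check along these lines, the list of planned control writes can be screened before it is sent to the core. This is only a sketch: the set {0, 3, 4} comes from the post above, while the `(address, value)` tuple format, the function name and the example values are hypothetical.

```python
# Word addresses named in the post as the only ones to write in bilinear mode.
SAFE_BILINEAR_ADDRESSES = frozenset({0, 3, 4})

def find_suspect_writes(writes):
    """Return the (address, value) pairs that fall outside the safe set.

    `writes` is an iterable of (word_address, value) tuples, a hypothetical
    description of what the control master is about to send.
    """
    return [(addr, val) for addr, val in writes
            if addr not in SAFE_BILINEAR_ADDRESSES]

# Placeholder values only; the write to address 5 is the one to investigate.
planned = [(3, 1280), (4, 720), (0, 1), (5, 0)]
print(find_suspect_writes(planned))  # [(5, 0)]
```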

If the SignalTap does not yield any answers and you are using SOPC Builder could you attach the .sopc file for the simplified version of your design (if you are comfortable with doing this) and I will try to simulate the error.

Regards,

Kieron

Altera_Forum · ‎02-17-2011

Hi Kieron,

You are not at all insulting my intelligence! The art of debugging is the systematic elimination of possibilities, and it is often the small, trivial issues that get overlooked while focussing on the main problem! (Like, is the power switched on!). Anyway, I have found the problem and it is indeed related to the Avalon Control interface.

While looking at the ST control packets on either side, I noticed that the scaler was always outputting control packets with the resolution set to 1920x1080 even though I had set the output resolution to 1280x720. It was ignoring the control port settings even though I had verified the register addresses. (The CVI FIFO overran because my system cannot handle 1920x1080 at 50/60 fps.) I started looking at the actual VHDL file generated by SOPC Builder and noticed that the Scaler II has a 32-bit data interface. My system does not include a NIOS and my custom Avalon Master component has a 16-bit data bus. All of the VIP cores I am using have 16-bit data bus interfaces. I cannot remember what SOPC Builder does in such a case (16-bit master, 32-bit slave; I'll have to read it up in the manual again), but it seems as if this does not work for the Scaler II. Interestingly, I went back to my original design with Scaler I and I noticed that it also has a 32-bit data interface, but in this case the core seems happy with writing only to the lower 16 bits of each register.

I have made a quick hack to force 32-bit data writes to the scaler and now it works as it should! I will change my Avalon master to 32-data width since this seems to be the safest route.

As a final note, the VIP UG does not mention this. Neither the Scaler nor the Scaler II register map (chapter 7) mentions that the data width is 32 bits. In chapter 4, it is mentioned under Avalon-MM Slave Interfaces that the control register width varies between cores and it then refers the reader to chapter 7. So a note there would be useful (if you have any control over that ;-)). Most people probably use these cores in NIOS systems with 32-bit Avalon masters, but that is not always true.

Thanks again for your interest and help!

Regards,

Niki

Altera_Forum · ‎02-17-2011

Hi Niki,

I'm glad you have found the problem and that the Scaler II is not broken. The reason it initially worked for the Scaler and not the Scaler II is that the old Scaler uses native addressing and the Scaler II uses dynamic addressing.

With native addressing each word written by the master writes to one word of the slave, regardless of their data widths. This leads to cropping or padding of data when the widths don't match. In your case you will only ever write to the bottom 16 bits of the Scaler's control registers, but this is OK because the resolutions you were writing were less than 16 bits wide.
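A toy model of the native case may make this concrete. It is a sketch only: the dictionary standing in for the slave registers and the function name are hypothetical, and treating the unwritten upper bits as zero is an assumption, since the thread only says that data is cropped or padded.

```python
def native_write(slave_regs, master_word_addr, data, master_width=16):
    """Model a native-addressed write: master word N lands in slave register N.

    Only the bits the master can drive are kept. The upper bits of the
    slave register are modelled as zero, which is an assumption.
    """
    master_mask = (1 << master_width) - 1
    slave_regs[master_word_addr] = data & master_mask
    return slave_regs

regs = {}
native_write(regs, 3, 1280)  # a 16-bit master writes word address 3
native_write(regs, 4, 720)   # and word address 4
print(regs)  # {3: 1280, 4: 720}
```

Because 1280 and 720 fit in 16 bits, nothing is lost, which matches the behaviour seen with the old Scaler.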

With dynamic addressing, byte enables are used to supplement the conversion between the master and slave addresses so that every byte of data written at the master is written to the slave, and every byte read by the master is useful data from the slave. Hence, when a 16 bit master writes to a 32 bit slave, the bottom two bits of the master address are converted into byte enable signals at the slave:

- address 0 at the master will write to the bottom two bytes of register 0 at the slave

- address 2 at the master will write to the top two bytes of register 0 at the slave

....and so on. This means that your writes to byte addresses 6 and 8 (to update output resolutions) will have been written to the top half of register 1 and the bottom half of register 2 in the Scaler II slave (instead of registers 3 and 4).

The change to dynamic addressing in the Scaler II has been driven by the deprecation of native addressing in SOPC Builder. The downside is that, with native addressing, the user never needs to know the width of the slave registers, so long as their master is wide enough to convey the desired data. Hence there was no need to explicitly state it in the user guide (I'm guessing). With dynamic addressing you do need to know the width of the slave to work out where all your bytes of data will end up. A note should definitely be added to the user guide to make all this explicit. Perhaps this will happen.
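The mapping described above can be worked through with a small calculation. This is a sketch of the arithmetic only, not of the actual SOPC Builder fabric; the function name and the convention that bit 0 of the byte enable is the lowest byte lane are assumptions.

```python
def dynamic_target(master_word_addr, master_bytes=2, slave_bytes=4):
    """Return (slave_register, byteenable) for one master word write.

    The master word address is first turned into a byte address, which is
    then split into a slave register index and a byte lane offset.
    """
    byte_addr = master_word_addr * master_bytes
    slave_reg = byte_addr // slave_bytes
    lane = byte_addr % slave_bytes
    byteenable = ((1 << master_bytes) - 1) << lane
    return slave_reg, byteenable

print(dynamic_target(0))  # (0, 3): bottom two bytes of register 0
print(dynamic_target(3))  # (1, 12): byte address 6, top half of register 1
print(dynamic_target(4))  # (2, 3): byte address 8, bottom half of register 2
```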

Good luck with the rest of your design.

Regards,

Kieron

Altera_Forum · ‎02-17-2011

Hi Kieron,

This makes sense! Thanks again for the thorough explanation!

Regards,

Niki

Altera_Forum · ‎02-24-2011

Hi Kieron,

Sorry to bother you again, but I have a question regarding dynamic addressing. If my system contains a mixture of 32 and 16 bit slaves, and I have a 32-bit master, then if I write to a 16-bit slave, I will always write to two 16-bit registers as I understand it. If the slave has 4 registers starting at 0x0000, then writing to 0x0000 would write to slave register 0 and 1, and writing to address 0x0004 would write to slave register 2 and 3? Would writing to 0x0002 write to slave register 1 and 2? Would it be possible to write to a single 16-bit slave register?

The problem is that I may not want to write to a particular 16-bit register (it may adversely affect the operation of the slave) and it seems as if it is not possible to single out specific 16-bit registers on the slave.

Regards,

Niki

Altera_Forum · ‎02-24-2011

Hi Niki,

I think you will need to add a Byte Enable output port to your 32 bit master in order to write to 16 bit slaves as you require. For example:

To write to register 0 in your 16 bit slave you would write to address 0 with the bottom two bits of the byte enable turned on.

To write to register 1 in your 16 bit slave you would write to address 0 with the top two bits of the byte enable turned on.

To write to register 2 in your 16 bit slave you would write to address 4 with the bottom two bits of the byte enable turned on.

And so on. I would suggest that you always align the addresses that your master drives out to the number of bytes in the master word (i.e. use multiples of 4 in your case), and set the byte enables relative to this. I haven't read the Avalon-MM docs in enough detail recently enough to know what will happen if, for instance, you try to write to address 2. It is possible that SOPC Builder will be very clever and write to register 1 for the bottom two byte enables, and register 2 for the top two byte enables. However, it is also possible that it would break, so unless you really need to write to addresses that are not aligned to the master word (and you don't mind trawling through the docs to check if this is allowed) I would suggest playing it safe.

Also, be careful if you are mixing dynamic and native slaves in the same system. All of the VIP cores apart from the Scaler II still use the old native addressing, and I am pretty sure that it is how the slave is declared that defines how the master and slave communicate. For the native slaves I think that every write by the master will always write to one slave register, regardless of the relative widths and any byte enable signals you drive (but maybe drive all the byte enables high when writing to one of these cores, just in case).
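The aligned-address rule above can be written out as a short calculation. This is a sketch under the same assumptions as the examples in this post (32 bit master, 16 bit dynamically addressed slave, byte enable bit 0 being the lowest byte lane); the function name is hypothetical.

```python
def access_for_16bit_register(reg_index, master_bytes=4, slave_bytes=2):
    """Return (aligned_master_byte_address, byteenable) for one slave register.

    The address is always a multiple of the master word size; the byte
    enables pick out which half of the 32 bit word carries the data.
    """
    byte_addr = reg_index * slave_bytes
    lane = byte_addr % master_bytes
    aligned_addr = byte_addr - lane
    byteenable = ((1 << slave_bytes) - 1) << lane
    return aligned_addr, byteenable

for reg in range(4):
    addr, be = access_for_16bit_register(reg)
    print(reg, addr, format(be, "04b"))
# 0 0 0011
# 1 0 1100
# 2 4 0011
# 3 4 1100
```

When the upper byte enables are used, the write data would also need to sit on the matching upper byte lanes of the master word.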

Hope this helps,

Kieron

Altera_Forum · ‎02-24-2011

Hi,

I can see that the dynamic addressing makes sense for "memory-mapped" peripherals where you have an area that you want to always have as a contiguous block of addresses, but for simple register-based peripherals it is a bit of a pain. The master now has to be aware of the width of the slave it is accessing in order to set the byte enables correctly.

What I might do instead is to write a simple 32-bit to 16-bit bridge and place all my 16-bit peripherals on the other side of the bridge. All registers are still aligned to the master width (32 bits) but the master does not need to set any byte enables. The bridge divides each 32-bit address by 2 for accesses going to the 16-bit side.
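The address translation such a bridge would perform might look like the sketch below. It models only the arithmetic described above; the function name, the rejection of unaligned addresses and the dropping of the upper 16 data bits are assumptions about how the bridge would be built.

```python
def bridge_32_to_16(master_byte_addr, write_data):
    """Translate one 32-bit-aligned master write to the 16-bit side.

    Each 32-bit master word maps to exactly one 16-bit slave register,
    so the byte address is halved and only the low 16 data bits are kept.
    """
    if master_byte_addr % 4 != 0:
        raise ValueError("master address must be aligned to 4 bytes")
    slave_byte_addr = master_byte_addr // 2
    slave_data = write_data & 0xFFFF
    return slave_byte_addr, slave_data

# Master byte addresses 0, 4, 8, 12 reach slave byte addresses 0, 2, 4, 6,
# i.e. 16-bit registers 0, 1, 2, 3.
print([bridge_32_to_16(a, 0)[0] for a in (0, 4, 8, 12)])  # [0, 2, 4, 6]
```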

Thanks again!

Niki

Altera_Forum · ‎03-07-2011

Hi,

I have a similar problem with Scaler II. Maybe you can help me also.

My system contains the following chain:

CVI -> clip -> scaler -> triple frame buffer -> scaler -> clip

It ends with a custom-made conversion module from Avalon ST to our internal video format. The whole system is built up in the block diagram editor with self-made Verilog controller modules.

The double clipper/scaler combination is necessary in this system because the output resolution is changed during runtime and a proper synchronization must be set up to other external sources. Without this, bandwidth problems would appear.

This system works well at both 1920x1080p60 and 1920x1200p60 input resolution.

I changed the second scaler to the Scaler II. The controller module was changed to the new type of Avalon MM interface and new register set.

With this setup I got a black screen instead of the video. I have experimented with modifying the scaler parameters and the controller modules but nothing changed. I have no idea what's going on.

I have another problem, probably with the Scaler I as well. The system described above, with the Scaler I, cuts several pixels off the right and bottom sides of the video when it is scaled down. The number of pixels cut off is higher when the output video is smaller. I am talking only about several pixels - we noticed it only after months of usage because the effect is so small, but it is there.

The strange thing is that the output video has exactly the specified resolution which is set by the application SW. There must be a scaling problem somewhere. This is a mystery to me.

Does anybody have similar experience?

Best regards,

Istvan

Altera_Forum · ‎03-08-2011

Hi Istvan,

When you say you are using the block diagram editor, does that mean you are using the Megawizard to generate the IP cores? What is the parameter set that you are using for the Scaler II?

Also, when you say that you are using the 'new type of Avalon MM interface', does this mean that you have just changed the register map to match the Scaler II, or are you trying to use the dynamic addressing described in previous posts? Whether the slave uses dynamic or native addressing is only relevant if you connect IP cores using SOPC Builder. If you use the schematic flow to connect blocks then the slave will effectively always use native addressing.

Best regards,

Kieron

Altera_Forum · ‎03-08-2011

Hi Kieron,

Yes, I use the Megawizard.

The Scaler II is generated with 3x8bit parallel video data input. The runtime control option has already been tried both disabled and enabled, without any change in the result. Resolution is 1920x1200. The 4:2:2 option is unchecked (4:4:4 input) and the "No blanking in video" option has also been tried both checked and unchecked.

The scaling algorithm is polyphase with shared hor/ver coefficients. Default settings are left unchanged for the precision.

The older Scaler did not have byte enable pins while the new one has them. I modified my Scaler controller module to follow the standard found in the Altera Avalon Interface Specification (p. 3.23). The new register set was also included in the Scaler II controller, of course, but I use only registers 0, 1, 3 and 4 for changing the output width and height.

Best regards,

Istvan

Altera_Forum · ‎03-08-2011

Hi Istvan,

Please correct me if I have misunderstood, but you say that you are only using registers 0, 1, 3 and 4 when you update the control interface. Does this mean that you are not loading any coefficients into the Scaler II? The Scaler I supported a mode with constant compile time loaded coefficients. The user could select the coefficient set to use in the setup GUI and these would be loaded when the design is programmed to the device. However, the Scaler II currently does not support compile time coefficients (it is my understanding that this feature should be supported in 11.0, but no guarantee) and so you must create and load coefficients at runtime through the control slave interface. If you don't load any coefficients the default contents of the coefficient memory will be used - probably all zeros - and that could explain why your output is all black.

Best regards,

Kieron

Altera_Forum · ‎03-08-2011

Hi Kieron,

No, I am not loading any coefficients into the Scaler II. This really could be the reason.

What about the other phenomenon with the Scaler I? Is that unit fully accurate? Maybe it is also a misuse of the component. This problem was the first one which forced me to start experimenting with the Scaler II.

Why was the Scaler II developed after the Scaler I?

With many thanks,

Istvan

Altera_Forum · ‎03-08-2011

Hi Istvan,

The 'missing' pixels you talk about are most likely an expected effect caused by downscale (you should not see any such effect for upscale or pass through). What are the input/output resolutions where you notice it the most? Basically, when the scaler downscales a line by a factor of N it starts at pixel 0 in the line and uses every Nth pixel as the center point for the Lanczos kernel to produce each output pixel. Hence, the final output pixel in each line was created from a kernel centered N-1 pixels from the edge of the input frame, so it might appear as if you lose a pixel or two for larger downscales. A similar effect occurs when downscaling the number of lines. The Scaler I and Scaler II use exactly the same algorithm so you will not currently see any improvement in the Scaler II.

While this effect is expected of the algorithm used, it might be greater than it should be due to a potential error in the coefficients generated by the Megawizard. When you run the Megawizard for the Scaler I you can preview the coefficients it is going to generate. If you select, for example, 9 taps and look at the coefficients you will see that the center (highest value) for the kernel in phase 0 is not centered on tap 4, which should be the center tap, but is somewhere between 2 and 3. In fact, all the phases are centered slightly to the left of where they probably should be, meaning your whole image could be shifted up and left by one or two pixels more than it should. It is my understanding that this issue has been picked up at Altera and should be resolved in the Scaler II in the 11.0 release.
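The positions of the kernel centers can be worked out for a given downscale. This is a sketch of the arithmetic described above, not of the core's internal stepping (which the thread does not describe for non-integer ratios); the function name is hypothetical.

```python
def kernel_centres(input_width, output_width):
    """Input-pixel position used as the kernel center for each output pixel.

    Starts at pixel 0 and advances by the scale factor N for every output
    pixel, as described above.
    """
    step = input_width / output_width
    return [i * step for i in range(output_width)]

centres = kernel_centres(1920, 960)    # N = 2
last_input_pixel = 1920 - 1
print(centres[-1])                     # 1918.0
print(last_input_pixel - centres[-1])  # 1.0, i.e. N - 1 pixels from the edge
```

For a 3:1 downscale (1920 to 640) the same calculation puts the last center at pixel 1917, two pixels from the right edge, while the first center always sits exactly on pixel 0. That asymmetry fits the effect appearing on the right and bottom only.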
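One way to check a coefficient set for this kind of offset is to look at where each phase peaks and where its center of mass lies. The sketch below generates a reference set from a textbook Lanczos window and checks it. The 2-lobe window, the phase-shift sign convention, the normalization and all function names are assumptions; this is not the Megawizard's generator or its coefficient format.

```python
import math

def lanczos(x, lobes=2):
    """Lanczos window: sinc(x) * sinc(x / lobes) for |x| < lobes, else 0."""
    if x == 0:
        return 1.0
    if abs(x) >= lobes:
        return 0.0
    px = math.pi * x
    return lobes * math.sin(px) * math.sin(px / lobes) / (px * px)

def phase_taps(num_taps, phase, num_phases, lobes=2):
    """Taps for one phase, centered on tap num_taps // 2, summing to 1."""
    centre = num_taps // 2
    shift = phase / num_phases
    raw = [lanczos(tap - centre - shift, lobes) for tap in range(num_taps)]
    total = sum(raw)
    return [c / total for c in raw]

def peak_tap(taps):
    """Index of the largest coefficient."""
    return max(range(len(taps)), key=lambda i: taps[i])

def centroid(taps):
    """Center of mass of the coefficients, in tap units."""
    return sum(i * c for i, c in enumerate(taps)) / sum(taps)

taps = phase_taps(num_taps=9, phase=0, num_phases=16)
print(peak_tap(taps))  # 4
```

Running `peak_tap` and `centroid` over coefficients copied from the Megawizard preview would show how far each phase sits from tap 4.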

I think the reason the Scaler II was created was because it uses a new line-based approach internally that will eventually allow some of the internal components to be exposed to users to generate more flexible systems than the current frame-based approach allows.

Hope this helps.

Regards,

Kieron

Altera_Forum · ‎03-09-2011

Hi Kieron,

Input resolution can be 1920x1080 or 1920x1200. I found the effect only at downscaling as you mentioned.

I have a test pattern with a white border several pixels wide on all four edges. It is fine on the top and left but completely disappears on the right and bottom at around half size downscaling (e.g. from 1920x1080 to 960x540).

The coefficients are not centered well into the whole interpolation range. You are right.

So it seems I have to make an external coefficient loader for the scaler to try moving the center point of the interpolation.

You helped so much, many thanks!

Best regards,

Istvan

Altera_Forum · ‎03-09-2011

Hi Istvan,

If you are happy to keep using the Scaler I then you can opt to load your own custom coefficients at compile time - thus saving yourself the effort of adding extra logic to load the coefficients at runtime. On the final tab of the Scaler GUI there is an option to select the coefficient set, and the option of CUSTOM will allow you to specify a csv file with your custom coefficients. The user guide has information about how this should be formatted (I think).

Good luck with your design.

Best regards,

Kieron

Altera_Forum · ‎03-09-2011

Hello Kieron,

I have found it, that's great!

Thank you again!

Regards,

Istvan