
VIP, Parallel processing

Altera_Forum
Honored Contributor II

We have a working design as follows: 

 

CVI -> CSC -> DEI -> FB -> Scaler -> CR -> Interlacer -> CVO 

 

This works fine for video up to 1080P input/output. The VIP is run at 160MHz.  

 

Now we are trying to get larger images down the pipe: 2560 x 1600 @ 60Hz, so the pixel clock is 268.5MHz. I've tried to run the VIP at 260 - 270MHz, but the part (Stratix III EP3SE80F780C3) just won't run that fast. The Deinterlacer will run up to about 210MHz, and the Frame Buffer and Scaler seem to max out at about 245MHz. I've checked this with TimingGen and by trial and error with live video.

I'm tasked with trying to split the image into two pieces and run a duplicate path to process both halves and then piece them back together. I'm curious if anyone has attempted to do this or if anyone may have advice on how to do it.  
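
As a rough sanity check on the split, the per-path pixel rate can be estimated as below. This is only a sketch: the function names and the 16-column overlap are illustrative assumptions, and it counts active pixels only (no blanking, control packets or stalls), so the results should be read as lower bounds on the clock each path needs.

```python
def active_pixel_rate_mhz(width, height, fps):
    """Average active-pixel rate in Mpixels/s (blanking excluded)."""
    return width * height * fps / 1e6


def per_path_rate_mhz(width, height, fps, paths=2, overlap=0):
    """Active-pixel rate per path when the frame is cut into vertical strips.

    `overlap` is the number of extra columns each strip carries so that a
    downstream filter has valid neighbours at the seam (assumed value).
    """
    strip_width = -(-width // paths) + overlap  # ceiling division
    return strip_width * height * fps / 1e6


if __name__ == "__main__":
    full = active_pixel_rate_mhz(2560, 1600, 60)
    half = per_path_rate_mhz(2560, 1600, 60, paths=2, overlap=16)
    print(f"full frame: {full:.2f} Mpix/s, each half: {half:.2f} Mpix/s")
```

With one pixel per clock, each half would then sit below the 160MHz at which the existing 1080p design already runs, assuming the input stage can buffer enough to smooth out the split.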

 

I think I can add a component to SOPC to monitor (for example) the output Vsyncs and apply backpressure via "din_ready" to do some alignment. Our concern is that, since I'll have two Frame Buffers performing drop/repeat for frame rate conversion and the two paths have to share memory, a frame of delay may develop between the two paths. I'm not sure how to handle this.

Any input would be appreciated.
4 Replies
Altera_Forum
Honored Contributor II

Before addressing your specific question I would offer the idea that your team carefully weigh the cost tradeoff of trying to shove a square peg into a round hole vs. changing to a properly shaped hole (like a fancy new Stratix IV FPGA). 

 

Also, before jumping straight to splitting the video into two separate paths, you may want to put more time into getting the design to meet your timing requirements (via design partitioning, incremental compilation, LogicLock regions, etc.).

At the end of the day it's all money so you guys just need to decide how/where you want to spend the time and money. 

 

Now, let's assume you've settled on splitting the video into two paths. You can add run-time control to the frame buffers to help control when they change between frames. 

You could also write your own frame buffer that handles two streams and locks them together (that would be my choice). 
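
To illustrate what "locking them together" could mean, here is a small behavioural model in Python. It is not RTL and not the Altera Frame Buffer's interface; the class and method names are hypothetical. The idea sketched is that both halves share one drop/repeat decision: the reader only advances to a frame once both halves of it have been fully written.

```python
class LockedFramePair:
    """Behavioural model of one shared drop/repeat decision for two half-frame buffers."""

    def __init__(self):
        # Index of the newest completely written frame for each half (-1 = none yet).
        self.last_written = [-1, -1]
        # Index of the frame currently being read out for both halves.
        self.displayed = -1

    def write_complete(self, half, frame_index):
        """Called when path `half` (0 or 1) has finished writing a frame."""
        self.last_written[half] = frame_index

    def on_output_vsync(self):
        """Pick the frame both halves will read for the next output frame.

        The newest frame that is complete in BOTH halves is chosen, so the two
        halves always show the same frame index. If nothing newer is complete
        the previous frame is repeated; if several frames arrived, the older
        ones are dropped.
        """
        newest_complete = min(self.last_written)
        if newest_complete > self.displayed:
            self.displayed = newest_complete
        return self.displayed


if __name__ == "__main__":
    fb = LockedFramePair()
    fb.write_complete(0, 0)
    fb.write_complete(1, 0)
    print(fb.on_output_vsync())  # both halves have frame 0
    fb.write_complete(0, 1)      # only one half has finished frame 1
    print(fb.on_output_vsync())  # frame 0 is repeated for both halves
```

A real implementation would also need enough buffers in memory that the writers never overwrite the frame being read; that bookkeeping is left out of the sketch.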

 

Jake
Altera_Forum
Honored Contributor II

Thanks for your input Jake.  

 

Basically the hardware is in the field so we have to make do with what we have. We did some evaluations early on and it seemed like we had things working but then the design changed, features were added, ... the usual. 

 

I sat with our FAE and we tweaked the tools to get the timing as good as we could, although I'll certainly revisit this. He's coming in tomorrow to help us get 10.0 up and running, and we're hoping the new tools may get us over this hump. Also, the User Guide lists some Fmax numbers for the various IP, and we're exceeding those numbers by quite a bit on several of them.

But if that doesn't get us there then I'll probably have to go with your suggestion on the Frame Buffer. Though I'm not looking forward to that. 

 

Thanks again.
Altera_Forum
Honored Contributor II

Not sure if this would help at all, but I have heard from Altera that there will be a (very) BETA version of a new VIP Scaler in the UDX4.0 reference design that can be requested through your FAE. I don't know exactly when it will be made available, but you can at least request it now.

I have a version of the new scaler and gave it a quick run through Quartus II 10.0 on the device you specified (on its own, not in a full system), and it comes in just under 300MHz. It is possible it could meet your frequency requirement in the full system and maybe allow you to rejoin your video stream earlier (scaling split images is annoying, as you have to get the overlaps correct for each scaling ratio or the seam will look weird). If the chroma resampler, CSC and interlacer all run sufficiently quickly, you could maybe get away with splitting the image with duplicators/clippers before the deinterlacer and rejoining with a mixer immediately after. You would still need your own frame buffer to run at the required speed, but you might not need to worry about syncing two halves of a frame through drop and repeat.
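
To show why the overlap depends on the scaling ratio, the sketch below works out which input columns each half needs in order to produce its half of the output. It assumes a centre-aligned sampling convention and a symmetric filter with an even number of taps; the actual phase convention and tap count of the VIP Scaler may differ, and the function names and example sizes are hypothetical.

```python
import math


def input_window(out_start, out_end, in_width, out_width, taps):
    """Inclusive input column range needed for output columns [out_start, out_end).

    Assumes output column x samples the input at (x + 0.5) * ratio - 0.5 and
    that the filter reads taps/2 - 1 columns to the left of the floor of that
    position and taps/2 columns to the right.
    """
    ratio = in_width / out_width
    left_centre = (out_start + 0.5) * ratio - 0.5
    right_centre = (out_end - 1 + 0.5) * ratio - 0.5
    first = math.floor(left_centre) - (taps // 2 - 1)
    last = math.floor(right_centre) + taps // 2
    # Clamp to the picture; edge handling (mirroring etc.) is not modelled here.
    return max(first, 0), min(last, in_width - 1)


def split_windows(in_width, out_width, taps):
    """Input windows for the left and right halves of the output."""
    mid = out_width // 2
    left = input_window(0, mid, in_width, out_width, taps)
    right = input_window(mid, out_width, in_width, out_width, taps)
    return left, right


if __name__ == "__main__":
    left, right = split_windows(in_width=1920, out_width=2560, taps=4)
    print("left half needs input columns", left)
    print("right half needs input columns", right)
```

The clipper for each path would be set to its window, and the scaled halves cropped back to exactly half the output width before the mixer. Because the windows move with the ratio, they would have to be recomputed whenever the scaling ratio changes.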

 

As a side note, the new scaler also supports 4:2:2 so if you were tight on memory bandwidth it could allow you to chroma resample earlier in the design and deinterlace, scale and frame buffer in 4:2:2.  

 

Kieron
Altera_Forum
Honored Contributor II

Thank you for your help Kieron. We'll see where I end up after I get 10.0 installed.
