
Simon_B_3 · ‎10-29-2014

Hello

I'm coding in DirectX 11 on 4th generation Intel Core processors. I want to use the back buffer as a RWTexture in the pixel shader in order to do some dynamic blending and optimizations. It's possible to specify DXGI_USAGE_UNORDERED_ACCESS when creating the swap chain. The problem is that the back buffer texture resource is created with an RGBA_UNORM format (I'm only interested in 32-bit formats for performance reasons), and there's no way to specify TYPELESS or UINT. When I pass the UAV to the shader there seems to be a format conflict unless the RWTexture is declared as float4. This is fine for writing to the texture (each float gets converted to an 8-bit unorm component), but it won't work for reading the value (the shader compiler then thinks I'm trying to read a float4 from what should be a 32-bit resource).

The documentation describes a feature called Instant Access Resources, where the GPU and CPU textures share the same memory footprint (i.e. the CPU resource is aliased from the GPU resource). Would it be possible to alias 2 GPU resources in the same way? Then I could create a UINT format texture that points to the same memory as the swap chain, create a UAV from it, pass that down to the pixel shader, and perform both reads and writes in the shader.
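As a rough illustration of what reading and writing the back buffer through a UINT view would involve, the sketch below does the per-pixel arithmetic on the CPU in plain C++. It assumes an R8G8B8A8 layout with red in the low byte and alpha in the high byte, and round-to-nearest float-to-UNORM conversion; the function names are hypothetical and are not part of D3D11 or of any Intel extension. A shader working on a 32-bit UINT view would have to do the equivalent packing and unpacking itself.

```cpp
#include <array>
#include <cassert>  // only needed if you add assert-based checks around these helpers
#include <cstdint>

// Hypothetical helpers for illustration only (not D3D11 or Intel extension API).

// Convert one float to an 8-bit UNORM value: clamp to [0, 1], scale by 255,
// round to nearest. NaN and negative inputs map to 0.
std::uint32_t floatToUnorm8(float v) {
    if (!(v > 0.0f)) return 0u;     // also catches NaN
    if (v >= 1.0f) return 255u;
    return static_cast<std::uint32_t>(v * 255.0f + 0.5f);
}

// Pack four floats (R, G, B, A) into one 32-bit word.
// Assumed layout: R in bits 0-7, G in 8-15, B in 16-23, A in 24-31.
std::uint32_t packRgba8Unorm(const std::array<float, 4>& c) {
    return floatToUnorm8(c[0]) |
           (floatToUnorm8(c[1]) << 8) |
           (floatToUnorm8(c[2]) << 16) |
           (floatToUnorm8(c[3]) << 24);
}

// Unpack a 32-bit word back into four floats in [0, 1].
std::array<float, 4> unpackRgba8Unorm(std::uint32_t p) {
    return {
        static_cast<float>(p & 0xFFu) / 255.0f,
        static_cast<float>((p >> 8) & 0xFFu) / 255.0f,
        static_cast<float>((p >> 16) & 0xFFu) / 255.0f,
        static_cast<float>((p >> 24) & 0xFFu) / 255.0f
    };
}

// Extract only the alpha byte, e.g. to test a mask stored in the alpha channel.
std::uint32_t alphaByte(std::uint32_t packed) {
    return packed >> 24;
}
```

For example, `packRgba8Unorm({1.0f, 0.0f, 0.0f, 1.0f})` yields `0xFF0000FF`, and unpacking then repacking any 32-bit value returns the same value, since every 8-bit component survives the float round trip.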

Or does anyone know of a better approach altogether (one that doesn't involve copying resources into off-screen RTs)?

Thanks :)

Mitchell_L_Intel · ‎11-07-2014

I'll find one of our graphics engineers to answer this :)

~Mitch

Michael_C_Intel2 · ‎11-07-2014

Can you answer these questions for us? The additional information will help us suggest a better solution:

1.) Can we get a higher-level description of what you are trying to accomplish?
2.) Do you need to read and write at the same time (same pass)?
3.) Does it have to be a swap chain buffer?
4.) Do you render to the texture at all and then read/write from it?

Thanks!

Simon_B_3 · ‎11-10-2014

1.) We're developing for a mobile device, so we're trying to optimize speed and minimize energy consumption. We wanted to look into the possibility of doing all operations directly on the back buffer. For this particular case we're using the alpha channel as a mask and wanted to be able to kill pixels early in the pixel shader. In general, reading and writing directly on the back buffer seems like a useful feature in other cases as well. For example, in post-process effects you might be able to avoid ping-ponging between multiple RTs.

2.) Yes, we were thinking of reading at the beginning of the shader and then writing back to the same pixel at the end, unless there is some sort of penalty associated with this approach?

3.) We were looking into the possibility of using the swap chain buffer in order to avoid resolves.

4.) We first write to the back buffer in a normal pass. Then in a later pass we draw a second set of geometry and use the alpha channel as a mask (of sorts).

Michael_C_Intel2 · ‎11-11-2014

Are you basically trying to implement render target read? If so, how are you avoiding races to the same pixel location? PixelSync could do this for you; are you using PixelSync? If you are just trying to early out, what's wrong with discard (i.e. modern alpha test)?

3.) We were looking into the possibility of using the swap chain buffer in order to avoid resolves.

I'm not sure what you mean by "resolve" here. There are typically more potential pitfalls to using swap chain buffers than offscreen ones. Basically only the final output pass should go into the swap chain; everything else should ideally be done in offscreen buffers. Definitely avoid copying *out* of swap chains, as was common before roughly DX9, prior to "render to texture" (i.e. proper render targets). The answer here might be to just use discard early in the shader.

Simon_B_3 · ‎11-12-2014

To give a little bit of background on the current implementation: we previously had a setup using stencil buffers. As the alpha channel was never in use, I tried using blend states instead. According to GPA this gave about a 30% speedup for our scene. As we're currently using the blend state to compute this mask, we don't have the information inside the shader to do the discard. And in general I can definitely see the use of R/W on the same RT in a single shader, e.g. if you want to do PP on only part of the screen.

We're not currently using pixel synchronization. This particular effect depends only on the position in screen space, so I'm not sure it's needed here; but in general, yes, it seems like a good feature!

We're currently doing all rendering straight to the back buffer. Most of the stuff is plain forward rendering. By "resolve" I was referring to the action of copying one surface into another, such as copying the back buffer to an off-screen RT or the other way around.

I'm not sure I understand you correctly, so I must ask: are there penalties involved in doing all rendering directly to the swap chain, even in the case of simple forward rendering? Is there a performance penalty when rendering to the swap chain as opposed to an off-screen RT?

Michael_C_Intel2 · ‎11-19-2014

Hi Simon,

Sorry for not responding sooner, my schedule is rather hectic right now.

Thanks for the additional info. I have some thoughts on your problem:

Regarding how DX11 guarantees write access to surfaces in shaders: R/W on the same RT in a single pixel shader is not something that DX guarantees ordering for.

PixelSync only guarantees anything when the effect is dependent only on the position in screen space.

Do you mean that this effect only touches a single pixel once?

Is it possible to alias 2 GPU texture resources with the Intel DirectX extensions?