Developing Games on Intel Graphics

glMapBuffer: reading mapped memory is very slow

andrasjpeg
Beginner
Reading from a mapped buffer is significantly (~40x) slower than reading from system memory. This does not include the mapping call itself (which might block), only the actual memory read instructions. I use MOVDQA SSE instructions to read aligned 16-byte chunks and RDTSC to measure the speed. Running the same code with a different source pointer (one that points to ordinary system memory) is much faster.

This would make sense if I were using a discrete GPU with its own memory, in which case the reads would have to go over the bus. But I'm using Intel HD 5000, which I thought simply uses a reserved chunk of system RAM.

Is this normal behavior? Any suggestions for reading back data faster? We already use double-buffered PBOs (in fact, a ring of PBOs) with asynchronous ReadPixels, and we read the mapped data in a separate thread. But as I said, driver blocking is not the issue here.

I'm using an Intel NUC with an HD 5000 GPU, Win8.1 x64, driver 10.18.10.3960.

Any help or advice would be appreciated!

Andras
1 Reply
andrasjpeg
Beginner
OK, I have found the solution. It turns out that GPU memory is mapped as USWC (Uncacheable Speculative Write Combining) memory, and to read from such memory quickly you have to use the MOVNTDQA instruction (the SSE4.1 streaming load). It's incredible, but simply replacing MOVDQA with MOVNTDQA can increase load performance tenfold or more! Here's a good reference on the subject: https://software.intel.com/en-us/articles/increasing-memory-throughput-with-intel-streaming-simd-extensions-4-intel-sse4-streaming-load/
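For anyone who wants to try this from C rather than assembly, here is a rough sketch of the idea using the `_mm_stream_load_si128` intrinsic, which compiles to MOVNTDQA. This is not my actual code: `copy_from_uswc` is a made-up helper name, and the requirements it asserts (16-byte-aligned pointers, a size that is a multiple of 64 bytes) are simplifying assumptions. It reads one 64-byte line at a time with four back-to-back streaming loads and writes the result into a normal cacheable buffer, which is roughly the pattern the Intel article describes. The `target("sse4.1")` attribute is a GCC/Clang convenience; on other compilers you would enable SSE4.1 through build flags instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <smmintrin.h> /* SSE4.1: _mm_stream_load_si128 (MOVNTDQA) */

#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

/* Hypothetical helper (not from any library): copies `bytes` bytes from
 * `src` (e.g. a pointer returned by glMapBuffer) into a cacheable buffer
 * `dst` using streaming loads.
 * Assumptions: both pointers are 16-byte aligned and `bytes` is a
 * multiple of 64. */
SSE41_TARGET
static void copy_from_uswc(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)src & 15u) == 0);
    assert(((uintptr_t)dst & 15u) == 0);
    assert(bytes % 64 == 0);

    /* The cast drops const because some headers declare the intrinsic
     * with a non-const pointer parameter. */
    __m128i *s = (__m128i *)src;
    __m128i *d = (__m128i *)dst;
    size_t n = bytes / 16; /* number of 16-byte chunks */

    for (size_t i = 0; i < n; i += 4) {
        /* Issue the four loads of one 64-byte line back to back. */
        __m128i a = _mm_stream_load_si128(s + i + 0);
        __m128i b = _mm_stream_load_si128(s + i + 1);
        __m128i c = _mm_stream_load_si128(s + i + 2);
        __m128i e = _mm_stream_load_si128(s + i + 3);

        /* Plain aligned stores into the ordinary (cacheable) buffer. */
        _mm_store_si128(d + i + 0, a);
        _mm_store_si128(d + i + 1, b);
        _mm_store_si128(d + i + 2, c);
        _mm_store_si128(d + i + 3, e);
    }
}
```

On ordinary cacheable memory the streaming load behaves like a regular load, so the helper can be checked for correctness on a normal buffer first. The speedup should only show up when the source really is write-combining memory, such as the mapped PBO.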