Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Hyper-Threading slowdown when writing to the frame buffer via DirectDraw

Has anyone found that writing to the frame buffer by locking the primary surface (i.e., your visible desktop) is significantly slower with Hyper-Threading (HT) enabled than without? We do image processing. We divide the image by the number of processors to parallelize the operation, and we write the resulting image directly into the window using DirectDraw. Under Windows XP, with one thread on one HT logical processor, the operation takes about 25 ms. With two threads on the two logical processors, it takes 53 ms, roughly 2x slower. As a workaround, when our software is configured to write the computation directly into the frame buffer, I now create only as many threads as there are physical processors. The same scheme works fine when the output buffer is in host memory rather than in the frame buffer.
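The workaround above (one thread per physical processor when the output goes straight to the frame buffer, otherwise one per logical processor) and the split of the image into per-thread strips can be sketched in portable C as follows. This is a minimal sketch, not the poster's code. `choose_thread_count` and `strip_rows` are hypothetical names, the processor counts are assumed to be supplied by the caller, and the DirectDraw surface locking and the thread creation itself are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: how many worker threads to use.
   When results go straight into the (uncached, write-combining)
   frame buffer, use one thread per physical processor; otherwise
   use every logical processor. */
static int choose_thread_count(int logical_cpus, int physical_cpus,
                               bool output_is_frame_buffer)
{
    assert(physical_cpus > 0 && logical_cpus >= physical_cpus);
    return output_is_frame_buffer ? physical_cpus : logical_cpus;
}

/* Hypothetical helper: split `height` rows into `nthreads` contiguous
   strips. Thread `t` processes rows [*first, *last). Leftover rows go
   to the first strips, so strip sizes differ by at most one row. */
static void strip_rows(int height, int nthreads, int t,
                       int *first, int *last)
{
    assert(nthreads > 0 && t >= 0 && t < nthreads);
    int base = height / nthreads;
    int extra = height % nthreads;
    *first = t * base + (t < extra ? t : extra);
    *last = *first + base + (t < extra ? 1 : 0);
}
```

For example, on a single HT processor (2 logical, 1 physical) writing to the frame buffer, `choose_thread_count(2, 1, true)` gives 1 thread, while writing to host memory gives 2.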
My theory is that this is due to contention between the two logical processors for the store (write-combining) buffers, of which there are only 6. I'd guess this doesn't matter much when the destination memory is cached, but it becomes significant when the destination is uncached write-combining memory such as the video frame buffer. Access to that memory also goes over the AGP 4x/8x bus, which is slower than the memory bus.
Any ideas or suggestions?

Welcome to the Threading Forums!

Your mention of the write-combining buffers reminds me of some code I'd seen previously where this was a problem. In that case, the code was restructured to reduce the number of write-combining buffers needed at any one time within that code segment. It was a loop containing several independent operations, so breaking the single loop into several smaller loops was relatively straightforward, and the results were pretty astounding. I'm not sure whether this applies in your case.
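The restructuring described above is loop fission. A minimal sketch, assuming a loop whose body performs independent operations that each store to a different output array: the fused version keeps four write streams live per iteration, while the split version keeps only two per loop. The functions and arithmetic here are made up for illustration. Whether the split pays off depends on how many write-combining buffers the processor has (and how many are shared with a sibling logical processor) and on whether the outputs are write-combining memory. For ordinary cached memory it may only add loop overhead.

```c
#include <assert.h>
#include <stddef.h>

/* Fused version: each iteration stores into four separate output
   arrays, i.e. four concurrent write streams. */
static void fused(const float *in, float *a, float *b,
                  float *c, float *d, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = in[i] + 1.0f;
        b[i] = in[i] * 2.0f;
        c[i] = in[i] - 3.0f;
        d[i] = in[i] * in[i];
    }
}

/* Fissioned version: the same independent operations split into two
   loops, so each loop has only two output streams live at once. */
static void fissioned(const float *in, float *a, float *b,
                      float *c, float *d, size_t n)
{
    assert(in != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        a[i] = in[i] + 1.0f;
        b[i] = in[i] * 2.0f;
    }
    for (size_t i = 0; i < n; i++) {
        c[i] = in[i] - 3.0f;
        d[i] = in[i] * in[i];
    }
}
```

Both versions compute identical results; only the order in which the output streams are written changes.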

One caution, though: be sure that your threads are actually running on separate physical processors. With one physical processor, this is not a problem. However, if you have 2 physical processors with HT turned on (4 logical processors), you still run the risk that the two threads you create will end up on the same physical processor, and you're back to the same problem as before. One would hope that the OS would schedule the threads better.

You might still consider setting thread affinity to improve the chance that the two threads run on separate physical CPUs.
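A minimal sketch of computing one affinity mask per worker so that each thread is pinned to a different physical package. The helper name `one_per_package_mask` is hypothetical, and it assumes that logical processors sharing a physical package are numbered consecutively. That numbering is not guaranteed, so the real topology should be checked (for example via CPUID or the OS) before relying on it. On Windows, the resulting value would be passed, cast to DWORD_PTR, to SetThreadAffinityMask for the corresponding thread. Note that on 32-bit Windows the mask is only 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: affinity mask that pins worker `t` to the first
   logical processor of physical package `t`.
   ASSUMPTION: logical processors on the same package are numbered
   consecutively (e.g. 2k and 2k+1 with two logical processors per
   package). Verify the actual topology before using this. */
static uint64_t one_per_package_mask(int t, int logical_per_package)
{
    assert(t >= 0 && logical_per_package > 0);
    int cpu = t * logical_per_package;   /* first logical CPU of package t */
    assert(cpu < 64);                    /* must fit in the mask */
    return (uint64_t)1 << cpu;
}
```

For the 2-package, 4-logical-processor case above, workers 0 and 1 would get masks 0x1 and 0x4 under this numbering assumption.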

-- clay