The sample_decode example has a decode-only mode, which can help estimate the performance of decode alone. However, as you've noticed, the render and decode-to-file paths are there to show that decode works; they are not optimized for performance.
You can see a faster decode->render example in ocl_media_sdk_interop (in the samples package) or in the multi-media samples: https://software.intel.com/sites/landingpage/mmsf/documentation/windows_main.html
A simpler place to start for low-latency decode is the tutorials (simple_6_transcode_opaque_lowlatency). Note that rendering has historically been considered out of scope for Media SDK. After decode, the surface can be rendered using the API it was allocated with (DirectX 9/11 on Windows, VAAPI on Linux).
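To give a rough idea of what "low latency" means at the parameter level, here is a minimal sketch of the decode-side settings as I recall them from that tutorial: an AsyncDepth of 1, video-memory output so the surface stays with the API that will render it, and the complete-frame hint on the bitstream. The helper name configure_low_latency_decode is hypothetical, and the fallback type definitions are stand-ins that are only used when mfxvideo.h is not on the include path. Please check the values against the tutorial source before relying on them.

```cpp
#include <cstdint>

// Use the real Media SDK header when it is available.
#if defined(__has_include)
#  if __has_include(<mfxvideo.h>)
#    include <mfxvideo.h>
#    define HAVE_MFX_HEADER 1
#  endif
#endif

#ifndef HAVE_MFX_HEADER
// ASSUMPTION: minimal stand-ins so the sketch builds without the SDK.
// They mirror only the fields touched below; the real structs are much larger.
typedef std::uint16_t mfxU16;
enum { MFX_IOPATTERN_OUT_VIDEO_MEMORY = 0x10 };
enum { MFX_BITSTREAM_COMPLETE_FRAME = 0x0001 };
struct mfxVideoParam { mfxU16 AsyncDepth; mfxU16 IOPattern; };
struct mfxBitstream { mfxU16 DataFlag; };
#endif

// Hypothetical helper (not part of Media SDK): applies low-latency
// decode settings to structures the caller has already zero-initialized.
inline void configure_low_latency_decode(mfxVideoParam& par, mfxBitstream& bs) {
    // Keep a single operation in flight, so each frame is synced
    // right after it is submitted instead of being queued.
    par.AsyncDepth = 1;

    // Decode into video memory (D3D9/D3D11 or VAAPI surfaces) so the
    // frame can be handed to the renderer without a copy to system memory.
    par.IOPattern = MFX_IOPATTERN_OUT_VIDEO_MEMORY;

    // Hint that each bitstream buffer holds one complete frame. Only valid
    // if the application really feeds exactly one full frame per call.
    bs.DataFlag |= MFX_BITSTREAM_COMPLETE_FRAME;
}
```

In a real application this would be called after filling in the codec fields (for example via MFXVideoDECODE_DecodeHeader) and before MFXVideoDECODE_Init. It only covers the decode side; the render path still has to be written against DirectX or VAAPI.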
As far as I know, there is no simple fast/low-latency decode-render example at the Media SDK level. However, there may be other options. Let me see what I can find for you.