I've designed a module that processes an FLV stream in order to transcode its H.264 video stream.
In search of better performance, I started from the code found in sample_decode and modified it to fit the needs of the original project.
When examining the performance, however, I found that the decoding does not outperform the original solution, which is based on FFmpeg with SSE2, MMX, etc.
I've spent some time analyzing the factors that might contribute to the lack of performance, but I would also appreciate advice from more experienced developers. I think the most detrimental factor is that the stream arrives from the network (i.e. the data is not as readily available as when reading it from a file). Also, the transcoding application does not render anything: its only task is to generate the transcoded stream, which is sent to other modules for further processing.
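One way to check the network-arrival hypothesis would be to time the wait for input separately from the decode call. Below is a minimal sketch under my own assumptions: `StageTimer` is a hypothetical helper, not part of the Media SDK or sample_decode, and the `read_from_network` and `decode_frame` names in the usage comment are placeholders for whatever the real module calls.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not a Media SDK type): accumulates wall-clock time
// spent waiting for input and time spent inside the decode call.
struct StageTimer {
    using Clock = std::chrono::steady_clock;

    std::chrono::duration<double> waiting{0.0};   // seconds blocked on input
    std::chrono::duration<double> decoding{0.0};  // seconds spent decoding

    // Run fn() and add its elapsed time to the "waiting" total.
    template <typename Fn>
    void time_wait(Fn&& fn) {
        const auto start = Clock::now();
        std::forward<Fn>(fn)();
        waiting += Clock::now() - start;
    }

    // Run fn() and add its elapsed time to the "decoding" total.
    template <typename Fn>
    void time_decode(Fn&& fn) {
        const auto start = Clock::now();
        std::forward<Fn>(fn)();
        decoding += Clock::now() - start;
    }

    // Fraction of the measured time that went into decoding (0.0 if nothing
    // has been measured yet).
    double decode_share() const {
        const double total = waiting.count() + decoding.count();
        return total > 0.0 ? decoding.count() / total : 0.0;
    }
};

// Usage sketch (placeholder function names, assumed for illustration):
//   StageTimer timer;
//   while (running) {
//       timer.time_wait([&] { read_from_network(buffer); });
//       timer.time_decode([&] { decode_frame(buffer); });
//   }
//   // A low timer.decode_share() would suggest the loop is mostly waiting
//   // for data rather than being limited by the decoder.
```

If the decode share turns out to be small, the comparison with the FFmpeg-based path is probably dominated by input latency rather than by decoder speed.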
I've also run the GPA analysis toolset and would need an expert's insight into what its results mean.
The details of my machine (generated by the GPA tool):
Windows 7, 64-bit DEP enabled
Num Processors: 8
System BIOS: LENOVO 8BET46WW (1.26 ) (06/22/2011)
Video BIOS: Hardware Version 0.0
Device: Intel(R) HD Graphics Family
Provider: Intel Corporation
ProductId: 126 (Intel® HD Graphics 3000)
Supports GPA Instrumentation
GPA install directory: C:\Program Files\Intel\GPA\2012 R5\
GPA version: 12.5.187105
Current user is in Administrators group: YES
Current GPA 2012 R5 (12.5.187105)
The GPA log is coming soon.
The sample applications (like sample_decode) are not optimized for performance, as their intent is to demonstrate the use of the API. There is a new tutorial that discusses performance operations you may find valuable: