Using a flow graph for a game loop yields uneven frame rate
I seem to have a few performance issues with the TBB flow graph.
At present I'm using a flow graph for the game loop in my game engine. On each frame I run the graph to completion and wait for it to return.
When it returns, I collect the Windows messages and render the DirectX command lists that the graph generated. DirectX still doesn't support fully threaded rendering, so command-list submission must happen on the main thread. The simplest way to achieve that is to wait for the flow graph to return and then submit. This also means I don't need any locking in my rendering tasks.
The graph is used as a dependency graph for my engine's tasks.
Tasks are implemented similarly to the TBB animation example: each task is actually a task set, configured to run n instances of a supplied function at a specified grain size.
Therefore each task in the flow graph actually spawns n subtasks.
At present I have a simple simulation of a few animated balls with a few lights in the scene. The camera is animated to move through the scene.
In terms of task set size, each task set is currently configured to run a single instance with a grain size of 1, which is equivalent to running a single task.
This is where my problem lies: the simulation output is not smooth. Currently my game loop runs as fast as possible, so I expect some variation in the frame rate, but since my workload is constant I'd expect the frame rate to be fairly stable. However, every 1-3 seconds or so it seems to "stick", stammering or jerking. The pause is very noticeable and keeps happening at seemingly random intervals.
I've profiled each of my tasks, as well as each run of the entire flow graph.
The tasks take a relatively constant time, whereas the time to run the full graph fluctuates wildly. There is clearly a baseline frame time that most runs of the flow graph hit, but whenever one of these jerks happens, the time to complete that frame jumps dramatically.
In Excel it's quite obvious: most frames take the baseline time, with a scattering of frame times above it, which I assume correspond to the jerks.
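The same analysis can be done in code: time each run of the graph, take the median as the baseline, and count frames well above it. This is a hedged sketch; time_ms, analyse_frames and the 1.5x spike factor are hypothetical choices, not part of TBB.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Runs f once and returns the elapsed wall-clock time in milliseconds,
// e.g. time_ms([&] { frame_start.try_put(msg); g.wait_for_all(); }).
template <class F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

struct FrameStats {
    double baseline_ms;   // median frame time (upper median for even counts)
    std::size_t spikes;   // frames slower than factor * baseline
};

// Takes the samples by value because nth_element reorders them.
FrameStats analyse_frames(std::vector<double> times_ms, double factor = 1.5) {
    assert(!times_ms.empty());
    const auto mid = times_ms.begin() +
                     static_cast<std::ptrdiff_t>(times_ms.size() / 2);
    std::nth_element(times_ms.begin(), mid, times_ms.end());
    const double median = *mid;
    std::size_t spikes = 0;
    for (double t : times_ms)  // order no longer matters for counting
        if (t > factor * median) ++spikes;
    return {median, spikes};
}
```

The median is used rather than the mean so that the spikes themselves do not drag the baseline upwards.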
Since I've profiled each of the tasks, the only thing I can attribute it to is the TBB scheduler struggling to schedule tasks every few frames. I've also run a full CPU sampling profile, and most of the work appears to be done inside tbb_debug.dll; of that time, my tasks account for barely 10% of the samples.
Limiting the loop to a constant frame rate shows similar problems.
So my questions are these.
Am I using the flow graph properly?
Is there a way to tell TBB that I'm going to run it multiple times so that it can prepare in some way to reduce this problem?
Is there a better way to achieve the behaviour I want?
Is there a way to overcome this problem and smooth things out?
After initialisation the dependency graph does NOT change; however, the number of instances spawned by each task set may.
I'd appreciate any help or advice with this problem.
A quick update: I've managed to track my stuttering issue down to a VSync problem. For the uninitiated, it's a conflict between my application trying to update faster than the monitor can refresh, so the monitor may display the same image for several refreshes in a row, which seems to be what causes the stammering.
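With DXGI, presenting with a sync interval of 1 (the first argument of IDXGISwapChain::Present) synchronizes presentation with the vertical blank. If a software cap is wanted as well, for example while profiling with VSync off, a frame pacer can sleep until a fixed deadline each frame. The FramePacer class below is a hypothetical, portable sketch of that swapped-in technique; it limits the loop rate but does not align frames with the monitor's refresh the way real VSync does.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical frame pacer: a software frame-rate cap, NOT real VSync.
// It sleeps until a fixed per-frame deadline so the loop cannot run
// faster than the target rate.
class FramePacer {
    using clock = std::chrono::steady_clock;
    clock::duration period_;
    clock::time_point next_;

    static clock::duration period_for(double hz) {
        assert(hz > 0.0);  // avoids dividing by zero below
        return std::chrono::duration_cast<clock::duration>(
            std::chrono::duration<double>(1.0 / hz));
    }

public:
    // target_hz is e.g. the monitor's refresh rate.
    explicit FramePacer(double target_hz)
        : period_(period_for(target_hz)), next_(clock::now() + period_) {}

    // Call once per frame, after submitting the frame's work.
    void wait() {
        std::this_thread::sleep_until(next_);
        next_ += period_;
        // If a frame ran long, resynchronise rather than rushing to catch up.
        const clock::time_point now = clock::now();
        if (next_ < now) next_ = now + period_;
    }
};
```

Sleeping until an absolute deadline, rather than for a fixed duration, keeps per-frame timing errors from accumulating.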
However, I'm still interested in people's opinions on my use of the flow graph, and whether there is a way to "warn" the scheduler that the graph will be run many times.
Frames will be rendered one frame behind the one being generated (one frame of latency), but this lets threads other than the master thread begin generating the next frame while the master thread performs its render function, after which it resumes its flow-graph work.
The test (nodesProcessed > threshold), which you write yourself, should ensure that you are deep enough into the graph that the other threads have work to do. In other words, you would not want the master thread to take the first node and then leave the other threads waiting while it finishes rendering the last frame. Conversely, you do not want the master thread to start rendering too late. You may also need to code around the possibility that the master thread never satisfies the condition.
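The scheme can be sketched as a simplified simulation rather than real TBB code: plain std::thread workers pull node indices from a shared counter in place of TBB's worker threads, and the master thread (the one that would call wait_for_all) processes nodes too. run_frame, FrameResult and the threshold values are hypothetical. In an actual flow graph, a node body could detect that it is running on the main thread by comparing std::this_thread::get_id() with an id saved at startup.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical result of one simulated frame.
struct FrameResult {
    bool rendered_mid_graph;  // master rendered once the threshold was passed
    bool rendered_fallback;   // master never met the condition; rendered after
};

// Simulates one frame: `node_count` graph nodes are shared between the master
// thread and `workers` helper threads (stand-ins for TBB workers). The master
// renders the PREVIOUS frame once nodesProcessed > threshold.
FrameResult run_frame(int node_count, int threshold, int workers) {
    assert(node_count >= 0 && workers >= 0);
    std::atomic<int> next_node{0};
    std::atomic<int> nodes_processed{0};
    bool rendered = false;  // read and written only by the master thread

    auto process_nodes = [&](bool is_master) {
        for (;;) {
            const int node = next_node.fetch_add(1);
            if (node >= node_count) return;
            // ... the node's real work would run here ...
            const int done = nodes_processed.fetch_add(1) + 1;
            if (is_master && !rendered && done > threshold) {
                rendered = true;  // render_previous_frame() would go here,
                                  // then the master resumes taking nodes
            }
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) pool.emplace_back(process_nodes, false);
    process_nodes(true);  // the master works on the graph as well
    for (std::thread& t : pool) t.join();

    // Fallback: the master never satisfied the condition during this frame.
    const bool fallback = !rendered;  // render_previous_frame() here instead
    return {rendered, fallback};
}
```

Because the workers can drain the remaining nodes before the master passes the threshold, the fallback path is not just theoretical: with few nodes or a high threshold it becomes the common case, and rendering then simply happens after the graph completes, as in the original loop.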