I think I have encountered an issue similar to what Chuck DeSylva found (as described in Intel Visual Adrenaline, Issue No. 7, 2010, page 10, last paragraph):
According to DeSylva, Cryptic wanted Star Trek Online to be able to run at 25 frames per second, which is typical for MMOs. Cryptic also wanted to support an Über Shader.
They didn't want to have to load multiple shaders for different materials, so they had one huge shader, which they #ifdef'd out in sections for different lighting scenarios. But the shader wasn't getting compiled properly. I was able to use various experiments to determine that it was stalling out on the back end of our GPU pipeline. When we turned off the Über Shader, we basically doubled the performance. However, we found that the Über Shader was running fine; it was actually a problem in our compilation. So it was kind of an interesting case where we helped them to help us. It was a good engagement from that standpoint.
Can someone at Intel please give me Chuck DeSylva's contact so I can find out how they resolved this issue?
I have code that is very long, and we use #ifdefs to conditionally compile only parts of it within a block of if-else statements. What I have observed is that performance suddenly jumps 40% when I permanently comment out part of this code. This is weird, as the compiler's optimizations seem to stop working once the code becomes longer.
I would really appreciate it if someone from the Intel C++ compiler team could help me out here.
I don't have the organizational qualification you requested, but likely possibilities include exceeding one or more of the compiler's default inlining quotas, which may be raised via the /Qinline- family of options (at your own risk). I can't tell whether you mean to imply you are programming for a GPU (not a released compiler, which would be topical here).
Or do you mean cut the code out of the source file?
Also, you may be taxing your register usage and/or using your L2/L3 cache system inefficiently.
Register usage might be improved by reworking the loop nest order and/or moving code out of line (making it a function call). Inlining does not always improve performance, in particular where the inlined code (of a high-activity loop) does not fit within the instruction cache but the non-inlined code would.
Inefficient use of your L2/L3 cache system may be addressed by the strategy of how you pass the data through your shader filters:
grab all of data
    filter-1 on all data
    filter-2 on all data
    ...
    filter-n on all data
end grab all data
for (slice = 0; slice < nSlices; ++slice)
    grab slice of data
        filter-1 on slice of data
        filter-2 on slice of data
        ...
        filter-n on slice of data
    end grab slice of data
end for
And for either/both methods, are you using multi-threaded programming (OpenMP, Cilk++, TBB, other)? If so, how are you parallelizing the work: intra-filter or inter-filter? If inter-filter, have you explored parallel_pipeline techniques?