Intel® C++ Compiler

Performance loss when converting statics to class members

piet_de_weer
Beginner
A few years ago I started working on a project for which I created (among others) one C (rather than C++) file containing a large number (hundreds) of static and global variables, plus the functions that use them. A few dozen of the variables are arrays of exactly 32 or 64 kB, all allocated with __declspec(align(64)); the rest are mainly base types, with some small (size 2) arrays of base types.

I'm now trying to convert this file to a class, mainly because I need to be able to have multiple instances of it.

What I did:
1. Converted all functions to class methods.
2. Converted all the static and global variables to class members (removing 'static').

I have overloaded the constructor of the class to make sure it's always aligned at 64 bytes, and I've checked that the 32/64 kB arrays are also still aligned at 64 bytes.
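For readers following along, here is a minimal sketch of one common way to get every heap instance of a class onto a 64-byte boundary: a class-level operator new/delete pair that over-allocates and rounds the pointer up. This is not necessarily what the poster did; the class name ControlLayer, its members, and the helper is_aligned are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical class standing in for the converted control layer.
struct alignas(64) ControlLayer {
    float state[16];
    int counter;

    // Over-allocate, round up to a 64-byte boundary, and remember the
    // original malloc pointer just in front of the aligned block.
    static void* operator new(std::size_t size) {
        const std::size_t kAlign = 64;
        void* raw = std::malloc(size + kAlign + sizeof(void*));
        if (raw == nullptr) throw std::bad_alloc();
        std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        std::uintptr_t aligned =
            (start + kAlign - 1) & ~static_cast<std::uintptr_t>(kAlign - 1);
        assert(aligned % kAlign == 0);
        reinterpret_cast<void**>(aligned)[-1] = raw;
        return reinterpret_cast<void*>(aligned);
    }

    // Recover the pointer stored by operator new and release it.
    static void operator delete(void* p) noexcept {
        if (p != nullptr) std::free(static_cast<void**>(p)[-1]);
    }
};

// True when p sits on a multiple of 'alignment' bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With this in place, `ControlLayer* c = new ControlLayer();` followed by `is_aligned(c, 64)` should report true for every instance, and `delete c;` releases the original block.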

At first I got a really big performance drop (more than 13%). After checking the pointer values, I discovered that my old implementation with statics caused memory to be allocated at more-or-less random locations; after the conversion, many of the arrays were exactly 64 kB apart, which of course causes caching issues. So I added some 'fillers' (0x1100 bytes each) to get rid of that. This almost completely restored the performance.
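The filler idea can be sketched as follows. The struct names, the member names, and the cache geometry (64-byte lines, 64 sets, as in a typical 32 kB 8-way L1 data cache) are assumptions for illustration; only the 64 kB array size and the 0x1100-byte filler come from the post. Arrays placed exactly 64 kB apart map to the same cache set under that assumed geometry, while the fillers shift each array to a different set.

```cpp
#include <cstddef>

constexpr std::size_t kArrayBytes = 64 * 1024;  // array size from the post
constexpr std::size_t kFillerBytes = 0x1100;    // filler size from the post
constexpr std::size_t kLineBytes = 64;          // assumed cache line size
constexpr std::size_t kSets = 64;               // assumed number of L1 sets

// Arrays back to back: each one starts exactly 64 kB after the previous one.
struct Unpadded {
    alignas(64) unsigned char a[kArrayBytes];
    alignas(64) unsigned char b[kArrayBytes];
    alignas(64) unsigned char c[kArrayBytes];
};

// Same arrays with fillers in between; 0x1100 is a multiple of 64,
// so the arrays stay 64-byte aligned.
struct Padded {
    alignas(64) unsigned char a[kArrayBytes];
    unsigned char filler0[kFillerBytes];
    alignas(64) unsigned char b[kArrayBytes];
    unsigned char filler1[kFillerBytes];
    alignas(64) unsigned char c[kArrayBytes];
};

// Cache set that a given byte offset maps to under the assumed geometry.
constexpr std::size_t l1_set_index(std::size_t offset) {
    return (offset / kLineBytes) % kSets;
}

static_assert(l1_set_index(offsetof(Unpadded, a)) ==
              l1_set_index(offsetof(Unpadded, b)),
              "unpadded arrays start in the same cache set");
static_assert(l1_set_index(offsetof(Padded, a)) !=
              l1_set_index(offsetof(Padded, b)),
              "fillers move the second array to a different set");
```

Whether a given filler size actually helps depends on the real cache geometry and access pattern, so measuring remains necessary.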

But now that I have added all the variables to the class, I'm seeing a 4% drop in performance. This new class is only a control layer with some simple calculations; most of the work is done elsewhere in other classes (and partially by IPP).

I'm using the compiler option /Qipo, due to which almost everything gets inlined into one big function, which makes it difficult to analyse what is causing the changes. (I would have to wade through a few MB of assembly output.)


4% may not seem like much, but this is a real-time application that is already consuming quite a lot of processing power. So I really want to get rid of the extra overhead.



Are there more things (like the different memory locations) that I should be aware of when performing this conversion?
21 Replies
jimdempseyatthecove
Honored Contributor III
The alignment of your struct/class member variables is subject not only to the alignment of the struct/class itself, but also to the struct/class packing requirements. These are affected by

#pragma pack(...)

and/or

/Zpn

Static member variables within a struct/class are scoped within the struct/class but are global (one instance) and occupy no space within the struct/class. Their alignment is set by compiler option or default, and generally defaults to sizeof(intptr_t) alignment.

Non-static member variables are located within each instance of the struct/class. This struct/class may have an alignment request (your programming). The offsets of the member variables within the struct/class are affected by packing rules or alignment requests. There generally is a requirement that if a member variable within a struct/class has an alignment requirement, then the struct/class itself must have an alignment requirement that is an even multiple of the alignments of all aligned member variables declared within it.
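A small sketch of these rules, using hypothetical struct names and assuming a compiler that accepts #pragma pack(push/pop) and C++11 alignas: packing changes member offsets, an aligned member raises the alignment of the enclosing struct, and a static member adds nothing to the instance size.

```cpp
#include <cstddef>

// Default packing: the compiler may insert padding after 'flag'.
struct DefaultPacked {
    char flag;
    double value;
};

// Packing of 1: members are laid out back to back with no padding.
#pragma pack(push, 1)
struct TightlyPacked {
    char flag;
    double value;
};
#pragma pack(pop)

// A member with an alignment request forces the struct's own alignment up.
struct WithAlignedMember {
    char flag;
    alignas(64) float samples[16];
};

// A static member lives outside the instances and adds no size to them.
struct WithStatic {
    static int shared;
    int own;
};
int WithStatic::shared = 0;

static_assert(offsetof(TightlyPacked, value) == 1, "no padding under pack(1)");
static_assert(sizeof(TightlyPacked) == 9, "1 + 8 bytes, nothing else");
static_assert(offsetof(DefaultPacked, value) >= offsetof(TightlyPacked, value),
              "default packing never places 'value' earlier");
static_assert(alignof(WithAlignedMember) == 64, "struct inherits member alignment");
static_assert(offsetof(WithAlignedMember, samples) == 64, "member starts on its boundary");
static_assert(sizeof(WithStatic) == sizeof(int), "static member takes no instance space");
```

The exact offset of `value` under default packing depends on the platform ABI and on /Zpn, which is why the sketch only compares it against the packed layout.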

By examining the machine code, by using the memory debug window and entering the address of struct.member and then looking at the hex address produced, or by printing the address, you can observe the actual placement of the member variables. This will let you know the alignment and/or which variables might share a cache line.
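Printing the addresses can be sketched like this. The struct Controls, its members, and the helper names are hypothetical, and a 64-byte cache line is assumed; two members share a line when their addresses divided by 64 give the same value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size

// Hypothetical stand-in for a class holding the converted variables.
struct Controls {
    int gain;
    int mode;
    double level;
    alignas(64) float history[16];
};

// Index of the cache line that contains address p.
inline std::uintptr_t line_index(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}

// True when both addresses fall into the same cache line.
inline bool same_cache_line(const void* a, const void* b) {
    return line_index(a) == line_index(b);
}

// Print one member's address and the cache line it lands in.
inline void print_member(const char* name, const void* p) {
    std::printf("%-8s %p  line %llu\n", name, p,
                static_cast<unsigned long long>(line_index(p)));
}

// Dump the placement of every member of one instance.
inline void print_layout(const Controls& c) {
    // The aligned member makes the whole struct 64-byte aligned.
    assert(reinterpret_cast<std::uintptr_t>(&c) % kCacheLine == 0);
    print_member("gain", &c.gain);
    print_member("mode", &c.mode);
    print_member("level", &c.level);
    print_member("history", c.history);
}
```

Calling `print_layout` on an instance shows at a glance that `gain`, `mode`, and `level` sit in one line while `history` starts in the next.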

In some cases, forcing alignment is counter-productive; in these cases dense packing tends to be more productive. I wrote an article on the Intel blogs pages demonstrating this effect. In this sample program (PARSEC fluidanimate), a 30% performance advantage was gained by dense packing of cache lines as opposed to cache-line-aligned allocations (which, by the way, exhibited a net gain over unaligned allocations). Each application is different, but careful analysis of data relationships can make for significant differences in performance (you reported a 13% difference). The "trick" here for you is to find the best placement scheme. Part of the Art of Programming.

Jim Dempsey