Intel® ISA Extensions

SSE2 - Class crash with SSE related members

obscurity
Beginner
1,231 Views
Hello everyone,

I have a problem with SSE2 instructions. I am developing with MS VS2005 Express and have made the following project settings:

C/C++ > Code Generation:
Enable Enhanced Instruction Set: Streaming SIMD Extensions (/arch:SSE)
Struct Member Alignment : 16 Bytes (/Zp16)
C/C++ > Optimization:
Enable intrinsic functions: yes (/Oi)

I have reduced the code examples to a minimum.
I have a class vec4 with the following members:

#include <xmmintrin.h> // __m128, _mm_setzero_ps

class vec4
{
public:
    vec4(void)
    {
        // as an example ...
        reg = _mm_setzero_ps();
    }

    // three views of the same 16 bytes
    union
    {
        struct
        {
            float w,z,y,x;
        };
        __m128 reg;
        float coord[4];
    };
};

Now I have another class, let's say the classic Foo, containing some of these vec4 members:

class Foo
{
    vec4 mPosition;
    vec4 mOrientation;
    vec4 mColor;
};

In my application there is something like Foo * foo = new Foo(); where the instance is created,
but the application crashes with a memory exception. I am quite confused, because I thought there could not be a problem once I had enabled 16-byte alignment.


Does anybody have an idea what happened here?
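Editor's sketch: aligned SSE loads and stores on a __m128 fault when the address is not a multiple of 16, and a plain new is not required to return such an address, which is presumably what the crash above comes down to. The helper below is a hypothetical diagnostic (the name is_aligned is not from the thread) that checks a pointer before it is handed to SSE code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the thread): returns true when the address
// stored in p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Calling is_aligned(foo, 16) right after Foo * foo = new Foo(); would show whether the heap pointer is suitable for the __m128 members.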

13 Replies
jimdempseyatthecove
Honored Contributor III

Consider the following:

#pragma pack(push)
#pragma pack(1)
__declspec( align(16) )
class vec4
{
...
};
#pragma pack(pop)

Also, consider replacing (16) with a symbol name or expression using sizeof(...). Later revisions of SSE will use 4 element vectors of double.

The definition of class Foo may also need to be similarly specified depending on your requirements.

Also, depending on your architecture, "w,z,y,x" may need to be "x,y,z,w", so conditional compilation may be needed around that, as well as an enum or #define for the individual element indexes (this eliminates confusion as to what coord[2] represents).
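Editor's sketch of the two suggestions above: a single named constant for the alignment and an enum for the element indexes. The names kVecAlign, Component and Vec4 are hypothetical, and the standard C++11 alignas keyword is used here as a portable stand-in for __declspec(align(16)).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names, not from the thread. alignas is the standard C++11
// spelling of the alignment request; on MSVC 2005 it would be __declspec(align()).
constexpr std::size_t kVecAlign = 16;   // one place to change if wider vectors arrive

// Named indexes document what coord[2] means.
enum Component { X = 0, Y = 1, Z = 2, W = 3 };

struct alignas(kVecAlign) Vec4
{
    float coord[4];

    float&       operator[](Component c)       { return coord[c]; }
    const float& operator[](Component c) const { return coord[c]; }
};

static_assert(alignof(Vec4) == kVecAlign, "Vec4 must carry the requested alignment");
```

If the element order has to be flipped for another target, only the enum values would need to change, not every coord[...] access.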

Jim Dempsey

levicki
Valued Contributor I

Hi Jim, that is the bug I have reported if you remember. __declspec(align(16)) doesn't do anything for objects allocated via new[]. :-)

obscurity, it is a bug in the compiler which causes new[] to return an unaligned memory pointer. Intel Compiler 11.0.026 beta fixes the problem.

gabest
Beginner
obscurity:
Now i have another class - let's say the classic foo, containing some of these vec4
class Foo
{
vec4 mPosition;
vec4 mOrientation;
vec4 mColor;
}
These vec4 class members can't be aligned, so don't put them there; __declspec(align()) has no effect on them. Just think about it: the data of the class doesn't necessarily start at the first line, since it may have a vtable and inherited members.
levicki
Valued Contributor I

Hello Gabest, nice to see you here!

Try compiling and running the attached code sample using MSVC and ICC to see what the real problem is.

gabest
Beginner

MSVC aligns vec inside the class, but can't do anything at global scope if you don't overload new/delete. I don't think that can be fixed by the compiler, since the runtime's new just calls malloc, so I would not call this behaviour a bug. If that IC beta fixes it, it could be that they changed malloc to align on 16 bytes now, but what happens when AVX comes along?

TimP
Honored Contributor III
AVX memory operations are still 128-bit, at least in the initial version. Aside from a possible greater visibility of performance effects of page splits, there may not be much need for stricter alignment.
As the AVX instructions won't be supported under current OS releases, there may be a possibility of adjustments beyond those already mentioned (save/restore provisions for the new register segments).
levicki
Valued Contributor I

gabest, if you run that code sample in a debugger you will see that the misalignment for new comes from the Intel compiler storing the object count (for later use by delete) at the beginning of the allocated memory and then incrementing (and thus misaligning) the pointer which you get back. It is a confirmed issue (I know because I reported it) and it is targeted to be fixed in the 11.0 release. My testing with 11.0.026 beta confirms it has indeed been fixed, much to my satisfaction.
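Editor's sketch: the object count described above can be made visible by recording how many bytes new[] asks for. The type Probe and the function array_overhead are hypothetical (this is not the attached sample), and the size of the extra space is implementation-defined, so no particular number should be expected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical probe type, not the MyClass from the attachment.
struct Probe
{
    float data[4];

    // A user-provided destructor gives delete[] a reason to know the element
    // count, so compilers typically store it in front of the array.
    ~Probe() {}

    static std::size_t last_request;   // bytes requested by the most recent new[]

    static void* operator new[](std::size_t size)
    {
        last_request = size;
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        return p;
    }

    static void operator delete[](void* p) { std::free(p); }
};

std::size_t Probe::last_request = 0;

// Returns how many bytes beyond n * sizeof(Probe) the compiler requested.
inline std::size_t array_overhead(std::size_t n)
{
    Probe* arr = new Probe[n];
    std::size_t extra = Probe::last_request - n * sizeof(Probe);
    delete[] arr;
    return extra;
}
```

The pointer that new[] hands back is the allocator's pointer advanced past that extra space, so a non-zero overhead that is not a multiple of 16 would leave the first element misaligned even when the allocator itself returned a 16-byte-aligned block.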

As for the "class inside of class" problem — try changing the example code I attached to my previous post like this:


class Foo
{
	MyClass	mPosition;
	MyClass	mOrientation;
	MyClass	mColor;
};

int main() 
{
	Foo	bar;
}

If I compile it with 11.0.026 beta this is what I get when I run it:


c:>test

In constructor for 0012FE80
In constructor for 0012FE90
In constructor for 0012FEA0
In destructor for 0012FEA0
In destructor for 0012FE90
In destructor for 0012FE80

No crashing, everything is aligned even without __declspec(align()). Even if I change the above like this:


class Foo
{
	MyClass	mPosition;
	int	mCrap;
	MyClass	mOrientation;
	MyClass	mColor;
};

I get this:


c:>test

In constructor for 0012FE80
In constructor for 0012FEA0
In constructor for 0012FEB0
In destructor for 0012FEB0
In destructor for 0012FEA0
In destructor for 0012FE80

So, your assumption about vtable and inheritance seems to be incorrect. Those low-level implementation details (which by the way vary between different compilers) thankfully aren't exposed to the developers, so there is no reason why class alignment shouldn't just work.
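Editor's sketch: the addresses printed above (…80, …A0, …B0) correspond to member offsets 0, 32 and 48, which can be checked at compile time. Vec is a hypothetical stand-in for MyClass, and alignas(16) is used as the standard C++11 equivalent of a 16-byte-aligned type, so this is an illustration under those assumptions rather than a reproduction of the attached sample.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for MyClass: a type that carries 16-byte alignment.
struct alignas(16) Vec { float coord[4]; };

struct Foo
{
    Vec mPosition;
    int mCrap;          // forces padding before the next aligned member
    Vec mOrientation;
    Vec mColor;
};

// The compiler pads after mCrap so that each Vec starts on a 16-byte boundary.
static_assert(offsetof(Foo, mPosition)    == 0,  "first member starts the object");
static_assert(offsetof(Foo, mCrap)        == 16, "int follows the first Vec");
static_assert(offsetof(Foo, mOrientation) == 32, "12 bytes of padding after mCrap");
static_assert(offsetof(Foo, mColor)       == 48, "last Vec follows directly");
static_assert(alignof(Foo) == 16, "Foo inherits the strictest member alignment");
```

These offsets only help if the Foo object itself starts on a 16-byte boundary, which is the separate heap and stack question discussed in the rest of the thread.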

gabest
Beginner

Object count at the beginning doesn't have anything to do with alignment; MSVC just happily wastes 16 bytes in front of the array to store a DWORD there. But that's an implementation detail, I'm afraid. The beginning of the whole thing, including the leading object count and the allocated space for the array, can be returned misaligned, and it usually is returned ending on 8. But I don't know how we got to arrays: just one dynamically allocated instance of a class may not be aligned, and in that case there is no leading object count to "contribute" to a compiler bug. How could that be fixed then?

tim18: I just assumed 256-bit loads will require 32-byte aligned data. Isn't that right? If MSVC still uses 16-byte fillers then this will render the operator overloading solution useless too, for arrays. (I just verified it: I added __declspec(align(32)) __m128 vec and _mm_malloc(size, 32), and even though it positions vec at the right place inside the class, that 16-byte filler shifts it to a 16-byte aligned position... so this is no good beyond 16.)

levicki
Valued Contributor I

gabest:
Object count at the beginning doesn't have anything to do with alignment, MSVC just happily wastes 16 bytes in front of the array to store a DWORD there. But that is an implementation detail I'm afraid.

Let me see if what you just said makes more sense with some added emphasis. As I said the implementation details should be hidden.

gabest:
But don't know how we got to arrays, just one dynamically allocated instance of a class may not be aligned, and in that case there is no leading object count to "contribute" to a compiler bug. How could that be fixed then?

I agree that there are two separate problems. However, it is not very likely that the original poster will only use new to allocate a single object as in his minimalistic example; sooner or later he would hit the array problem as well. That is why I mentioned it.

As for the solution to the single dynamically allocated instance problem, you can override new and delete (just like I did for new[] and delete[] in the attached code). Now let's talk about a more permanent solution to this whole alignment problem.

In my opinion the language should evolve with the hardware: __m64 and __m128 types have existed for so long that they should be considered built-in types by now. They are going to stay with us for a long time, not to mention we will most likely get __m256 in the near future as well.

So, if we want software to work without developers having to jump through hoops, the compiler should take care of proper alignment of those "new" types just like it does for double, int, short, etc. In other words, we shouldn't have to litter our code with __declspec(align()) and to override new and delete operators to make it work.

obscurity
Beginner
Hey,

I am very happy about your discussion.
First of all, I'm happy that it is not my fault :-)

To Igor:
Yes, you are right. As you can see in my example, I am using vec4 as my default class for 3D (4D) vector operations.
I knew about the array problem, but it seems silly to me that SSE, which is supposed to accelerate such operations, causes these problems. I won't use new and delete all the time, but with this information
I can use SSE to accelerate less frequently used (but time-consuming) classes.

In your code example (from 12.07.), did you just declare MyClass with __declspec rather than using a struct?
Or did you override new/delete? I understand that with __declspec and overridden new and delete I should get an aligned instance. What I don't understand is why the memory appears to be aligned in that example. In my opinion that is more of a coincidence. What happens if you declare your variable mCrap as "char"? Or if you declare a char in front of "bar" in main?

Okay, I think I misunderstood something, but I am very interested. Maybe you can enlighten me. I'll run some tests to get a better idea :-)

Thanks for the discussion. I hope I can answer more frequently these days.
levicki
Valued Contributor I

obscurity, my code example overrides new[] and delete[] operators.

If you declare mCrap as a char you will still get the proper alignment as long as you use -Zp16 switch because mCrap is a part of the structure (class is essentially a structure).

If you declare char mCrap before MyClass Bar in main(), then you will need __declspec(align(16)) in front of MyClass Bar, because that char is not part of any structure and is not affected by the -Zp16 switch.
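Editor's sketch of the stack case just described: the alignment request is placed on the local variable itself. Vec and local_is_aligned are hypothetical names, and the standard C++11 alignas(16) is assumed here in place of __declspec(align(16)).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type that deliberately carries no alignment of its own.
struct Vec { float coord[4]; };

// Returns true when a local Vec declared after a lone char sits on a
// 16-byte boundary because the variable itself requests that alignment.
inline bool local_is_aligned()
{
    char mCrap = 0;              // a lone char in front, as in the question
    alignas(16) Vec bar = {};    // alignment requested on the variable
    (void)mCrap;                 // silence the unused-variable warning
    return reinterpret_cast<std::uintptr_t>(&bar) % 16 == 0;
}
```

Without the alignas on bar, the compiler would only have to honour the 4-byte alignment of float, so the result could be true or false depending on stack layout.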

If you will always allocate your class instances dynamically, the best you can do considering the current state of C++ compilers and runtime libraries is to add the following code to your vector class:


	void *operator new (size_t size)
	{
		return _mm_malloc(size, 16);
	}

	void *operator new[] (size_t size)
	{
		return _mm_malloc(size, 16);
	}

	void operator delete (void *mem)
	{
		_mm_free(mem);
	}

	void operator delete[] (void *mem)
	{
		_mm_free(mem);
	}

That way you will have aligned memory allocations both for single instances and class arrays. Unfortunately, for that to work you need to have 11.0.026 beta version of the compiler because in older versions new[] and delete[] override does not work properly due to the compiler bug I mentioned.
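Editor's sketch: the four operators above placed in a complete class so they can be compiled and checked. AlignedVec4 and heap_instance_is_aligned are hypothetical names. The only behavioural change from the posted operators is an added null check, since operator new is expected to throw std::bad_alloc rather than return a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <xmmintrin.h>   // __m128, _mm_setzero_ps, _mm_malloc, _mm_free

// Hypothetical class name; the operators mirror the ones posted above.
class AlignedVec4
{
public:
    AlignedVec4() : reg(_mm_setzero_ps()) {}

    static void* operator new(std::size_t size)   { return allocate(size); }
    static void* operator new[](std::size_t size) { return allocate(size); }
    static void  operator delete(void* mem)       { _mm_free(mem); }
    static void  operator delete[](void* mem)     { _mm_free(mem); }

    __m128 reg;

private:
    // Requests 16-byte-aligned storage and reports failure by throwing,
    // which is what callers of operator new expect.
    static void* allocate(std::size_t size)
    {
        void* p = _mm_malloc(size, 16);
        if (!p) throw std::bad_alloc();
        return p;
    }
};

// Allocates one instance on the heap and reports whether it is 16-byte aligned.
inline bool heap_instance_is_aligned()
{
    AlignedVec4* v = new AlignedVec4();
    bool ok = reinterpret_cast<std::uintptr_t>(v) % 16 == 0;
    delete v;
    return ok;
}
```

Classes that merely contain such members (like Foo in the original question) do not inherit these operators, so under this approach they would need the same overloads or a common base class that provides them.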

gabest
Beginner

Igor: Could you check for me what the memory layout of object count / vtable / members looks like when compiled with 11.0.026 and earlier? Can it also handle alignments greater than 16 that are powers of two?

update:

I went ahead and tested it on 10.1.022. As I thought, the object count uses 4 bytes, shifting the beginning of the class to a 4-byte offset, which is certainly not good and will cause a crash. Since MSVC always pads it to 16 bytes, there is no problem, as long as you don't want 32-byte alignment. This is the little "implementation detail" that makes assumptions about class member alignment unpredictable across different compilers. I still wonder what .26 does differently to get this right, and what it does for single class allocations.

By the way, it was already worth checking this topic: I had one class where I needed aligned members but allocated a sub-struct and accessed it through a second pointer (the first being "this") in a performance-critical function. Overloading new worked, but this is a single-instance class only :)

levicki
Valued Contributor I

gabest, sorry for not replying earlier but I was busy.

As far as I know, 11.0.026 beta and newer also perform 16-byte alignment after storing the object count just like you said MSVC does.

For single class allocations you still have to overload new and delete. For objects allocated on stack you still have to use __declspec(align(16)). Hope that clears it up.
