Application Crash

prestos · ‎03-06-2009

hi,

I just installed Intel C++ Compiler 11.0.072 integrated on MS Visual Studio 2005, on Windows XP SP2 PC.

My application is generated through an MSVC 2005 solution that contains 3 projects (the main and 2 others used as static libraries that the main project depends on). Further on, there are some external libraries that have been earlier compiled using MSVC 2005 compiler that the solution uses.

From all three solution projects, I use Intel Compiler only on the sensitive one (a static library where various optimizations need to be applied such as /O3, loop unrolling, intrinsics, etc.) but the other two projects are being compiled using MSVC 2005 (for the purpose of having faster compilation and smaller code for the part where speed does not matter so much)*.

So, after I compile the application in Release mode and launch it, I get an immediate crash. Checking the disassembly reveals that the crashing instruction is a MOVAPS with an unaligned memory address (i.e. not 16-byte aligned).

In Debug mode, things get even weirder. The application does not crash at startup (there's no loop unrolling now, but I am not sure whether SIMD instructions are involved), but it does crash when exiting (in the exit procedure, exceptions are raised during the destruction of static objects, more specifically in operator delete[]).

Is there something that I can do to avoid this crash? Do I need to compile the complete solution using the Intel compiler? (That is something I'd like to avoid, and it is not a consistent solution anyway when linking against external libraries.)

thanks in advance,

prestos

PS. My test PC has a Dual Core (SSE3) Intel CPU and the target architecture for the compiler is set to default (that is, to run it on IA32 with SSE2 instruction set capabilities).

PS2. The opinions and remarks found in this post are completely personal and subjective.*

JenniferJ · ‎03-06-2009

It seems there are two issues here:
1. In Release, try __declspec(align(16)) to see if it works around the problem.
2. In Debug, use the binary-search method to isolate which file (built with icl) causes the problem.

can you get a small testcase for the above?

prestos · ‎03-06-2009

Quoting - Jennifer Jiang (Intel)

1. In Rel, try __declspec(align(16)) to see if it works around the problem.

You mean to switch to 16-byte alignment for all projects or only for the library project compiled by icc? Perhaps it can solve it, but it should be the last option for me (since the application may have a large memory footprint, 2 GB or more, and such an unconditional change could double the memory requirements). Normally, the compiler should automatically decide whether it can use aligned SIMD code or force alignment on the target structs, right?
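To make the scoping question concrete: alignment can be requested for one type instead of for everything. The following is only a sketch. It uses the standard C++11 `alignas(16)` as a stand-in for `__declspec(align(16))` (the MSVC/ICC spelling relevant to this thread), and `Vec3`, `AlignedVec3`, `is_aligned16` and `padding_cost` are hypothetical names, not anything from the application discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 12-byte payload with the default alignment of float.
struct Vec3 {
    float x, y, z;
};

// Same payload, but only this one type asks for 16-byte alignment.
// alignas(16) is the C++11 spelling; __declspec(align(16)) is the
// MSVC/ICC spelling suggested above.
struct alignas(16) AlignedVec3 {
    float x, y, z;
};

// True when p sits on a 16-byte boundary, which MOVAPS requires.
inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Extra bytes paid per element for the stricter alignment (tail padding).
inline std::size_t padding_cost() {
    return sizeof(AlignedVec3) - sizeof(Vec3);
}
```

Under these assumptions the cost is paid only by the types that carry the attribute, and it depends on how far each type's size is from a multiple of 16.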

2. In debug, use the binary-search method to isolate to which file (built with icl) that causes the problem.

I know which file it is, but when I "hacked" it, the problem appeared in another one. So, it seems to me that this might be a more "generic" issue. Since these are static objects (with some dynamic memory allocation), could it be that there are different new/delete handlers called?

can you get a small testcase for the above?

That's rather difficult. The project is so big, and I am not sure if this can appear in a solution built with similar logic from scratch.

One thing I know is that when I was testing ICC 10.1 (and earlier versions), I never had these problems... But the difference was that: a) I was using MSVC 6.0 with ICC 10.1, and b) I was compiling all projects in the workspace using ICC (but any external libraries were still being compiled with MSVC6).

thanks

prestos

Olivier · ‎03-06-2009

Quoting - prestos

So, after I compile the application in Release mode and launch it, I get an immediate crash. Checking the disassembly reveals that the crashing instruction is a MOVAPS with an unaligned memory address (i.e. not 16-byte aligned).

Can you see if the memory address is static storage or something you got from new?

prestos · ‎03-06-2009

Quoting - Olivier

Can you see if the memory address is static storage or something you got from new?

Is there a way to understand this from just the disassembly? Even if I knew, how could this help? I'd like to find the exact struct that is unaligned to avoid making a global 16-byte alignment.

Olivier · ‎03-06-2009

Quoting - prestos

Is there a way to understand this from just the disassembly? Even if I knew, how could this help? I'd like to find the exact struct that is unaligned to avoid making a global 16-byte alignment.

Not easily, I think. You can print a few addresses which you know to be static and a few which you know to be on the heap and then see which yours is closest to.
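The address comparison can be sketched roughly as follows. This is a heuristic, not a reliable classifier: it prints one reference address per region and reports which one the suspect pointer is nearest to. All names here (`g_static_reference`, `addr_distance`, `nearest_region`) are hypothetical, and the address layout it relies on is platform dependent.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

enum Region { kStatic, kHeap, kStack };

// Hypothetical object that is known to live in static storage.
static float g_static_reference[64];

// Absolute distance between two addresses, as an integer.
inline std::uintptr_t addr_distance(const void* a, const void* b) {
    const std::uintptr_t x = reinterpret_cast<std::uintptr_t>(a);
    const std::uintptr_t y = reinterpret_cast<std::uintptr_t>(b);
    return x > y ? x - y : y - x;
}

// Heuristic only: prints three reference addresses plus the suspect and
// returns the region whose reference lies nearest to the suspect.
inline Region nearest_region(const void* suspect) {
    int stack_reference = 0;                 // known to be on the stack
    void* heap_reference = std::malloc(64);  // known to be on the heap
    assert(heap_reference != nullptr);

    std::printf("static %p  heap %p  stack %p  suspect %p (mod 16 = %u)\n",
                static_cast<const void*>(g_static_reference),
                heap_reference,
                static_cast<const void*>(&stack_reference),
                suspect,
                static_cast<unsigned>(
                    reinterpret_cast<std::uintptr_t>(suspect) % 16));

    const std::uintptr_t ds = addr_distance(suspect, g_static_reference);
    const std::uintptr_t dh = addr_distance(suspect, heap_reference);
    const std::uintptr_t dk = addr_distance(suspect, &stack_reference);
    std::free(heap_reference);

    if (ds <= dh && ds <= dk) return kStatic;
    return dh <= dk ? kHeap : kStack;
}
```

In the real application the suspect would be the base address taken from the faulting instruction, typed in by hand or logged just before the crash.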

As you mentioned errors in destructing static objects in Debug, I thought your Release problem might also be in static storage. Knowing this could narrow the possibilities a lot (or not, I haven't seen your code).

If you can't get any kind of call stack from the debugger to help you figure out which function is crashing, it's probably time to start putting traces all over the place :( Or do you already know which function is crashing? If so, can't you figure out where the data comes from?

jimdempseyatthecove · ‎03-06-2009

Quoting - prestos

Is there a way to understand this from just the disassembly? Even if I knew, how could this help? I'd like to find the exact struct that is unaligned to avoid making a global 16-byte alignment.

Compile your release code with debug symbols. Keep your optimizations as they normally are for release mode, just add the option to keep the debug symbols. Do this for both the compiler and linker (you do not want the linker to strip out the debug symbols of the compiled code).

Then run your program until it fails (run Release code from debugger using F5, OK any warning).

At the failure point you will usually get the address of both the instruction that caused the failure and the location it was attempting to access. If you do not get the location it was attempting to access, that is OK, because you have the address of the instruction that failed.

If you did not break into the debugger on the error, rerun with the debugger, but this time press F11 to step into your application. Now, in the context of the application (actually the CRT startup code), you can open a disassembly window, and from there you can perform a goto (Ctrl-G) and enter the hex address of the bad instruction. Look around there for the movaps or whatever it was, and then place a breakpoint on that instruction. Then press F5 to continue to that point.

With luck, when you reach that point you can examine the call stack and double-click on the top item. This should take you to the line in your source. If not, then the source with the error was not compiled with debug info.
You can also double-click on the next-to-top entry of the call stack, and so on, to help trace the problem.

Also, at the location of the instruction in error, you can look at how the effective address is calculated and then, using the registers window, compute the address by hand. You may note that one of the components of the address is bad. That information may be helpful in tracking down the error.

Jim Dempsey

prestos · ‎03-07-2009

Quoting - jimdempseyatthecove

Compile your release code with debug symbols. Keep your optimizations as they normally are for release mode, just add the option to keep the debug symbols. Do this for both the compiler and linker (you do not want the linker to strip out the debug symbols of the compiled code).

That sounds promising. :)

Also, at the location of the instruction in error, you can look at how the effective address is calculated and then, using the registers window, compute the address by hand. You may note that one of the components of the address is bad. That information may be helpful in tracking down the error.

This is known from the disassembly (if I understand you correctly). It's something like:

xor eax,eax

movaps xmm0,[base+4*eax]

since eax is zero, the base address is the one that creates the crash (i.e. not 16-byte aligned).
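The by-hand check described here amounts to the small calculation below. It is a sketch under the assumption that the operand has the form [base + scale*index]; `effective_address` and `movaps_would_fault` are hypothetical helper names, and the values fed to them would be read from the debugger's registers window.

```cpp
#include <cassert>
#include <cstdint>

// Effective address of a memory operand of the form [base + scale*index].
inline std::uintptr_t effective_address(std::uintptr_t base,
                                        std::uintptr_t index,
                                        unsigned scale) {
    return base + scale * index;
}

// MOVAPS raises a fault unless the low four bits of the address are zero.
inline bool movaps_would_fault(std::uintptr_t address) {
    return (address & 0xF) != 0;
}
```

With the index register at zero, the result is simply the base, so a base that is not a multiple of 16 is enough to trigger the fault.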

I will try your recommendation and come back (I will do so shortly but not right now, since I have to hurry for a milestone).

thank you

prestos

levicki · ‎03-07-2009

First and foremost, you haven't specified whether you use C++ features such as new (especially new[]) for memory allocation or plain old-school malloc().

Second, you haven't specified whether you have forced aligned vectorization anywhere in your code by using #pragma vector aligned.

Third, you haven't performed nearly enough troubleshooting on your own to isolate the error location. Steps which you could have taken include:

- Compiling with various compiler optimization options disabled to see whether the problem disappears.
- Generating Assembly with Source Code (/FAs) so you can locate which source file has the offending instruction.
- Running the release code compiled with debug info under debugger to intercept the crash and examine surrounding code and variables.

To summarize -- compiler knows that it should not use aligned access for unaligned memory. The only two ways that I know of to make it generate broken code are as follows:

[cpp]#1:

__declspec(align(16)) float a[1024];
__declspec(align(16)) float b[1024];

#pragma vector aligned		 // this will crash because
for (int i = 2; i < 1024; i++) { // non-zero loop start causes
				 // mis-alignment, and you used
				 // pragma to tell the compiler
				 // that the memory is aligned
}
[/cpp]
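Why a non-zero loop start breaks the alignment promise can be checked with a little address arithmetic. The sketch below assumes 4-byte floats and uses the standard `alignas(16)` in place of `__declspec(align(16))`; `samples` and `misalignment_of_element` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for: __declspec(align(16)) float a[1024];
alignas(16) static float samples[1024];

// Byte offset of &samples[i] from the previous 16-byte boundary.
inline std::size_t misalignment_of_element(std::size_t i) {
    return reinterpret_cast<std::uintptr_t>(&samples[i]) % 16;
}
```

Only every fourth element of an aligned float array starts on a 16-byte boundary, so a loop that begins at index 2 hands the first vector load an address that is 8 bytes off.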

[cpp]#2:

#include <stdio.h>     // printf
#include <xmmintrin.h> // __m128, _mm_setzero_ps, _mm_malloc, _mm_free

class	Test {
public:
	Test()
	{
		printf("this = %p\n", (void *)this);
		a = _mm_setzero_ps(); // unaligned access
	}

	__m128	a;
};

int main(void)
{
	Test *a = new Test[5]; // crash in constructor
	delete [] a;
	return 0;
}
[/cpp]

The first case is solvable by removing #pragma vector aligned, the second by overloading operator new[] and operator delete[] inside the class so that they return aligned memory:

[cpp]	void *operator new[] (size_t size)
	{
		return _mm_malloc(size, 16);
	}

	void operator delete[] (void *mem)
	{
		_mm_free(mem);
	}
[/cpp]
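For reference, a complete version of the second fix might look like the sketch below. It is an illustration only: instead of `_mm_malloc`/`_mm_free` it uses a portable over-allocation helper so that it does not depend on the SSE headers, and `aligned_alloc16`, `aligned_free16` and `Packet` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Over-allocate with malloc, round up to a 16-byte boundary, and store the
// original pointer just below the aligned block so it can be freed later.
inline void* aligned_alloc16(std::size_t size) {
    const std::size_t align = 16;
    void* raw = std::malloc(size + align + sizeof(void*));
    if (raw == nullptr) throw std::bad_alloc();
    const std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    const std::uintptr_t aligned =
        (start + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Releases a block obtained from aligned_alloc16.
inline void aligned_free16(void* p) {
    if (p != nullptr) std::free(static_cast<void**>(p)[-1]);
}

// Hypothetical type whose arrays must start on a 16-byte boundary.
struct alignas(16) Packet {
    float lanes[4];

    // Class-scoped overloads: they affect only new Packet[n] / delete[].
    static void* operator new[](std::size_t size) {
        return aligned_alloc16(size);
    }
    static void operator delete[](void* mem) {
        aligned_free16(mem);
    }
};
```

Because the overloads are members of the class, every other allocation in the program keeps its default behaviour, which matters when memory footprint is a concern.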

Moreover, using malloc() or new[] for large allocations (you mentioned near 2GB of RAM memory usage) is not recommended -- it is better to use OS memory allocation API (VirtualAlloc/VirtualFree in Windows) which returns page-aligned memory.

Finally, unless you can narrow it down and show us a reproducible test case I doubt anyone here can help you any further. By the looks of the assembler code you have shown above your buffer isn't aligned. If your code doesn't fit under any of the above two cases then there might be a code generation bug.

prestos · ‎03-09-2009

Quoting - Igor Levicki

First and foremost, you haven't specified whether you use C++ features such as new (especially new[]) for memory allocation or plain old-school malloc().

Both of them exist, with the vast majority being the C++ way (operator new). As long as the allocation/deallocation is paired up correctly (malloc/free, new/delete), there's no problem with mixing them, so this is rather irrelevant.

Second, you haven't specified whether you have forced aligned vectorization anywhere in your code by using #pragma vector aligned.

There's no #pragma definition regarding vector alignment, anywhere in the source.

Third, you haven't performed nearly enough troubleshooting on your own to isolate the error location. Steps which you could have taken include:

- Compiling with various compiler optimization options disabled to see whether the problem disappears.
- Generating Assembly with Source Code (/FAs) so you can locate which source file has the offending instruction.
- Running the release code compiled with debug info under debugger to intercept the crash and examine surrounding code and variables.

As said in previous post, I know the source files for the debug crash case and this still doesn't give me any hint about the actual source of the problem. Having made the compilation in debug mode, all optimizations ARE disabled. Regarding Intel compiler, I haven't made any explicit change to the default settings other than using /O3 optimization and inline/intrinsic usage whenever possible (in release).

The suggestion that makes sense to me at this moment is to try having debug information in the release compilation, to find the source that generates the "unaligned" problem. But given that I haven't set anything special about alignment, I am afraid that this won't guide me to a solution whatsoever (at least, I want a solution that is as "tight" as possible, avoiding for example making a global 16-byte struct alignment, or using ICC for compiling everything).

To summarize -- compiler knows that it should not use aligned access for unaligned memory. The only two ways that I know of to make it generate broken code are as follows:

[cpp]#1:

__declspec(align(16)) float a[1024];
__declspec(align(16)) float b[1024];

#pragma vector aligned		 // this will crash because
for (int i = 2; i < 1024; i++) { // non-zero loop start causes
				 // mis-alignment, and you used
				 // pragma to tell the compiler
				 // that the memory is aligned
}
[/cpp]

[cpp]#2:

#include <stdio.h>     // printf
#include <xmmintrin.h> // __m128, _mm_setzero_ps, _mm_malloc, _mm_free

class	Test {
public:
	Test()
	{
		printf("this = %p\n", (void *)this);
		a = _mm_setzero_ps(); // unaligned access
	}

	__m128	a;
};

int main(void)
{
	Test *a = new Test[5]; // crash in constructor
	delete [] a;
	return 0;
}
[/cpp]

The first case is solvable by removing #pragma vector aligned, the second by overloading operator new[] and operator delete[] inside the class so that they return aligned memory:

[cpp]	void *operator new[] (size_t size)
	{
		return _mm_malloc(size, 16);
	}

	void operator delete[] (void *mem)
	{
		_mm_free(mem);
	}

[/cpp]

As said before, I don't make any explicit usage of #pragmas and special alignment instructions to the compiler.

Moreover, using malloc() or new[] for large allocations (you mentioned near 2GB of RAM memory usage) is not recommended -- it is better to use OS memory allocation API (VirtualAlloc/VirtualFree in Windows) which returns page-aligned memory.

Well, the memory is not taken with just a couple of new/malloc calls. This memory is acquired piece by piece (some may suggest a memory arena for dealing with this, but that is another complex topic on its own, and the size of memory taken is not known a priori anyway). Furthermore, I'd like to avoid OS-specific calls.

Finally, unless you can narrow it down and show us a reproducible test case I doubt anyone here can help you any further. By the looks of the assembler code you have shown above your buffer isn't aligned. If your code doesn't fit under any of the above two cases then there might be a code generation bug.

Well, I posted for ideas and suggestions, not for someone to actually debug for me.

thanks,

prestos

levicki · ‎03-09-2009

prestos,

You completely missed my point.

I wasn't implying that you are mixing malloc/free and new/delete incorrectly -- I was saying that neither new nor malloc() guarantees 16-byte aligned allocations, so if you are doing dynamic allocation for arrays of structures or arrays of C++ objects you should consider using memory allocation functions which allow you to specify the required alignment, such as _mm_malloc() or a class-level operator new[] override as I have shown in the example code.

You still haven't said whether you use any SIMD instructions explicitly (by using SIMD classes or intrinsics or inline assembler).

As for debugging and optimizations -- everyone here suggested compiling in Release mode with debug information, and not in Debug mode.

Furthermore, I told you that you can also use /FAs to identify the crash location given that you know the source file and the instruction that causes the crash.

As for the "solution that is as tight as possible" -- first, if all code isn't produced by ICC, then it might be some interaction between the two binaries (for example ICC expecting an aligned pointer). Try compiling everything with ICC and see if it fixes the problem. Second, try adding /sfalign and /Zp16 switches to see if that resolves the problem (but I doubt it since I believe it has to do with some of those dynamic allocations). Third, if you cannot make structures properly aligned while keeping the tight packing you should consider using structures of arrays instead of arrays of structures.
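The third suggestion, structures of arrays instead of arrays of structures, can be sketched as follows. This is a generic illustration with hypothetical names (`ParticleAoS`, `ParticlesSoA`, `to_soa`, `sum_x`). Note that `std::vector` by itself does not promise 16-byte alignment, so aligned SIMD loads on these arrays would still need an aligned allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: each element is a tightly packed 12-byte record,
// so only every fourth element can start on a 16-byte boundary.
struct ParticleAoS {
    float x, y, z;
};

// Structure of arrays: one contiguous float array per field, so each
// array can be aligned once and then walked four floats at a time.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    explicit ParticlesSoA(std::size_t n) : x(n), y(n), z(n) {}
    std::size_t size() const { return x.size(); }
};

// Converts the AoS layout into the SoA layout, field by field.
inline ParticlesSoA to_soa(const std::vector<ParticleAoS>& in) {
    ParticlesSoA out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out.x[i] = in[i].x;
        out.y[i] = in[i].y;
        out.z[i] = in[i].z;
    }
    return out;
}

// A loop over one field touches only contiguous floats, which is the
// access pattern a vectorizer handles best.
inline float sum_x(const ParticlesSoA& p) {
    float total = 0.0f;
    for (std::size_t i = 0; i < p.size(); ++i) total += p.x[i];
    return total;
}
```

The packing stays as tight as in the original records, since no per-element padding is introduced.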

As for the memory, I wasn't implying that you are allocating a huge chunk at once. If you do have large allocations OS API is the best way to go, if you are doing a lot of small allocations/deallocations then you might consider writing your own memory manager because sooner or later you will hit the issue of virtual address space fragmentation. That is the sole reason behind Adobe Photoshop having their own VM implementation.
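One very simple form of such a memory manager is a bump arena that hands out 16-byte aligned blocks from a single buffer. The sketch below is only an illustration of the idea with a hypothetical `Arena` class. It never frees individual blocks, and a real manager for this workload would need growth and reuse policies that are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal bump allocator: every block it returns is 16-byte aligned.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : storage_(capacity + 15), base_offset_(0),
          capacity_(capacity), used_(0) {
        // Skip ahead to the first 16-byte boundary inside the buffer.
        const std::uintptr_t start =
            reinterpret_cast<std::uintptr_t>(storage_.data());
        base_offset_ = (16 - start % 16) % 16;
    }

    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Returns an aligned block, or nullptr when the arena is exhausted.
    void* allocate(std::size_t size) {
        if (size > capacity_) return nullptr;
        const std::size_t rounded =
            (size + 15) & ~static_cast<std::size_t>(15);
        if (rounded > capacity_ - used_) return nullptr;
        void* p = storage_.data() + base_offset_ + used_;
        used_ += rounded;
        return p;
    }

    // Releases everything at once; individual blocks are never freed.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> storage_;
    std::size_t base_offset_;
    std::size_t capacity_;
    std::size_t used_;
};
```

Rounding each request up to a multiple of 16 wastes at most 15 bytes per block, which is the trade-off for keeping every returned pointer aligned.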

As for the ideas vs. debugging -- you got plenty of ideas from everyone, as much as was possible without actually seeing the code, but you still have to narrow the problem down to a reproducible test case in order to prove that it is a compiler issue. With that said, let me be the first to encourage you to submit an issue with Intel Premier Support if it turns out to be a compiler bug.

Good luck!

prestos · ‎03-09-2009

Quoting - Igor Levicki

prestos,

You completely missed my point.

I wasn't implying that you are mixing malloc/free and new/delete incorrectly -- I was saying that neither new nor malloc() guarantees 16-byte aligned allocations, so if you are doing dynamic allocation for arrays of structures or arrays of C++ objects you should consider using memory allocation functions which allow you to specify the required alignment, such as _mm_malloc() or a class-level operator new[] override as I have shown in the example code.

You still haven't said whether you use any SIMD instructions explicitly (by using SIMD classes or intrinsics or inline assembler).

Sorry, I misunderstood, but if I had any aligned memory allocation, that would be the first thing to look at. At least, I have verified that all my SIMD code is commented out, and as proof, it compiles under VC6 (which does not have any intrinsic or aligned memory allocation functionality).

As for debugging and optimizations -- everyone here suggested compiling in Release mode with debug information, and not in Debug mode.

Furthermore, I told you that you can also use /FAs to identify the crash location given that you know the source file and the instruction that causes the crash.

Yes, these are two good points. (I wasn't aware of the /FAs switch, thanks.)

As for the "solution that is as tight as possible" -- first, if all code isn't produced by ICC, then it might be some interaction between the two binaries (for example ICC expecting an aligned pointer).

Do you have any reference for this "ICC expecting an aligned pointer"? If and when it takes place? Or is it just speculation? It really sounds "great expectation" to me...

With that said, let me be the first to encourage you to submit an issue with Intel Premier Support if it turns out to be a compiler bug.

That's a long way off :) - I am quite sure there is some explanation. Thanks for your help,

prestos

levicki · ‎03-10-2009

As for ICC "expecting an aligned pointer" -- it is speculation, but not completely improbable, so in my opinion it warrants some testing.

For example, this could happen if a library or object file you link against was compiled by another compiler with one structure packing or stack alignment, while the rest of the code is compiled by ICC with a different packing/alignment.
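A packing mismatch of this kind can be made visible with `sizeof` and `offsetof`. The sketch below imitates two translation units that see the same fields under different packing settings; `RecordDefault`, `RecordPacked` and `layouts_match` are hypothetical names, and `#pragma pack` stands in for a differing /Zp setting.

```cpp
#include <cassert>
#include <cstddef>

// Layout as seen by code built with the default packing.
struct RecordDefault {
    char   tag;
    double value;
};

// The same fields as seen by code built with 1-byte packing.
#pragma pack(push, 1)
struct RecordPacked {
    char   tag;
    double value;
};
#pragma pack(pop)

// True only when both views agree on the size and on where 'value' lives.
inline bool layouts_match() {
    return sizeof(RecordDefault) == sizeof(RecordPacked) &&
           offsetof(RecordDefault, value) == offsetof(RecordPacked, value);
}
```

Printing or asserting these values from a file in each project (one built by MSVC, one by ICC) for the structures that cross the library boundary would quickly confirm or rule out this theory.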

I really can't see a good reason to keep part of the code compiled with VC6 (or whatever you have used) -- in my opinion mixing the output of two compilers is just asking for trouble.

If you really want to optimize some parts for size instead of speed you would be better off setting different compile switches for that particular source file, and using ICC for the whole project.

Furthermore, you haven't checked whether ICC is vectorizing any of your loops. Set /Qvec_report switch and check if the offending source file has any vectorized loops, then compare the offending code by looking at the listing generated with /FAs.

Movaps you are seeing (and you said you don't have any SIMD code of your own) can also be a part of some compiler intrinsic code so you can also try to set /Oi- and see if that fixes the problem.

As for compiling Release with debug info, I see you haven't gotten around to it yet so I will presume you are not sure how to do it:

1. Set the configuration to Release, then right-click on a project and select Properties.
2. Under Configuration Properties|General|Whole Program Optimization select No Whole Program Optimization.
3. Under Configuration Properties|C/C++|General|Debug Information Format select Program Database for Edit & Continue (/ZI).
4. Under Configuration Properties|C/C++|Code Generation|Enable Function-Level Linking select Yes.
5. Under Configuration Properties|Linker|Debugging|Generate Debug Info select Yes (/DEBUG).

Then you should be able to build and run the program (by pressing F5), and when it crashes it will end up in debugger so you will be able to see a stack trace and examine variables. If the program requires command line arguments and/or specific working directory don't forget to set those under project's Configuration Properties|Debugging.

Good luck!