Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

OpenMP and try-catch issue (only 32 bit Windows executable)

msolus
Beginner
I believe I have encountered an issue using OpenMP with the Intel C++ compiler version 12 update 5. The issue can only be reproduced when compiling a 32-bit executable on Windows with optimization (-O2) enabled. The code runs fine when compiled as a 64-bit executable on Windows and Linux (I didn't try 32-bit Linux). It also runs fine when compiled in debug mode (-O0), both as a 32-bit and as a 64-bit executable. My CPU is a mobile Westmere and I'm running 64-bit Windows 7.

I have been able to reproduce the issue in this small example:
[cpp]#include <iostream>
#include <stdexcept>
#include <omp.h>

#define USE_CRITICAL 0

int doSomething(int i, int j) {
	if (i % 4 == 0 && j % 2 == 0) {
		throw std::runtime_error("Error");
	}
	return i + j;
}

int main() {

	int itCount = 500;
	int total = 0;

	#pragma omp parallel
	{
		int myTotal = 0;
		#pragma omp for
		for (int i = 0; i < itCount; i++) {
			bool success = false;
			int j = 0;
			while (!success) {
				try {
					if (i >= itCount) {
						#if USE_CRITICAL
						#pragma omp critical
						#endif
						{
							std::cout << "Something wrong: thread " << omp_get_thread_num();
							std::cout << " running iteration " << i << std::endl;
						}
					}
					myTotal += doSomething(i, j);
					success = true;
					++j;
				} catch (std::exception &) {
					++j;
				}
			}
		}

		#pragma omp atomic
		total += myTotal;
	}

	std::cout << "Total = " << total << std::endl;

	return 0;
}[/cpp]

As you can see, exceptions are raised and caught within the worksharing construct, which should be fine according to the OpenMP specification. However, the code does not work as expected. In fact, I face two different problems depending on the value of USE_CRITICAL. If USE_CRITICAL == 0, the threads get stuck in an endless loop. If USE_CRITICAL == 1, the process crashes.
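
For comparison, here is a serial sketch of the same retry loop with OpenMP taken out, which shows what a correct run should compute. `expectedTotal` is a name introduced here for illustration; `doSomething` is copied from the example above.

```cpp
#include <cassert>
#include <stdexcept>

// Same helper as in the reproducer: throws when i is a multiple of 4 and j is even.
int doSomething(int i, int j) {
    if (i % 4 == 0 && j % 2 == 0) {
        throw std::runtime_error("Error");
    }
    return i + j;
}

// Serial reference (illustrative name, not part of the original post):
// retries each iteration with the next j until doSomething stops throwing,
// then accumulates the result.
int expectedTotal(int itCount) {
    assert(itCount >= 0);
    int total = 0;
    for (int i = 0; i < itCount; ++i) {
        int j = 0;
        bool success = false;
        while (!success) {
            try {
                total += doSomething(i, j);  // total is untouched if this throws
                success = true;
            } catch (std::exception &) {
                ++j;  // the retry uses j = 1, which never throws
            }
        }
    }
    return total;
}
```

With itCount = 500 this gives 124875: the sum 0 + 1 + ... + 499 = 124750, plus 1 for each of the 125 multiples of 4, which only succeed on the retry with j = 1. A correct parallel build should print the same total.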

I attach the Visual Studio 2010 solution and the Linux makefile.

Regards,
Andrea

Anoop_M_Intel
Employee
Hi Andrea

I was able to reproduce this issue and we are currently working with the Development Engineers to get it checked out. Will let you know if I can come up with some workaround for this issue. Thanks for bringing it to our notice.

Anoop

jimdempseyatthecove
Honored Contributor III
There was an issue raised in another forum where the preprocessor had a problem with a #pragma... followed by #endif.

As an experiment try

[cpp]#if USE_CRITICAL
if (i >= itCount) {
	#pragma omp critical
	{
		std::cout << "Something wrong: thread " << omp_get_thread_num();
		std::cout << " running iteration " << i << std::endl;
	}
}
#else
if (i >= itCount) {
	std::cout << "Something wrong: thread " << omp_get_thread_num();
	std::cout << " running iteration " << i << std::endl;
}
#endif[/cpp]

If this corrects the problem it will give Intel support a starting point for looking for the bug.

Jim Dempsey

msolus
Beginner

Dear Jim, your code does not correct the problem: still crashing if USE_CRITICAL == 1, stuck in an endless loop otherwise.

msolus
Beginner
I think it might be useful for debugging to point out that in this scenario all threads raise at least one exception and they all get stuck in the FOR loop. In my original code only one thread raised an exception, and it was the only thread which kept running (the others were waiting at the implicit barrier).

jimdempseyatthecove
Honored Contributor III
Quoting msolus
I think it might be useful for debugging to point out that in this scenario all threads raise at least one exception and they all get stuck in the FOR loop. In my original code only one thread raised an exception, and it was the only thread which kept running (the others were waiting at the implicit barrier).

Ahh...

This is indicative that the catch circumvented the barrier.

See if you can place the try/catch inside a scope that is not the scope of the statement of the #pragma omp parallel... You may need to insert something innocuous so that the compiler optimizations do not remove what appears to be dead code:

#pragma omp parallel for
for (int i = 0; i < n; ++i)
{ // parallel for scope
	int dummy = 0;
	{ // unnecessary scope
		try {
			...
		} catch (std::exception &) {
			...
			++dummy;
		} // end catch
	} // end unnecessary scope
	if (dummy < 0) CantHappen(); // unless you have > 2G threads with errors
} // end for

You can fix the syntax. What you are experiencing appears to be a compiler error. The above may be a work around.

Jim Dempsey

msolus
Beginner
Dear Jim, thanks for your suggestion. It does indeed look like a compiler bug. Unfortunately, in my application it was much harder to identify it as such, which resulted in several hours spent looking for a bug in my own code (compilers are almost always right). I look forward to hearing from Intel and I hope they can fix this bug soon. I am quite curious about the nature of the issue.

To me, it looks like executing the catch block makes the conditional jump of the for loop always not-taken. The issue is likely to be a bit more complicated, since introducing the critical region makes the application crash. Maybe the stack gets corrupted while entering or exiting the catch block.

After figuring out it was a compiler issue, I used a different workaround from the one you proposed: I avoided the worksharing construct and split the iteration count myself. This code works as expected, but is not suitable if you need dynamic scheduling.

[cpp]#pragma omp parallel
{
  int id = omp_get_thread_num();
  int threadIt = itCount / omp_get_num_threads();    // base iterations per thread
  int extraIt = itCount % omp_get_num_threads();     // leftover iterations
  int myIt = threadIt + (id < extraIt ? 1 : 0);      // the first extraIt threads take one more
  int first = id * threadIt + std::min(id, extraIt); // std::min needs <algorithm>
  // NO WORKSHARING CONSTRUCT HERE
  for (int i = first; i < first + myIt; ++i) {
    ...
    while (!...) {
      try {
        ...
      } catch (std::exception &) {
        ...
      }
    }
  }
}[/cpp]
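
The index arithmetic of this manual split can be checked on its own, without OpenMP. The following is a minimal sketch under that assumption; `IterRange` and `staticRange` are names introduced here for illustration and do not appear in my application.

```cpp
#include <algorithm>
#include <cassert>

// Half-open iteration range [first, last) owned by one thread.
// Hypothetical helper type, introduced only for this sketch.
struct IterRange {
    int first;
    int last;
};

// Split itCount iterations into numThreads contiguous blocks.
// The first (itCount % numThreads) threads each get one extra iteration,
// mirroring the threadIt / extraIt / myIt computation in the workaround.
IterRange staticRange(int id, int numThreads, int itCount) {
    assert(numThreads > 0 && id >= 0 && id < numThreads && itCount >= 0);
    int threadIt = itCount / numThreads;
    int extraIt = itCount % numThreads;
    int myIt = threadIt + (id < extraIt ? 1 : 0);
    int first = id * threadIt + std::min(id, extraIt);
    return IterRange{first, first + myIt};
}
```

Inside the parallel region one would call `staticRange(omp_get_thread_num(), omp_get_num_threads(), itCount)` and loop from `first` up to, but not including, `last`. For 10 iterations on 4 threads the ranges come out as [0,3), [3,6), [6,8) and [8,10), so every iteration is covered exactly once.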

jimdempseyatthecove
Honored Contributor III
I've done my own slicing so much that I've created a class:

[cpp]template <typename T>
struct qtSlice
{
	T	iBegin;
	T	iEnd;
	T	jBegin;
	T	jEnd;
	T	kBegin;
	T	kEnd;

	qtSlice(T iThread, T nThreads, T _iBegin, T _iEnd)
	{
		T iStride = (_iEnd - _iBegin + nThreads - 1) / nThreads;
		if(iStride == 0) iStride = 1;
		iBegin = iStride * iThread + _iBegin;
		iEnd = iBegin + iStride;
		if(iEnd > _iEnd) iEnd = _iEnd;
		jBegin = jEnd = kBegin = kEnd = 0; 
	}

	qtSlice(T iThread, T nThreads, T _iBegin, T _iEnd, T _jBegin, T _jEnd)
	{
		if(nThreads > 2)
		{
			if(iThread < nThreads/2)
			{
				// take the first half
				qtSlice	slice(0, 2, _iBegin, _iEnd, _jBegin, _jEnd);
				qtSlice sliceOfSlice(iThread, nThreads/2, slice.iBegin, slice.iEnd, slice.jBegin, slice.jEnd);
				iBegin = sliceOfSlice.iBegin;
				iEnd = sliceOfSlice.iEnd;
				jBegin = sliceOfSlice.jBegin;
				jEnd = sliceOfSlice.jEnd;
				return;
			}
			// take the second half
			qtSlice	slice(1, 2, _iBegin, _iEnd, _jBegin, _jEnd);
			qtSlice sliceOfSlice(iThread - nThreads/2, nThreads - nThreads/2, slice.iBegin, slice.iEnd, slice.jBegin, slice.jEnd);
			iBegin = sliceOfSlice.iBegin;
			iEnd = sliceOfSlice.iEnd;
			jBegin = sliceOfSlice.jBegin;
			jEnd = sliceOfSlice.jEnd;
			return;
		} // if(nThreads > 2)

		iBegin = _iBegin;
		iEnd = _iEnd;
		jBegin = _jBegin;
		jEnd = _jEnd;
		if(nThreads == 1)
			return;

		T	ni = iEnd - iBegin;	//	number of i
		T	nj = jEnd - jBegin;	//	number of j
		T	aij = ni * nj;		// area of ij
		if(aij == 0)
			return; // empty area

		// try even split across one of the dimensions
		if(ni >= nj)
		{
			if((ni>=nThreads) && (ni%nThreads == 0))
			{
				T si = ni/nThreads;
				iBegin = iBegin + si * iThread;
				iEnd = iBegin + si;
				if(iEnd > _iEnd)
					iEnd = _iEnd;
				return;
			}
		}
		if((nj>=nThreads) && (nj%nThreads == 0))
		{
			T sj = nj/nThreads;
			jBegin = jBegin + sj * iThread;
			jEnd = jBegin + sj;
			if(jEnd > _jEnd)
				jEnd = _jEnd;
			return;
		}
		if(ni >= nj)
		{
			T si = (ni+nThreads-1)/nThreads;
			iBegin = iBegin + si * iThread;
			iEnd = iBegin + si;
			if(iEnd > _iEnd)
				iEnd = _iEnd;
			return;
		}
		T sj = (nj+nThreads-1)/nThreads;
		jBegin = jBegin + sj * iThread;
		jEnd = jBegin + sj;
		if(jEnd > _jEnd)
			jEnd = _jEnd;
		return;
	}
};
[/cpp]

...
qtSlice<int> mySlice(omp_get_thread_num(), omp_get_num_threads(), iBegin, iEnd);
for (int i = mySlice.iBegin; i < mySlice.iEnd; ++i)
{ ...
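
To illustrate how the one-dimensional constructor behaves at the edges, here is a compact sketch of just that computation. `Slice1D`, `ceilSlice` and `sliceLength` are hypothetical names used only for this example; the arithmetic follows the first qtSlice constructor above.

```cpp
#include <cassert>

// Hypothetical 1-D stand-in for qtSlice<int>, for illustration only.
struct Slice1D {
    int begin;
    int end;
};

// Ceiling-divide the range [begin, end) into nThreads strides, as the
// one-dimensional qtSlice constructor does, and clamp the upper bound.
Slice1D ceilSlice(int iThread, int nThreads, int begin, int end) {
    assert(nThreads > 0 && iThread >= 0 && iThread < nThreads);
    int stride = (end - begin + nThreads - 1) / nThreads;
    if (stride == 0) stride = 1;
    Slice1D s;
    s.begin = stride * iThread + begin;
    s.end = s.begin + stride;
    if (s.end > end) s.end = end;  // only the upper bound is clamped
    return s;
}

// Number of iterations a loop "for (i = s.begin; i < s.end; ++i)" would run.
int sliceLength(const Slice1D &s) {
    return s.end > s.begin ? s.end - s.begin : 0;
}
```

Note that when there are fewer full strides than threads, a trailing thread can end up with begin greater than end (for example thread 3 of 4 over [0, 5) gets begin = 6, end = 5). A loop written with `i < end` simply does not execute in that case, so the slice is effectively empty.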



Jim Dempsey

msolus
Beginner
Dear Anoop, do you know if this bug will be fixed in the next update of C++ Composer? If so, when can we expect the next update to be released? Thank you in advance.

Andrea

Anoop_M_Intel
Employee
Hi Andrea

The Development Engineers are currently working on the fix. The next update for Intel C++ Composer will be launched tomorrow, so the fix for this bug won't make it into this update. I will keep you posted on when you can expect the fix. Thanks for the follow-up.

Thanks and Regards
Anoop

Anoop_M_Intel
Employee
Hi Andrea

The fix is provided in Intel C++ Compiler 12.1 Update 7 (latest version). You can download it from the Intel Registration Center (https://registrationcenter.intel.com/regcenter/register.aspx). Please let us know if you have any issues.

Thanks and Regards
Anoop