Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

TBB QA fails on Red Hat Enterprise Linux 6.3 and SUSE 11

dulantha_f
Beginner
764 Views

I built the latest source code (4.2 Update 2) on RHEL 6.3 and SUSE 11, both 64-bit. On both machines the code builds fine, but the QA fails with the following errors and returns to the command line:

src/test/test_eh_tasks.cpp:222, assertion g_ExceptionCaught: no exception occured

make[1]: *** [test_tbb_plain] Aborted

rm test_assembly_compiler_builtins.o test_atomic_compiler_builtins.o

make[1]: Leaving directory [...]/linux_intel64_gcc_cc4.3_libc2.11.3_kernel3.0.13_release

make: [test] Error 2 (ignored)

Has anyone else run into this problem?

To build and run, I run 'make' and then 'make test'.
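Judging from the assertion text, test_eh_tasks checks that an exception thrown inside a task reaches the thread that waits for the work, which then sets g_ExceptionCaught. The sketch below shows that propagation pattern in plain C++11 with std::thread and std::exception_ptr; it is not TBB's implementation, and run_and_wait and check_exception_propagation are hypothetical names (only g_ExceptionCaught is taken from the log).

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <thread>

// Runs `work` on a separate thread and rethrows, in the calling thread, any
// exception it threw. A task scheduler must do something similar so that the
// thread waiting for a task can observe that task's exception.
template <typename F>
void run_and_wait(F work) {
    std::exception_ptr captured;  // set by the worker, read only after join()
    std::thread worker([&captured, &work] {
        try {
            work();
        } catch (...) {
            // An exception must not escape a std::thread (that calls std::terminate),
            // so store it for the waiting thread instead.
            captured = std::current_exception();
        }
    });
    worker.join();  // join() synchronizes with the worker, so reading `captured` is safe
    if (captured) {
        std::rethrow_exception(captured);
    }
}

// Hypothetical analogue of the failing check: the flag is named after the one in
// the assertion, and the assert fails if the exception never reached the caller.
void check_exception_propagation() {
    bool g_ExceptionCaught = false;
    try {
        run_and_wait([] { throw std::runtime_error("task failed"); });
    } catch (const std::runtime_error&) {
        g_ExceptionCaught = true;
    }
    assert(g_ExceptionCaught && "no exception occurred");
    (void)g_ExceptionCaught;  // silence unused-variable warnings when NDEBUG is set
}
```

If the exception were lost on the way (for example, swallowed on the worker thread without being stored), the final assert would fire with a message much like the one in the log.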

17 Replies
RafSchietekat
Valued Contributor III

That doesn't seem to be from tbb42_20131118oss (4.2 update 2)?

 

dulantha_f
Beginner

Oh yeah, I pasted from the wrong run. When 4.2 Update 2 didn't work, I went back to 4.2 to see whether that would work.

Anton_M_Intel
Employee

I cannot reproduce it. I used SUSE Linux Enterprise Server 11 (x86_64) with gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973] (but a different kernel version):
make[1]: Leaving directory `[...]/linux_intel64_gcc_cc4.3_libc2.11.3_kernel3.0.76_release'

I checked both TBB 4.2 and U2. They work fine.

As a workaround, if you don't care about exception support, you can disable this feature (or just its testing) by specifying CXXFLAGS=-DTBB_USE_EXCEPTIONS=0.
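Passing -DTBB_USE_EXCEPTIONS=0 works because the macro gates exception-dependent code at compile time. The sketch below shows the general preprocessor pattern under that assumption; the default of 1 and the function exercise_exceptions are simplifications made up for illustration, not TBB's real configuration code.

```cpp
#include <cassert>
#include <stdexcept>

// Simplified, hypothetical default: exceptions are on unless the build says otherwise.
// This is NOT TBB's configuration code; -DTBB_USE_EXCEPTIONS=0 on the compiler
// command line (e.g. via CXXFLAGS) defines the macro first, so this default is skipped.
#ifndef TBB_USE_EXCEPTIONS
#define TBB_USE_EXCEPTIONS 1
#endif

// Returns true if the exception-dependent path was compiled in and exercised,
// false if it was compiled out entirely.
bool exercise_exceptions() {
#if TBB_USE_EXCEPTIONS
    bool caught = false;
    try {
        throw std::runtime_error("simulated task failure");
    } catch (const std::runtime_error&) {
        caught = true;
    }
    assert(caught);
    return caught;
#else
    return false;  // no try/catch is compiled, so there is nothing to test
#endif
}
```

Built as is, exercise_exceptions() returns true; built with -DTBB_USE_EXCEPTIONS=0, the try/catch block is never compiled and the function returns false.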

 

dulantha_f
Beginner

I'm curious: how long does the whole QA take for you? For me, test_atomic alone seems to take a few hours, at least 3. I'm running on a network machine with an Intel Xeon X5690 @ 3.47 GHz with 6 cores and 4 GB of RAM. A few hours for a single test seems a little excessive.

RafSchietekat
Valued Contributor III

(Replaced incrementally with the following, now frozen.)

When I tested this on an otherwise mostly idle non-x86 server system, test_atomic.exe in debug was under 30 seconds for default_num_threads() set to either 8 or 16 (using voodoo magic). Your mileage may vary, but a Xeon-based system shouldn't be fully 360 times slower; my 4-core 8-thread Ivy Bridge-based laptop computer runs the test in less than 10 seconds, i.e., fully 3 times faster than the same reference.
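The ratios follow from the reported timings: at least 3 hours for test_atomic on dulantha_f's Xeon against under 30 seconds on the server, and under 10 seconds on the laptop against the same 30-second reference:

```latex
\[
  \frac{3\,\text{h}}{30\,\text{s}} = \frac{10\,800\,\text{s}}{30\,\text{s}} = 360,
  \qquad
  \frac{30\,\text{s}}{10\,\text{s}} = 3
\]
```

Since 3 hours is a lower bound and 30 seconds an upper bound, the first ratio is itself a lower bound.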

test_task_priority.exe in debug on the above-mentioned server system with default_num_threads() set to 8: 35 seconds. With default_num_threads() set to 16 (also the built-in limit for that test): I gave up after 15 minutes. Does this really run reliably on x86 with 16 hardware threads?

On the above-mentioned laptop computer, the whole test suite ("make run_cmd=echo test; time make test") took 7.5 minutes, but that was with a failure in test_eh_algorithms.exe debug: "../../src/test/test_eh_algorithms.cpp:779, assertion g_CurExecuted <= minExecuted + g_NumThreads: Too many tasks survived exception". A second attempt suffered no failures and completed in less than 11 minutes. What's the explanation for this failure?

RafSchietekat
Valued Contributor III

Curiously, test_task_priority.exe took about 35 seconds with default_num_threads() set to 8 (see above), but when the line in src/test/test_task_priority.cpp is changed to "Harness::LimitNumberOfThreads( 8 );" (originally 16), the time becomes about 1.5 minutes (each test was run 3 times). With default_num_threads() set to 16, the times are respectively unbounded (see above) and again about 1.5 minutes (each test was run once).

(Added) Note that this server has more than 16 hardware threads, so I did not (knowingly) oversubscribe it.

(2014-02-13 Added) I've discovered that this probably only happened because of something I changed in the code, so you may safely ignore this posting.

Anton_M_Intel
Employee

Raf Schietekat wrote:

test_task_priority.exe in debug on the above-mentioned server system with default_num_threads() set to 8: 35 seconds. With default_num_threads() set to 16 (also the built-in limit for that test): I gave up after 15 minutes. Does this really run reliably on x86 with 16 hardware threads?

Raf, it's a pity that we still see test_task_priority occasionally failing on an assertion. We have not yet managed to find the root cause (and have not had much time for it, since no customer has reported a real correctness or performance problem). Otherwise, it almost fits within the time limits for our nightly tests (2-3 minutes).

Raf Schietekat wrote:

On the above-mentioned laptop computer, the whole test suite ("make run_cmd=echo test; time make test") took 7.5 minutes, but that was with a failure in test_eh_algorithms.exe debug: "../../src/test/test_eh_algorithms.cpp:779, assertion g_CurExecuted <= minExecuted + g_NumThreads: Too many tasks survived exception". A second attempt suffered no failures and completed in less than 11 minutes. What's the explanation for this failure?

Either the constraints were too tight or there was some problem in the test itself; U3 will fix most of the long runs and such assertion failures.

dulantha_f
Beginner

I just found out that the machines I'm running the tests on are virtual machines (at work they just tell us the machine name). That is probably why it takes forever to run the test suite. When I ran it on my laptop (i7 with 8 cores), the times were consistent with Raf's numbers.

RafSchietekat
Valued Contributor III

For test_task_priority.exe I don't see any assert failures, just a few threads apparently not participating and the program not finishing, or otherwise complaining about "Known issue: priority effect is limited in case of blocking-style nesting" and "Warning: test 3 misbehaved too often (12 out of 12)" a few times per execution. Would you consider lowering that limit from 16 (never finished yet) to 8 (seems to reliably complain but finish)?

OK about test_eh_algorithms.exe, I'll just ignore that then.

RafSchietekat
Valued Contributor III

dulantha_f wrote:

When I ran it on my laptop (i7 with 8 cores) the times were consistent with Raf's numbers

You mean 8 hardware threads (4 cores, with hyperthreading)…

dulantha_f
Beginner

Raf Schietekat wrote:

You mean 8 hardware threads (4 cores, with hyperthreading)…

Yup

RafSchietekat
Valued Contributor III

(Removed question that became irrelevant because of new information.)

(Added) dulantha_f, great that the mystery of the original question has been solved, BTW, and thanks for letting us know.

Vladimir_P_1234567890

Thanks for letting us know that the problem has been addressed.

--Vladimir

 

RafSchietekat
Valued Contributor III

Actually, it is interesting that those virtual machines should be that much slower. Should that really be expected? TBB schedules tasks in user space, which is supposed to run at near-native speed, so I would expect heavy OS thread scheduling to be slowed down, but not TBB itself. Does anybody have additional information, or an explanation?

Vladimir_P_1234567890

Well, I do not have details for this particular case, but this is possible if VT (Intel Virtualization Technology) is either disabled or not used by the VM server. And the atomics test is the best candidate for a 1000x slowdown on such a machine.

RafSchietekat
Valued Contributor III

Vladimir Polin (Intel) wrote:

And the atomics test is the best candidate for a 1000x slowdown on such a machine.

I'll have to take your word for it, because I don't see why that should be the case.

Vladimir_P_1234567890

Since the test exercises atomics, it is a highly memory-bound test, and it is one of the slowest tests because of its high contention.

IMHO this is not a problem on a VT-enabled virtual machine, but if the VM needs to emulate atomic memory access and multicore behaviour, this should cause a slowdown.
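To illustrate why an atomics test is so sensitive to how the hypervisor handles memory, here is a minimal C++11 sketch of a high-contention workload. It is written in the spirit of, but not copied from, TBB's test_atomic; ContentionResult and contended_increments are hypothetical names. Comparing elapsed_ms on bare metal and inside a VM (compile with e.g. g++ -std=c++11 -pthread) is one way to check the explanation above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Result of one run: the final counter value and the wall-clock time taken.
struct ContentionResult {
    long final_count;
    long long elapsed_ms;
};

// Every thread increments the same std::atomic<long>, so each fetch_add is an
// atomic read-modify-write (a lock-prefixed instruction on x86) that forces the
// cache line holding the counter to bounce between cores.
ContentionResult contended_increments(int num_threads, long iterations_per_thread) {
    std::atomic<long> counter{0};  // the single shared, heavily contended location
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, iterations_per_thread] {
            for (long i = 0; i < iterations_per_thread; ++i) {
                counter.fetch_add(1);  // sequentially consistent by default
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    const auto elapsed = std::chrono::steady_clock::now() - start;
    // Contention slows things down but must never lose an increment.
    assert(counter.load() == static_cast<long>(num_threads) * iterations_per_thread);
    return {counter.load(),
            std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()};
}
```

A single shared counter is deliberately the worst case; giving each thread its own counter and summing at the end would remove most of the contention, which is exactly what a test of atomics must not do.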

--Vladimir
