My computer: AMD Athlon XP 2000+, 1 GB RAM, Windows XP SP3.
The compiler is MinGW g++ 4.4.0.
TBB builds, but when I try to run an example (for instance, parallel_reduce/primes), an error message box appears: "unknown software exception (0xc000001e)".
I tried tbb22_004oss, tbb22_013oss and tbb30_20100314oss, and got the same error.
However, when I copy the built executables to another computer (Intel Core2 CPU), they run correctly.
Any idea how to resolve this? Thanks!
Thank you for reporting the issue.
Would it be possible for you to rebuild your application with another compiler, preferably Visual C++ or the Intel C++ Compiler? I'd like to understand whether you hit some subtle difference in HW behavior that TBB inadvertently relies upon (in which case it may also exist in the pre-built TBB binaries), or whether the issue is specific to MinGW.
Another idea is to check whether we accidentally pass some option to MinGW that causes it to issue instructions incompatible with your CPU. Looking at build/windows.gcc.inc, the only option I found suspicious is -msse; would you mind removing it and trying again?
I rebuilt the TBB library and the example with VC2005 Express, but got the same error (only the memory address is different).
I also removed the -msse option from windows.gcc.inc and rebuilt everything with MinGW; the error is still there.
By the way, I also built TBB under Linux with gcc; the result is here (close to the Windows result):
Thanks!
Our internal testing on an Opteron machine did not reveal any issue, so we need your further help. The problem might be specific to your system settings. Could you please run a couple more experiments?
First, could you please take the pre-built TBB binaries (from the Windows-specific package supplied with a commercial-aligned or a stable release) and see if those work?
Second, do you have a stack trace of the fault (the one from VC is preferable, but any other would probably work as well)?
Thanks!
I built example/primes (rev tbb22_013oss) with VC9; the stack trace is below:
===========================================================
> tbb_debug.dll!__TBB_rel_acq_fence() line 43 + 0x3 byte C++
tbb_debug.dll!tbb::internal::GenericScheduler::get_task() line 2570 C++
tbb_debug.dll!tbb::internal::CustomScheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task & parent={...}, tbb::task * child=0x7ff87aa0) line 2945 + 0x8 byte C++
tbb_debug.dll!tbb::internal::GenericScheduler::local_spawn_root_and_wait(tbb::task &
first={...}, tbb::task * & next=0x00000000) line 2521 C++
tbb_debug.dll!tbb::internal::GenericScheduler::spawn_root_and_wait(tbb::task &
first={...}, tbb::task * & next=0x00000000) line 1462 C++
primes.exe!tbb::task::spawn_root_and_wait(tbb::task & root={...}) line 581 C++
primes.exe!tbb::internal::start_reduce
const >::run(const SieveRange & range={...}, Sieve & body={...}, const
tbb::simple_partitioner & partitioner={...}) line 144 + 0x85 byte C++
primes.exe!tbb::parallel_reduce
Sieve & body={...}, const tbb::simple_partitioner & partitioner={...}) line 262 + 0x11 byte
C++
primes.exe!ParallelCountPrimes(unsigned long n=100000000) line 300 + 0x2e byte C++
primes.exe!main(int argc=1, char * * argv=0x003a60b0) line 388 + 0x9 byte C++
primes.exe!__tmainCRTStartup() line 582 + 0x19 byte C
primes.exe!mainCRTStartup() line 399 C
==============================================================
Execution breaks on line 43 in file tbb\machine\windows_ia32.h:
inline void __TBB_rel_acq_fence() { __asm { __asm mfence } }
The error message is:
Unhandled exception at 0x1000f8c3 (tbb_debug.dll) in primes.exe: 0xC000001E: An attempt was made to execute an invalid lock sequence
I tried debugging under Linux; GDB reported the following:
Program received signal SIGILL, Illegal instruction.
__TBB_rel_acq_fence () at ../../include/tbb/machine/linux_ia32.h:42
42 inline void __TBB_rel_acq_fence() { __asm__ __volatile__("mfence": : :"memory"); }
The stack trace is here:
================================================================
#0 __TBB_rel_acq_fence () at ../../include/tbb/machine/linux_ia32.h:42
#1 0xb7fc4961 in tbb::internal::GenericScheduler::get_task (this=0x804d600)
at ../../src/tbb/task.cpp:2569
#2 0xb7fc683c in tbb::internal::CustomScheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x804d600, parent=..., child=0x8058d20)
at ../../src/tbb/task.cpp:2945
#3 0xb7fbf1fe in tbb::internal::GenericScheduler::local_spawn_root_and_wait (
this=0x804d600, first=..., next=@0x8058d1c) at ../../src/tbb/task.cpp:2519
#4 0xb7fc3855 in tbb::internal::GenericScheduler::spawn_root_and_wait (
this=0x804d600, first=..., next=@0x8058d1c) at ../../src/tbb/task.cpp:1461
#5 0x0804995a in tbb::task::spawn_root_and_wait (root=...)
at /home/ymao/tbb22_013oss/include/tbb/task.h:580
#6 0x0804a663 in tbb::internal::start_reduce
at /home/ymao/tbb22_013oss/include/tbb/parallel_reduce.h:144
#7 0x0804a5cb in tbb::parallel_reduce
body=..., partitioner=...)
at /home/ymao/tbb22_013oss/include/tbb/parallel_reduce.h:262
#8 0x080491e1 in ParallelCountPrimes (n=100000000) at primes.cpp:300
#9 0x080494f2 in main (argc=1, argv=0xbffff804) at primes.cpp:388
=======================================================
Regards!
Since you build TBB from sources anyway, my recommendation is to remove this instruction from your copy of the TBB code (but otherwise leave the inlined assembly intact). Unfortunately, this opens the possibility of subtle races due to instruction reordering by the processor, but at least it should get you somewhere.
If you develop software intended to run on various CPUs, consider limiting the use of the modified library to processors that lack SSE2.
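As a rough illustration of limiting the modified library to processors that lack SSE2, the check below queries CPUID at run time. It is only a sketch: `cpu_has_sse2`, `choose_fence`, `FenceKind` and `check_fence_choice` are hypothetical names, not part of TBB, and it assumes GCC or Clang on x86 (`<cpuid.h>`); on any other target it simply reports no SSE2.

```cpp
#include <cassert>
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>  // GCC/Clang wrapper around the CPUID instruction
#endif

// Hypothetical helper (not part of TBB): true if the CPU reports SSE2,
// the extension that introduced the mfence instruction.
inline bool cpu_has_sse2() {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                   // CPUID leaf 1 not available
    return (edx & (1u << 26)) != 0;     // CPUID.01H:EDX bit 26 = SSE2
#else
    return false;                       // assumption: treat unknown targets as "no SSE2"
#endif
}

// Hypothetical selector: which fence flavour an application could load.
enum FenceKind { kUseMfence, kUseLockedAdd };

inline FenceKind choose_fence() {
    return cpu_has_sse2() ? kUseMfence : kUseLockedAdd;
}

// Small self-check: the choice must agree with the CPUID result.
inline void check_fence_choice() {
    assert((choose_fence() == kUseMfence) == cpu_has_sse2());
}
```

An application could call `choose_fence()` once at startup and load the stock or the modified library build accordingly; an Athlon XP would be expected to take the `kUseLockedAdd` path, since it supports SSE but not SSE2.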
The xchg instruction implies a memory fence even without a lock prefix. Using xchg for a memory fence should work on processors all the way back to the 8086, though the point is moot for machines without caches.
Do you mean replacing the original __TBB_rel_acq_fence with the code below? Thanks.
inline void __TBB_rel_acq_fence() {
__asm {
__asm lock xchg eax, ebx
__asm lock xchg eax, ebx
}
}
Alexey,
Can I just remove this instruction, so that the code becomes "inline void __TBB_rel_acq_fence(){}"? If so, does it introduce any bugs?
Thanks, all.
No, you should not remove the inlined assembly and make the method a no-op. The inlined assembly also serves as a compiler fence, i.e. it prevents an optimizing compiler from reordering instructions around the call.
For the purpose of a memory fence, any lock-prefixed operation also works. Indeed, xchg is the only operation that implies a fence without specifying the lock prefix; its disadvantage is that it requires at least one register. I guess it's the compiler's job to save the register's value and restore it afterwards, but this can be avoided altogether with an operation applied to memory and an immediate value. E.g. I think the following should work:
inline void __TBB_rel_acq_fence() { int tmp; __asm { __asm lock add tmp,1 } }
The xchg instruction has to have a memory operand to imply a lock prefix. Alexey's use of an explicitly locked add seems better. Perhaps "LOCK INC tmp" would be a minimalist solution.
To expand on Alexey's point about not using a noop, there are two common causes of instruction reordering:
- The hardware
- The compiler
The LOCK'd instruction prevents the hardware from reordering. The inline assembly prevents the Microsoft compiler from reordering.
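For the GCC/MinGW builds discussed earlier in the thread, the same idea could be sketched as below. This is not the TBB source: `full_fence_without_mfence` and `fence_demo` are hypothetical names, the x86 branches assume GCC-style extended inline assembly, and the fallback to `std::atomic_thread_fence` assumes a C++11 compiler. The locked add of 0 to the top of the stack handles hardware reordering without changing any data, and the "memory" clobber handles compiler reordering.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a fence that avoids mfence (an SSE2 instruction).
inline void full_fence_without_mfence() {
#if defined(__GNUC__) && defined(__x86_64__)
    // Locked read-modify-write on the stack top; adding 0 leaves the value intact.
    // "memory" is the compiler fence, "cc" because ADD writes the flags.
    __asm__ __volatile__("lock; addl $0, (%%rsp)" ::: "memory", "cc");
#elif defined(__GNUC__) && defined(__i386__)
    __asm__ __volatile__("lock; addl $0, (%%esp)" ::: "memory", "cc");
#else
    // Assumption: on other compilers/targets, defer to the standard library.
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Tiny demonstration: the fence must not disturb surrounding data.
inline int fence_demo() {
    int before = 41;
    full_fence_without_mfence();
    int after = before + 1;
    full_fence_without_mfence();
    assert(before == 41);
    return after;
}
```

Using the stack pointer as the memory operand avoids the extra local variable and, as noted elsewhere in the thread, that location is practically always in cache.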
LOCK ADD [R/ESP], 0
I think it has the minimal code footprint, and the location is always in cache.