Actually, we did not try it with g++. Our whole application is built with Sun Studio, which is why we are trying to build TBB with the Studio compiler. Can you point to anyone who has used Sun Studio successfully on SPARC? Thanks.
Hi,
I've gotten a 'successful' 32-bit build on SPARC, but it is not proven; I just haven't had enough time to confirm that it's OK.
I was going to wait until I had a cleaner set of changes to post, but since there is some interest I'll attach my changes here to get you started. You should diff against tbb22_004oss to check everything I changed.
hope this helps,
James
Oddly, this is not mentioned in the compiler man page or user guide: http://docs.sun.com/app/docs/doc/820-7599/bkaed?l=en&a=view&q=extensions
I haven't had a chance to try it on an IA-32 box yet, but it doesn't seem to work for me on SPARC.
(silly question deleted) ;)
I believe I have now solved the alignment problem, or at least I have a solution for one instance of it, and maybe a suggestion for the Intel engineers.
To recap: compiling for 32-bit (-m32), I was getting failures at the assertion lines in the following code...
concurrent_queue_base_v3::concurrent_queue_base_v3( size_t item_size ) {
    items_per_page = item_size<=8 ? 32 :
                     item_size<=16 ? 16 :
                     item_size<=32 ? 8 :
                     item_size<=64 ? 4 :
                     item_size<=128 ? 2 :
                     1;
    my_capacity = size_t(-1)/(item_size>1 ? item_size : 2);
    my_rep = cache_aligned_allocator<concurrent_queue_rep>().allocate(1);
    __TBB_ASSERT( (size_t)my_rep % NFS_GetLineSize()==0, "alignment error" );
    __TBB_ASSERT( (size_t)&my_rep->head_counter % NFS_GetLineSize()==0, "alignment error" );
    __TBB_ASSERT( (size_t)&my_rep->tail_counter % NFS_GetLineSize()==0, "alignment error" );
    __TBB_ASSERT( (size_t)&my_rep->array % NFS_GetLineSize()==0, "alignment error" );
As previously mentioned, I guessed this was something to do with differing natural alignment on SPARC versus Intel. As it turns out, this was exactly the problem. In the class tbb::internal::concurrent_queue_rep, the first member, head_counter, is padded with 4 bytes on SPARC but not on Intel. This throws off the calculation of the sizes of the pad1 and pad2 arrays by 4 bytes. (To illustrate: with a 128-byte NFS_MaxLineSize and a 4-byte counter, pad1 is sized at 124 bytes, so the extra 4 bytes after head_counter push tail_counter to offset 132, 4 bytes past the line boundary.)
Changing the calculation to the following fixes the problem, and tail_counter etc. are now aligned to NFS_MaxLineSize...
char pad1[NFS_MaxLineSize-(((sizeof(atomic
My suggestion, or rather request: if possible, could the Intel engineers change this class, and any others they know will be affected, to account for different alignments?
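In the meantime, a compile-time tripwire can catch bad pad math on each platform at build time. Here is a minimal sketch (hypothetical names, not TBB's code; pre-C++11, so it uses the classic negative-array-size trick):

    #include <cstddef>  // offsetof, size_t

    static const size_t NFS_MaxLineSize = 128;      // TBB's value

    // Compile-time check: the typedef is ill-formed, with a negative
    // array size, whenever the condition is false.
    #define TBB_STATIC_CHECK(cond, name) typedef char name[(cond) ? 1 : -1]

    // Hypothetical stand-in for tbb::internal::concurrent_queue_rep.
    struct queue_rep_sketch {
        volatile size_t head_counter;               // stand-in for atomic<ticket>
        char pad1[NFS_MaxLineSize - sizeof(size_t)];
        volatile size_t tail_counter;
    };

    // Fails the build on any platform where the pad calculation is wrong.
    // (Caveat: offsetof on a non-POD type is only conditionally supported
    // in C++03; this stand-in is POD, but the real class may not be.)
    TBB_STATIC_CHECK(offsetof(queue_rep_sketch, tail_counter) % NFS_MaxLineSize == 0,
                     tail_counter_is_line_aligned);

This only checks the offset within the struct; the base address still has to come from a cache-aligned allocation, which the first assertion above verifies at run time.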
Maybe Sun could now get an engineer to take the information from this thread and do an official port? There are some other Sun CC C++ incompatibilities in the test suite that need to be addressed.
Anyway, I feel confident this is now a working port. At least the concurrent queues ;)
James
Portable padding... not so easy. The first thing to be aware of: char arrays don't always work (they're not allowed to be empty in the case where no padding is needed), so it seems a lot better to always use unnamed bit-fields (these may have any non-negative length, including zero and lengths longer than the base type). Secondly, adding up field sizes doesn't work, of course, if there is any padding between the previous fields. Since fields are laid out in order, you could have a phantom definition with just the previous fields, additionally waste a static allocation, and directly determine the offset of the last field. Alternatively (for padding to a large-enough power of 2 anyway, which happens to be the main or perhaps even exclusive goal of padding in TBB), you might pack the previous fields in a struct that is used as a base (no notion of "packing tightly" intended), and then use the size of that struct in the calculation, as has been done, e.g., in MemoryAllocator.cpp.
(Added 2009-09-02) I would also still want to verify the result at launch time at the latest: this may be more portable, but it is still not guaranteed to work on all imaginable implementations, or at least I have found no supporting evidence for that.
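A minimal sketch of the base-struct variant, combined with the launch-time check (hypothetical names, not TBB's code; assumes NFS_MaxLineSize is 128 and uses a plain volatile counter as a stand-in for atomic<ticket>):

    #include <cassert>
    #include <cstddef>

    static const size_t NFS_MaxLineSize = 128;      // TBB's value

    // Everything that precedes the pad goes into a base struct; sizeof()
    // on it already includes any padding the compiler inserted.
    struct rep_prefix {
        volatile size_t head_counter;
    };

    struct queue_rep_sketch : rep_prefix {
        // Pad to the next line boundary.  The array size is always in
        // 1..NFS_MaxLineSize and never zero, so it stays a legal char
        // array even when the prefix exactly fills a line.
        char pad1[NFS_MaxLineSize - sizeof(rep_prefix) % NFS_MaxLineSize];
        volatile size_t tail_counter;
    };

    // Launch-time verification: waste one static allocation and check
    // the offset the compiler actually chose for the post-pad field.
    static queue_rep_sketch layout_probe;
    inline void verify_queue_rep_layout() {
        assert(((char *)&layout_probe.tail_counter - (char *)&layout_probe)
               % NFS_MaxLineSize == 0);
    }

Note that the offset is determined with pointer arithmetic on the static instance rather than offsetof, since the derived struct is not POD.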
Good points, Raf.
I guess that just leaves separate preprocessor-controlled blocks for each platform/target.
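For example, such a block might look like this (a sketch only; the 8 for SPARC assumes the 4-byte counter plus the 4 bytes of compiler padding James observed, and the values are illustrative, not audited):

    #include <cstddef>

    static const size_t NFS_MaxLineSize = 128;      // TBB's value

    struct queue_rep_sketch {                       // hypothetical, not TBB's class
        volatile size_t head_counter;               // stand-in for atomic<ticket>
    #if defined(__sparc) || defined(__sparc__)
        char pad1[NFS_MaxLineSize - 8];             // counter + 4 bytes of padding
    #else
        char pad1[NFS_MaxLineSize - sizeof(size_t)];
    #endif
        volatile size_t tail_counter;
    };

The obvious downside is that every new platform/compiler combination has to be audited and added by hand.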
http://www.cis.upenn.edu/acg/tbb-sparc/
I developed and used it this summer as part of a summer research project and have had no problems with it.
The patch is minimal and a shell script is included to download the TBB source and apply the patch for those who don't want to get that involved.
Feedback is much appreciated, as I have used this in a relatively contained environment and would like to see how it works for other people. I will do my best to keep it up to date.
Disassembled code from DBX:
0xff21f008: _Z23__TBB_machine_fetchadd8PVvx : save %sp, -0x80, %sp
0xff21f00c: _Z23__TBB_machine_fetchadd8PVvx+0x0004: st %i0, [%fp + 0x44]
0xff21f010: _Z23__TBB_machine_fetchadd8PVvx+0x0008: st %i1, [%fp - 0x18]
0xff21f014: _Z23__TBB_machine_fetchadd8PVvx+0x000c: st %i2, [%fp - 0x14]
0xff21f018: _Z23__TBB_machine_fetchadd8PVvx+0x0010: ld [%fp + 0x44], %i0
0xff21f01c: _Z23__TBB_machine_fetchadd8PVvx+0x0014: ld [%fp + 0x44], %i1
0xff21f020: _Z23__TBB_machine_fetchadd8PVvx+0x0018: ld [%fp + 0x44], %g1
0xff21f024: _Z23__TBB_machine_fetchadd8PVvx+0x001c: ld [%g1], %i2
0xff21f028: _Z23__TBB_machine_fetchadd8PVvx+0x0020: ld [%g1 + 0x4], %i3
0xff21f02c: _Z23__TBB_machine_fetchadd8PVvx+0x0024: ldd [%fp - 0x18], %i4
0xff21f030: _Z23__TBB_machine_fetchadd8PVvx+0x0028: ld [%fp + 0x44], %g1
0xff21f034: _Z23__TBB_machine_fetchadd8PVvx+0x002c: add %i2, %i4, %o4
0xff21f038: _Z23__TBB_machine_fetchadd8PVvx+0x0030: bad opcode
0xff21f03c: _Z23__TBB_machine_fetchadd8PVvx+0x0034: cmp %i2, %o4
0xff21f040: _Z23__TBB_machine_fetchadd8PVvx+0x0038: bad opcode
0xff21f044: _Z23__TBB_machine_fetchadd8PVvx+0x003c: mov %o4, %i2
0xff21f048: _Z23__TBB_machine_fetchadd8PVvx+0x0040: mov %o4, %i4
0xff21f04c: _Z23__TBB_machine_fetchadd8PVvx+0x0044: mov %o5, %i5
0xff21f050: _Z23__TBB_machine_fetchadd8PVvx+0x0048: std %i4, [%fp - 0x20]
0xff21f054: _Z23__TBB_machine_fetchadd8PVvx+0x004c: ldd [%fp - 0x20], %i4
0xff21f058: _Z23__TBB_machine_fetchadd8PVvx+0x0050: mov %i4, %i0
0xff21f05c: _Z23__TBB_machine_fetchadd8PVvx+0x0054: mov %i5, %i1
0xff21f060: _Z23__TBB_machine_fetchadd8PVvx+0x0058: rett %i7 + 0x8
0xff21f064: _Z23__TBB_machine_fetchadd8PVvx+0x005c: nop
The equivalent gcc inline assembler source, from include/tbb/machine/sunos_sparc.h, is:
/**
 * Atomic fetch-and-add for 64-bit values, implemented by retrying the compare-and-swap until it succeeds
 * @param ptr pointer to the value to add addend to
 * @param addend value to add to *ptr
 * @return value at ptr before addend was added
 */
static inline int64_t __TBB_machine_fetchadd8(volatile void *ptr, int64_t addend){
    int64_t result;
    __asm__ __volatile__ (
        "0:\t add\t %3, %4, %0\n"      // do addition
        "\t casx\t [%2], %3, %0\n"     // cas to store result in memory
        "\t cmp\t %3, %0\n"            // check if value from memory is original
        "\t bne,a,pn\t %%xcc, 0b\n"    // if not, try again
        "\t mov %0, %3\n"              // use branch delay slot to move new value in memory to be added
        : "=&r"(result), "=m"(*(int64_t *)ptr)
        : "r"(ptr), "r"(*(int64_t *)ptr), "r"(addend), "m"(*(int64_t *)ptr)
        : "ccr", "memory");
    return result;
}
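A minimal driver (a sketch; it assumes the inline definition above is pasted into the same file) makes it easy to check whether the primitive executes at all. Since casx is a SPARC V9 instruction, a 32-bit build generally needs a V9-capable target selected (for gcc, e.g. -mcpu=ultrasparc), which may be relevant to the bad opcodes shown in the disassembly above:

    #include <stdint.h>
    #include <stdio.h>

    /* ... __TBB_machine_fetchadd8 definition from above ... */

    int main() {
        volatile int64_t counter = 40;
        int64_t previous = __TBB_machine_fetchadd8((volatile void *)&counter, 2);
        /* Expect previous == 40 and counter == 42 if the atomic worked. */
        printf("previous = %lld, counter = %lld\n",
               (long long)previous, (long long)counter);
        return 0;
    }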
The CASX and BNE instructions were not given proper opcodes by the gcc compiler.
Any suggestions or insights?
Thanks,
Sreeni