Re: TBB on Solaris for SPARC

swiftj · 02-24-2009

Hi,

I asked about this on the blog, but this forum is probably more appropriate.

(This blog post: http://software.intel.com/en-us/blogs/2009/01/22/sun-intel-opensolaris-2-years-the-year-of-core/#comment-20289)

Raf, it was suggested that you have a patch for Solaris on SPARC, not just IA-32 and Intel 64.

I can't find anything on the Sun site and have asked there too.

http://forums.sun.com/thread.jspa?forumID=850&threadID=5294189

Any info greatly appreciated.

Kind regards,

James

jswift2 · 08-25-2009

Quoting - okalyta

Actually, we did not try it with g++. Our whole application is based on Sun Studio, which is why we are trying to build it with the Studio compiler. Can you refer us to someone who has successfully used Sun Studio for SPARC? Thanks.

Hi,

I've gotten a 'successful' 32-bit build on SPARC, but it isn't proven; I just haven't had much time to confirm that it's OK.

I was going to wait until I had a cleaner set of changes to post, but since there is some interest I'll attach my changes here to get you started. You should diff against tbb22_004oss to check everything I changed.

Hope this helps,

James

jswift2 · 08-25-2009

Quoting - Alexey Kukanov (Intel)

Just for your information: built-in support for the Sun Studio compiler in TBB relies on __attribute__((aligned(X))) being properly handled. As far as I know, it works well in Sun Studio Express and Sun Studio 12 Update 1, at least for the IA-32 and Intel 64 architectures.

Oddly, this is not mentioned in the compiler man page or user guide: http://docs.sun.com/app/docs/doc/820-7599/bkaed?l=en&a=view&q=extensions

I haven't had a chance to try it on an IA-32 box yet, but it doesn't seem to work for me on SPARC.
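A quick way to test whether a given compiler honours __attribute__((aligned(X))) is to make the compiler check it for you. This is an illustrative sketch, not anything from TBB: the type name line_aligned and the 64-byte value are made up, and it assumes a compiler that accepts the GCC attribute syntax plus C++11 static_assert and alignof, so it is newer than the compilers discussed in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative type (not from TBB): one int, requested at an assumed 64-byte alignment.
struct line_aligned {
    int value;
} __attribute__((aligned(64)));

// If the compiler silently ignores the attribute, the build stops here.
static_assert(alignof(line_aligned) == 64, "aligned(64) was not honoured");
static_assert(sizeof(line_aligned) % 64 == 0, "size was not rounded up to the alignment");

// Runtime check on a real object in static storage, which should respect the attribute too.
inline void check_static_instance_alignment() {
    static line_aligned probe;
    assert(reinterpret_cast<std::uintptr_t>(&probe) % 64 == 0 && "static object is misaligned");
}
```

If the attribute is ignored, the static_assert lines fail at compile time; calling check_static_instance_alignment() from a small test program then checks an actual object's address. Heap objects are a separate question: plain operator new is only required to respect over-alignment from C++17 on.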

jswift2 · 08-26-2009

(silly question deleted) ;)

jswift2 · 08-26-2009

I believe I have now solved the alignment problem, or at least found a solution for one instance of it, and I have maybe a suggestion for the Intel engineers.

To recap: compiling for 32-bit (-m32), I was getting alignment assertion failures in the following code:

    concurrent_queue_base_v3::concurrent_queue_base_v3( size_t item_size ) {
        items_per_page = item_size<=8 ? 32 :
                         item_size<=16 ? 16 :
                         item_size<=32 ? 8 :
                         item_size<=64 ? 4 :
                         item_size<=128 ? 2 :
                         1;
        my_capacity = size_t(-1)/(item_size>1 ? item_size : 2);
        my_rep = cache_aligned_allocator().allocate(1);
        __TBB_ASSERT( (size_t)my_rep % NFS_GetLineSize()==0, "alignment error" );
        __TBB_ASSERT( (size_t)&my_rep->head_counter % NFS_GetLineSize()==0, "alignment error" );
        __TBB_ASSERT( (size_t)&my_rep->tail_counter % NFS_GetLineSize()==0, "alignment error" );
        __TBB_ASSERT( (size_t)&my_rep->array % NFS_GetLineSize()==0, "alignment error" );
        // ... rest of the constructor
    }

As I mentioned previously, I guessed this had something to do with differing natural alignment on SPARC versus Intel, and it turns out that was exactly the problem. In class tbb::internal::concurrent_queue_rep, the first member, head_counter, is padded with 4 bytes on SPARC but not on Intel. This throws off the calculated sizes of the pad1 and pad2 arrays by 4 bytes.
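The effect can be reproduced in isolation. This sketch uses made-up stand-in members (a 4-byte counter followed by a 64-bit value), not the real concurrent_queue_rep fields, and compares a sum-of-sizeof estimate with where the members actually end. Whether the gap appears depends on the ABI's alignment rule for 64-bit integers inside structs; 32-bit SPARC typically uses 8 bytes, while 32-bit x86 Linux typically uses 4.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for the members that precede pad1 (illustrative types, not the TBB ones).
struct prefix {
    std::uint32_t head_counter;  // 4 bytes
    std::uint64_t wait_state;    // 8 bytes; the ABI may want it on an 8-byte boundary
    std::uint32_t n_waiting;     // 4 bytes
};

// What a sum-of-sizeof formula assumes the members occupy: always 16 here.
constexpr std::size_t summed_size =
    sizeof(std::uint32_t) + sizeof(std::uint64_t) + sizeof(std::uint32_t);

// Where the last member really ends, including any padding the compiler inserted.
constexpr std::size_t actual_end = offsetof(prefix, n_waiting) + sizeof(std::uint32_t);

// Bytes the summed formula fails to account for (0 or 4 on common ABIs).
constexpr std::size_t hidden_padding = actual_end - summed_size;
```

On an ABI that puts 64-bit members on 8-byte boundaries, hidden_padding comes out as 4, the same 4-byte discrepancy described above; with 4-byte alignment it is 0, and the summed formula happens to work.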

Changing the calculation to the following fixes the problem, and tail_counter etc. are then aligned to NFS_MaxLineSize:

char pad1[NFS_MaxLineSize-(((sizeof(atomic)+4)+sizeof(waitvar_t)+sizeof(mutexvar_t)+sizeof(size_t)+sizeof(uint32_t))&(NFS_MaxLineSize-1))];

My suggestion, or rather request, is that, if possible, the Intel engineers change this class, and any others they know will be affected, to account for different alignments.

Maybe, with all the info from this thread, Sun could now get an engineer to do an official port? There are some other Sun CC C++ incompatibilities in the test suite that need to be addressed.

Anyway, I feel confident this is now a working port. At least the concurrent queues ;)

James

RafSchietekat · 08-27-2009

"My suggestion, or rather request, is if it's possible that the Intel engineers could make changes to this class and any others they would know will be affected to account for different alignments."
Portable padding... not so easy. First, be aware that char arrays don't always work, because they're not allowed to be empty (for the case where no padding is needed), so it seems a lot better to always use unnamed bit fields (these may have any nonnegative length, including zero and longer than the base type). Second, simply adding up field sizes doesn't work, of course, whenever there is padding between the previous fields. Since fields are laid out in order, you could have a phantom definition with just the previous fields (additionally wasting a static allocation) and directly determine the offset of the last field. Alternatively (for padding to a large-enough power of 2 anyway, which happens to be the main or perhaps even exclusive goal of padding in TBB), you might pack the previous fields in a struct that is used as a base (no notion of "packing tightly" intended), and then work with the size of that struct, as has been done, e.g., in MemoryAllocator.cpp.

(Added 2009-09-02) I would also still want to verify the result at launch time at the latest: this may be more portable, but it is still not guaranteed to work on all imaginable implementations, or at least I have found no supporting evidence for that.
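The phantom-definition idea can be sketched at compile time: repeat the preceding members in a measurement-only struct, take the real end of the last one with offsetof, size the padding from that, and then verify the result with static_assert instead of trusting the arithmetic. Everything here is a hypothetical stand-in (the member types, the names, and the 128-byte kLineSize in place of NFS_MaxLineSize). It also swaps in a plain char array rather than the unnamed bit fields suggested above, and sidesteps the zero-length problem by padding a full extra line when none is needed, as TBB's existing formula also does.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLineSize = 128;  // hypothetical stand-in for NFS_MaxLineSize

// Measurement-only "phantom" struct with exactly the members that precede the padding.
// It is needed because rep itself is still incomplete where pad1's size is computed.
struct rep_prefix {
    std::uint32_t head_counter;
    std::uint64_t wait_state;
    std::uint32_t n_waiting;
};

// Real end of the last member, including any padding the ABI inserted before it.
constexpr std::size_t prefix_end = offsetof(rep_prefix, n_waiting) + sizeof(std::uint32_t);

// Never zero: a full extra line is wasted when prefix_end is already a multiple of kLineSize.
constexpr std::size_t pad1_size = kLineSize - prefix_end % kLineSize;

struct rep {
    // Same leading members in the same order (a common initial sequence), so same offsets.
    std::uint32_t head_counter;
    std::uint64_t wait_state;
    std::uint32_t n_waiting;
    char pad1[pad1_size];
    std::uint32_t tail_counter;  // should start on a line boundary
};

// Verify the layout instead of trusting the arithmetic.
static_assert(offsetof(rep, tail_counter) % kLineSize == 0, "tail_counter is not line-aligned");
```

The static_assert only checks offsets within the type; the addresses of actual objects (for example those from cache_aligned_allocator) still need a launch-time check like the __TBB_ASSERTs in the constructor quoted earlier.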

jswift2 · 08-27-2009

Good points, Raf.

I guess that just leaves separate preprocessor-controlled blocks for each platform/target.
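A per-platform block of that kind might look roughly like this. It is only a sketch: it assumes 32-bit SPARC can be recognised by __sparc or __sparc__ being defined without __sparcv9 or __arch64__ (the usual Sun Studio and GCC predefines, but check your compiler), and it hard-codes the 4-byte gap found in this thread rather than deriving it, which is exactly the fragility raised above. kHeadCounterGap, kLineSize and pad_for are hypothetical names.

```cpp
#include <cstddef>

constexpr std::size_t kLineSize = 128;  // hypothetical stand-in for NFS_MaxLineSize (a power of 2)

// Extra bytes the ABI is assumed to insert after head_counter on each target.
#if (defined(__sparc) || defined(__sparc__)) && !defined(__sparcv9) && !defined(__arch64__)
constexpr std::size_t kHeadCounterGap = 4;  // 32-bit SPARC: gap observed in this thread
#else
constexpr std::size_t kHeadCounterGap = 0;  // everything else: assumed, not verified
#endif

// Same shape as the pad1 expression above; summed_sizes is the plain sum of member sizes.
constexpr std::size_t pad_for(std::size_t summed_sizes) {
    return kLineSize - ((summed_sizes + kHeadCounterGap) & (kLineSize - 1));
}
```

Every new target needs its own verified entry, which is why the offsetof-based approach scales better.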

michajlo · 09-16-2009

I have developed a patch for TBB 2.2 for SPARC with some guidance from Raf's atomic patch. It is available at:

http://www.cis.upenn.edu/acg/tbb-sparc/

I developed and used it this summer as part of a summer research project and have had no problems with it.

The patch is minimal, and a shell script is included that downloads the TBB source and applies the patch, for those who don't want to get that involved.

Feedback is much appreciated, as I have used this in a relatively contained environment and would like to see how it works for other people. I will do my best to keep it up to date.

sreenu30 · 11-17-2009

Quoting - michajlo

I have developed a patch for TBB 2.2 for SPARC with some guidance from Raf's atomic patch. It is available at:

http://www.cis.upenn.edu/acg/tbb-sparc/

I applied this patch and compiled it with g++ 3.4.6 on SunOS 5.8 (SPARC). I got a bus error from the library. Looking deeper, I found the following.

Disassembled code from dbx:

0xff21f008: _Z23__TBB_machine_fetchadd8PVvx : save %sp, -0x80, %sp
0xff21f00c: _Z23__TBB_machine_fetchadd8PVvx+0x0004: st %i0, [%fp + 0x44]
0xff21f010: _Z23__TBB_machine_fetchadd8PVvx+0x0008: st %i1, [%fp - 0x18]
0xff21f014: _Z23__TBB_machine_fetchadd8PVvx+0x000c: st %i2, [%fp - 0x14]
0xff21f018: _Z23__TBB_machine_fetchadd8PVvx+0x0010: ld [%fp + 0x44], %i0
0xff21f01c: _Z23__TBB_machine_fetchadd8PVvx+0x0014: ld [%fp + 0x44], %i1
0xff21f020: _Z23__TBB_machine_fetchadd8PVvx+0x0018: ld [%fp + 0x44], %g1
0xff21f024: _Z23__TBB_machine_fetchadd8PVvx+0x001c: ld [%g1], %i2
0xff21f028: _Z23__TBB_machine_fetchadd8PVvx+0x0020: ld [%g1 + 0x4], %i3
0xff21f02c: _Z23__TBB_machine_fetchadd8PVvx+0x0024: ldd [%fp - 0x18], %i4
0xff21f030: _Z23__TBB_machine_fetchadd8PVvx+0x0028: ld [%fp + 0x44], %g1
0xff21f034: _Z23__TBB_machine_fetchadd8PVvx+0x002c: add %i2, %i4, %o4
0xff21f038: _Z23__TBB_machine_fetchadd8PVvx+0x0030: bad opcode
0xff21f03c: _Z23__TBB_machine_fetchadd8PVvx+0x0034: cmp %i2, %o4
0xff21f040: _Z23__TBB_machine_fetchadd8PVvx+0x0038: bad opcode
0xff21f044: _Z23__TBB_machine_fetchadd8PVvx+0x003c: mov %o4, %i2
0xff21f048: _Z23__TBB_machine_fetchadd8PVvx+0x0040: mov %o4, %i4
0xff21f04c: _Z23__TBB_machine_fetchadd8PVvx+0x0044: mov %o5, %i5
0xff21f050: _Z23__TBB_machine_fetchadd8PVvx+0x0048: std %i4, [%fp - 0x20]
0xff21f054: _Z23__TBB_machine_fetchadd8PVvx+0x004c: ldd [%fp - 0x20], %i4
0xff21f058: _Z23__TBB_machine_fetchadd8PVvx+0x0050: mov %i4, %i0
0xff21f05c: _Z23__TBB_machine_fetchadd8PVvx+0x0054: mov %i5, %i1
0xff21f060: _Z23__TBB_machine_fetchadd8PVvx+0x0058: rett %i7 + 0x8
0xff21f064: _Z23__TBB_machine_fetchadd8PVvx+0x005c: nop

The corresponding GCC inline assembly, from include/tbb/machine/sunos_sparc.h, is:

    /**
     * Atomic fetch and add for 64 bit values, in this case implemented by continuously checking success of atomicity
     * @param ptr pointer to value to add addend to
     * @param addend value to add to *ptr
     * @return value at ptr before addend was added
     */
    static inline int64_t __TBB_machine_fetchadd8(volatile void *ptr, int64_t addend){
        int64_t result;
        __asm__ __volatile__ (
                "0:\tadd\t%3, %4, %0\n"      // do addition
                "\tcasx\t[%2], %3, %0\n"     // cas to store result in memory
                "\tcmp\t%3, %0\n"            // check if value from memory is original
                "\tbne,a,pn\t%%xcc, 0b\n"    // if not try again
                "\tmov\t%0, %3\n"            // use branch delay slot to move new value in memory to be added
                : "=&r"(result), "=m"(*(int64_t *)ptr)
                : "r"(ptr), "r"(*(int64_t *)ptr), "r"(addend), "m"(*(int64_t *)ptr)
                : "ccr", "memory");
        return result;
    }

The CASX and BNE instructions were not given proper opcodes by the gcc compiler (dbx shows them as "bad opcode").
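For comparison, the same compare-and-swap retry loop that the inline assembly implements (add, try to swap, retry with the freshly observed value on failure) can be written portably with C++11 std::atomic. This is a swapped-in technique, not what sunos_sparc.h does, and it needs a much newer compiler than g++ 3.4.6. Whether it becomes a lock-free CASX sequence on a 32-bit SPARC target depends on the compiler and target flags, which std::atomic<std::int64_t>::is_lock_free() reports.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Fetch-and-add built from a compare-and-swap retry loop, with the same structure as the
// inline assembly above. Returns the value held before addend was added.
inline std::int64_t fetchadd8_cas(std::atomic<std::int64_t>& target, std::int64_t addend) {
    std::int64_t observed = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak writes the value it actually found into 'observed',
    // which plays the role of the "mov %0, %3" in the branch delay slot.
    while (!target.compare_exchange_weak(observed, observed + addend,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
        // retry with the refreshed 'observed'
    }
    return observed;
}

// Single-threaded usage example.
inline void fetchadd8_cas_example() {
    std::atomic<std::int64_t> counter(40);
    std::int64_t before = fetchadd8_cas(counter, 2);
    assert(before == 40);
    assert(counter.load() == 42);
}
```

In real code target.fetch_add(addend) does the same job in one call; the explicit loop is shown only to mirror the structure of the assembly.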

Any suggestions or insights?

Thanks,
Sreeni