I have recently gained access to a 128-processor Itanium2 system (SGI Altix 3000 series), which I will use for benchmarking YetiSim and development.
I would like to know if I will encounter obstacles using an Itanium2 system vs. a regular multicore system. Specifically, can I expect the same performance from TBB as on other architectures? Will TBB start to break down with this large number of processors?
Actually, I found out from Arch that they have a 32-way Altix in their lab and have already done some experiments. The sample primes code scaled well; he tested it out to about a trillion (he didn't specify British or US) :-)
Arch did further caution that many of the other examples may fall flat without careful attention to cache affinity issues. You'll definitely want to use the affinity partitioner.
Beyond that, let us know what you find out. Good luck.
Thanks for the advice. I am not yet ready to run my trials; I am currently abstracting my existing components for contribution to TBB as generic constructs... then rewriting my own source code to use the new generic components...
...I will be prepared to execute some speed trials in a week, and will report back here with a chart of results.