Speed of MKL-DSS on Atom processor

juhar · 02-27-2009

Hi all,

We are a group of happy MKL users developing a math-intensive measurement device. Currently, the performance of one of our applications is mainly bounded by the speed of the direct sparse solver (MKL-DSS).

Now, we are considering changing our platform to use Intel Atom processors. My question is: roughly how big a performance disadvantage should we expect if we make this change? For the short term, i.e., regarding the processors that are currently shipping, we could of course test this ourselves; we would just need to buy an Atom-based unit. But in the longer run, we'd also like to have a clue as to which product line is likely to develop in the relevant direction.

We would basically like to know whether the performance of MKL-DSS is expected to drop by 10%, or to one hundredth, or to something in between, or perhaps even further. If it's just a matter of reading the gigahertz values in the spec, we can surely do that ourselves, too - but we expect that there might be more to it.

At least regarding the status quo this is probably a really easy question, but since we have absolutely no experience with the Atom, it's not obvious to us. And we figured it's easier to ask than to buy the hardware and test ourselves. (Even though that would obviously give the definitive answer for our matrix sizes, etc. - but they are not too exotic, so I'd expect a general-purpose answer to apply to our situation, too.)

Many thanks in advance,
Juha

TimP · 02-27-2009

The general-purpose answer is that Atom should produce about 30% of the performance of a Core 2 mobile CPU. This may depend on the code being compiled with the C compiler option for Atom, as it is an in-order (non-out-of-order) CPU. You would also have to consider whether cache and memory are sufficient.

juhar · 02-28-2009

Thanks for the answer. So, you'd expect this general-purpose information to apply to MKL-DSS as well? The solver really is the dominating bottleneck of this application (responsible for around 90% of the total CPU time consumed), and we were afraid that Intel might have put much more effort into optimizing this algorithm for the Pentiums than for the Atom, in which case one could end up with something closer to one hundredth.

I understand that the eventual performance depends on RAM and cache, too. I'd say that RAM is not a problem, but cache could be - probably depending mostly on how clever the specific algorithm is in maximizing the hit rate.
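To put the figures mentioned in this thread side by side, here is a rough back-of-the-envelope sketch in Python. It assumes the solver takes 90% of the run time on the current platform and simply scales the solver and the remaining code by relative-speed factors. The function name `estimated_slowdown`, the intermediate 10% scenario, and the assumption that the non-solver code runs at 30% speed are illustrative choices, not measurements.

```python
def estimated_slowdown(solver_fraction, solver_rel_speed, rest_rel_speed):
    """Estimate how many times longer a run takes on the new CPU.

    solver_fraction  -- share of run time spent in the solver on the old CPU
    solver_rel_speed -- solver speed on the new CPU relative to the old (1.0 = same)
    rest_rel_speed   -- speed of the remaining code relative to the old CPU
    """
    if not 0.0 <= solver_fraction <= 1.0:
        raise ValueError("solver_fraction must be between 0 and 1")
    if solver_rel_speed <= 0 or rest_rel_speed <= 0:
        raise ValueError("relative speeds must be positive")
    # Each part of the old run time is stretched by 1 / (its relative speed).
    solver_time = solver_fraction / solver_rel_speed
    rest_time = (1.0 - solver_fraction) / rest_rel_speed
    return solver_time + rest_time


# Scenarios: the general-purpose 30% figure, a hypothetical 10%, and the
# feared "one hundredth" case. The rest of the code is assumed to run at 30%.
for solver_speed in (0.30, 0.10, 0.01):
    factor = estimated_slowdown(0.90, solver_speed, 0.30)
    print(f"solver at {solver_speed:.0%} speed: run takes about {factor:.1f}x as long")
```

Because the solver dominates the run time, the overall slowdown tracks the solver's relative speed almost one to one, so the solver-specific figure is the one that matters here.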

TimP · 03-01-2009

No, it seems prudent to assume the MKL team didn't optimize specifically for Atom, but it also seems unlikely that they made any effort to worsen Atom performance. Presumably, there is an effort to make the code suitable for the Larrabee GPU family, a many-core in-order (non-out-of-order) processor.