I'm trying to find out how many clock cycles are required for various double-precision operations, both in their simple forms and in their SSE and (if applicable) AVX forms. For example, I'm trying to understand the relative costs of double-precision comparisons, multiplications, and divisions on Intel's recent processors (Core 2 Duo up through the i7s).

Can anyone point me in the right direction?

Thanks very much,

Christian

BTW, nowadays the preferred way to do scalar floating-point operations is to use the SSE scalar instructions (e.g. MULSD), which have the same performance as their packed vector counterparts (e.g. MULPD). Therefore vectorization normally yields a speedup equal to the number of elements in the vector (at least for the arithmetic itself).

PS: Can somebody please replace this forum software with something that works? I had to copy that URL manually because pasting only works 5% of the time...

Do you mean the "Intel 64 and IA-32 Architectures Optimization Reference Manual"?

The link has changed to this:

http://intel.com/products/processor/manuals/

http://en.wikipedia.org/wiki/Benchmark_%28computing%29

I know you want to compare different algorithms on the same processor, which is different from benchmarking different processors against the same algorithm, but maybe poke around some of the links to see if you can find what you want, or to find other terms to search on. I followed some of the links and they were pretty interesting.

Quoting Nicolae Popovici (Intel)

Thanks for sharing that link! I wasn't looking for a program like this but it might be useful for me in the future :)

