If this is the wrong forum, I apologize - it's the closest match I could find for my question.
I'm trying to find out how many clock cycles are required for various double-precision operations, both in their scalar forms and in their SSE and (if applicable) AVX forms. For example, I'm trying to understand the relative costs of double-precision comparisons, multiplications, and divisions on Intel's recent processors (Core 2 Duo up through the Core i7).
Can anyone point me in the right direction?
Thanks very much,
BTW, nowadays the preferred way to do scalar floating-point operations is with the SSE scalar instructions, which have the same per-instruction performance as their packed counterparts. Vectorization therefore normally yields a speedup equal to the number of elements in the vector (at least for the arithmetic itself).
I know you want to compare different algorithms on the same processor, which is different from benchmarking different processors on the same algorithm, but maybe poke around some of the links to see if you can find what you want, or to pick up other terms to search on. They were pretty interesting to follow.