Nosh_N_ · ‎11-06-2015

Hi,

I just read on Wikipedia that an IBM 1620 took 17 ms to multiply two integers, and I was wondering how long a modern CPU takes to execute the same operation.

I hope I'm in the right forum. I found this question from 2008 ( https://software.intel.com/en-us/forums/intel-academic-community-forum/topic/299987 ), which, going by Google, seems to suggest that I should ask my question here.

Regardless, I'm looking forward to your answers.

MarkC_Intel · ‎11-06-2015

Good question, but hard to answer without knowing more. There are many kinds of integers and many ways of multiplying integers in our chips. You need to consider both latency and throughput, and you need to consider whether you are using SIMD instructions to perform multiple multiplies in one instruction. If you want to see the data we publish for the latency and throughput of each kind of instruction, see appendix C of the software optimization guide available on http://www.intel.com/sdm . (updated to fix url)

McCalpinJohn · ‎11-06-2015

The comparison is especially difficult because the IBM 1620 was a variable word-length system. Fixed-point numbers could be anywhere from 2 decimal digits to tens of thousands of decimal digits (depending on the machine size).

One reference stated that the system could multiply two 10-digit numbers in 17.7 milliseconds. 10 digits is slightly larger than what can be held in a 32-bit integer, but is very easily held in a 64-bit integer, so multiplication of two 64-bit binary numbers seems like a fair comparison.
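As a quick sanity check on the word-size reasoning above, here is a minimal Python sketch. The helper name `fits_unsigned` is made up for illustration. It confirms that the largest 10-digit decimal value overflows 32 bits but fits in 64 bits, and that the product of two such values needs more than 64 bits, which is why the full 128-bit result matters.

```python
# Largest value with 10 decimal digits, matching the IBM 1620 example.
MAX_10_DIGIT = 10**10 - 1


def fits_unsigned(value: int, bits: int) -> bool:
    """Return True if value is representable as an unsigned integer of that width."""
    return 0 <= value < 2**bits


# Hypothetical helper results, named only for this illustration.
operand_fits_32 = fits_unsigned(MAX_10_DIGIT, 32)
operand_fits_64 = fits_unsigned(MAX_10_DIGIT, 64)
product = MAX_10_DIGIT * MAX_10_DIGIT
product_fits_64 = fits_unsigned(product, 64)
product_fits_128 = fits_unsigned(product, 128)

print("10-digit operand fits in 32 bits: ", operand_fits_32)
print("10-digit operand fits in 64 bits: ", operand_fits_64)
print("product fits in 64 bits:          ", product_fits_64)
print("product fits in 128 bits:         ", product_fits_128)
```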

According to appendix C of the Intel Optimization Reference Manual, the "MUL" instruction can multiply two 64-bit (unsigned) values and put the 128-bit result in two 64-bit output registers with a latency of 4 cycles on recent processors (Nehalem/Westmere, Sandy Bridge/Ivy Bridge, and Haswell/Broadwell). At a "typical" frequency of 2 GHz, this is a latency of 2 nanoseconds -- almost 9 million times faster than the IBM 1620.
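The arithmetic behind that ratio can be written out as a small Python sketch. The constants are just the figures quoted above (17.7 ms, 4 cycles, 2 GHz); the variable names are my own.

```python
# Figures quoted in this thread; the names are illustrative only.
IBM_1620_MULTIPLY_S = 17.7e-3   # 17.7 ms for a 10-digit multiply
MUL_LATENCY_CYCLES = 4          # 64-bit MUL latency in cycles
CLOCK_HZ = 2.0e9                # assumed "typical" 2 GHz clock

# Latency of one multiply in seconds: cycles divided by cycles per second.
mul_latency_s = MUL_LATENCY_CYCLES / CLOCK_HZ

# How many times faster a single, non-overlapped multiply is.
speedup = IBM_1620_MULTIPLY_S / mul_latency_s

print(f"MUL latency: {mul_latency_s * 1e9:.1f} ns")
print(f"Speedup over the IBM 1620: {speedup / 1e6:.2f} million")
```

A different clock frequency scales the result linearly, so a 4 GHz part would double the ratio under the same 4-cycle latency assumption.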

Unlike the IBM 1620, modern processors can also perform many of these multiplication operations concurrently, using pipelining, SIMD vectors, and multiple cores.

Pipelining: All recent Intel processors can issue one "MUL" instruction every cycle, so the throughput is four times higher if you have four independent operations that can be launched consecutively.
SIMD Vectors: There are several approaches that can be used here, depending on the data layout and the application's requirements. With 256-bit vector instructions it should be possible to get at least a 2x improvement in throughput.
Multiple Cores: All of the cores can run independent integer multiplications concurrently.

Combining these factors gives a (peak theoretical) throughput increase of at least an additional factor of ~32x on a quad-core processor. Whether this can be sustained depends on where the input and output data are located in memory, whether the code is using signed or unsigned integers, whether the code needs the full 128-bit output or just the low-order 64 bits, etc.

Overall, a relatively inexpensive quad-core processor should be between 10 million and 300 million times as fast as the IBM 1620 for the arithmetic operation. Time required for memory accesses will likely reduce these ratios, since memory has not increased in performance as much as the computational logic has increased in performance and available concurrency.
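For anyone who wants to see where the two ends of that range come from, here is a rough Python sketch. It only multiplies together the factors quoted above (4x from pipelining, 2x from SIMD, 4 cores), treats them as independent, and ignores memory effects entirely, so it is a peak estimate rather than a measurement.

```python
# Single-multiply latency speedup: 17.7 ms / 2 ns (figures from this thread).
LATENCY_SPEEDUP = 17.7e-3 / 2e-9

# Concurrency factors quoted above; treated as independent multipliers,
# which is a simplifying assumption.
PIPELINING_FACTOR = 4   # one MUL issued per cycle, 4-cycle latency
SIMD_FACTOR = 2         # "at least 2x" with 256-bit vectors
CORE_COUNT = 4          # quad-core processor

concurrency_factor = PIPELINING_FACTOR * SIMD_FACTOR * CORE_COUNT

# Low end: one dependent multiply at a time on one core.
low_estimate = LATENCY_SPEEDUP
# High end: every concurrency mechanism fully used.
high_estimate = LATENCY_SPEEDUP * concurrency_factor

print(f"Concurrency factor: {concurrency_factor}x")
print(f"Low estimate:  {low_estimate / 1e6:.1f} million times faster")
print(f"High estimate: {high_estimate / 1e6:.1f} million times faster")
```

Rounded, the two ends correspond to the "10 million to 300 million" range.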

Bernard · ‎12-15-2015

It is not a fair comparison. The IBM 1620 represents hardware from around the beginning of the 1960s. Its CPU had a clock speed of 1 MHz. As a very simplistic and crude comparison, its clock was slower than a modern CPU's by a factor of 2.0 - 3.0 * 1.0e3. The Wikipedia article states that the IBM 1620 CPU did not even have an ALU.

How long does a 6700K take to multiply two integers?