I thought I could keep the processor (i7 920) busier by putting some of the adds into the dependency chain, but every attempt resulted in slower execution times. Can anyone find a reason for this, or possibly get it to go even faster? Are the adds getting executed at the same time as instructions towards the beginning of the loop? It's quite a big leap... I am a little surprised. This was the order I put the instructions in at first glance, with the intention of rearranging them later for more speed. Little did I know!
If you're wondering what the code does, it is the sum of (parity(i^2) mod 2) over i, where i^2 can be a 128-bit integer.
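For anyone who wants a reference to check results against, here is a rough C sketch of that computation. It is not the assembly loop in question, only my reading of it: I am assuming "parity" means the number of set bits mod 2, and the names `parity128` and `parity_sum` are made up for illustration. `unsigned __int128` and `__builtin_parityll` are GCC/Clang extensions, so this will not build with every compiler.

```c
#include <stdint.h>

/* Parity (set-bit count mod 2) of a 128-bit value.
   XOR-folding the two 64-bit halves preserves parity,
   so only a 64-bit parity is needed afterwards. */
static unsigned parity128(unsigned __int128 x)
{
    uint64_t folded = (uint64_t)x ^ (uint64_t)(x >> 64);
    return (unsigned)__builtin_parityll(folded); /* GCC/Clang builtin */
}

/* Sum of parity(i*i) for i in [0, n).
   The square is formed in 128 bits so it cannot overflow
   for any 64-bit i. */
static uint64_t parity_sum(uint64_t n)
{
    uint64_t total = 0;
    for (uint64_t i = 0; i < n; i++) {
        unsigned __int128 sq = (unsigned __int128)i * i;
        total += parity128(sq);
    }
    return total;
}
```

Calling `parity_sum(n)` with the same range as the assembly loop should give the same count if my interpretation is right. In this version the only loop-carried dependencies are `i` and `total`; the multiply and parity for each iteration are independent of the previous one.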