That's hardly relevant. You have to look at the whole picture. Sandy Bridge is significantly faster and more power efficient.
Note that L3 cache latency went down considerably. The slight increase in L2 cache latency was presumably a compromise that gave a better balance across all of the parameters. The higher instruction latency probably allowed them to use somewhat slower but more power-efficient transistors. Overall it's still a win on all fronts.
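To make the cache trade-off concrete, here's a rough back-of-the-envelope sketch using the usual average-memory-access-time formula. All the cycle counts and miss rates below are made-up placeholders chosen for illustration, not measured figures for any real chip, and the function name is my own.

```python
def average_access_cycles(latencies, miss_rates, memory_latency):
    """Average access time in cycles for a cache hierarchy.

    latencies and miss_rates are ordered from L1 outward (L1, L2, L3).
    Each level costs its own latency plus, on a miss, the cost of the
    next level down.
    """
    cost = memory_latency
    # Work from the last cache level back toward L1.
    for latency, miss_rate in zip(reversed(latencies), reversed(miss_rates)):
        cost = latency + miss_rate * cost
    return cost


# Hypothetical numbers only: the "newer" design has a slightly slower L2
# but a noticeably faster L3.
MISS_RATES = [0.10, 0.40, 0.30]   # assumed L1, L2, L3 miss rates
MEMORY_LATENCY = 200              # assumed DRAM latency in cycles

older = average_access_cycles([4, 10, 40], MISS_RATES, MEMORY_LATENCY)
newer = average_access_cycles([4, 12, 30], MISS_RATES, MEMORY_LATENCY)

print(f"older design: {older:.2f} cycles on average")
print(f"newer design: {newer:.2f} cycles on average")
```

With these placeholder values the newer design comes out ahead on average despite the slower L2. With a low enough L2 miss rate the comparison can go the other way, which is exactly why you can't judge the design by a single latency number.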
Let me put it this way: if a racecar became faster by replacing the engine with a much lighter one with slightly less power, would you consider that "screwing up"? In the end only the external parameters matter.