I think section 2 of this article corresponds to what Peter was trying to point out: you need to ensure that your application is properly threaded (application level) before you start worrying about CPI (architecture level). VTune can help you verify this if you measure how many of the available clockticks you are actually using. Intel Thread Profiler is another tool that can help you at this stage.
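To make "how many of the available clockticks you are actually using" concrete, here is a rough sketch of the arithmetic. The function name and the sample numbers are made up for illustration; the measured clockticks would come from your profiler, and the calculation assumes a constant clock frequency, which real machines with frequency scaling do not strictly have.

```python
def clocktick_utilization(measured_clockticks, wall_seconds, frequency_hz, logical_cpus):
    """Fraction of the machine's available clockticks the application consumed.

    Hypothetical helper: assumes a fixed clock frequency on every logical CPU.
    """
    available = wall_seconds * frequency_hz * logical_cpus
    return measured_clockticks / available


# Made-up example: 12e9 clockticks measured over a 2 s run
# on a 3 GHz machine with 8 logical CPUs.
utilization = clocktick_utilization(
    measured_clockticks=12_000_000_000,
    wall_seconds=2.0,
    frequency_hz=3_000_000_000,
    logical_cpus=8,
)
print(f"utilization: {utilization:.0%}")
```

A low value like this one suggests that most CPUs sit idle, so threading is the first thing to look at, well before CPI.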
CPI is merely a measure of how well the hardware is able to execute the instruction flow. Looking at the CPI may guide you to the portions of your code where you can take better advantage of the underlying CPU architecture. However, CPI doesn't tell you how useful the executed instructions actually are. For example, a different algorithm might give you a much better running time, and at the end of the day, this is what you care about, isn't it? Similarly, switching to different instructions, such as vector instructions, can improve your running time. If your CPI increases but your running time decreases by switching to vector instructions, who cares?
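A small sketch of that last point, with made-up counter values for a scalar and a vectorized version of the same loop (the numbers, the 3 GHz frequency, and the helper names are all assumptions, not measurements):

```python
def cpi(clockticks, instructions_retired):
    """Cycles per instruction: clockticks divided by instructions retired."""
    return clockticks / instructions_retired


def runtime_seconds(clockticks, frequency_hz):
    """Running time implied by a clocktick count at a fixed frequency."""
    return clockticks / frequency_hz


FREQUENCY_HZ = 3_000_000_000  # assumed fixed 3 GHz clock

# Hypothetical counters: the vector version retires far fewer instructions,
# but each one takes more cycles on average.
scalar = {"clockticks": 6_000_000_000, "instructions": 8_000_000_000}
vector = {"clockticks": 3_000_000_000, "instructions": 2_000_000_000}

scalar_cpi = cpi(scalar["clockticks"], scalar["instructions"])
vector_cpi = cpi(vector["clockticks"], vector["instructions"])
scalar_time = runtime_seconds(scalar["clockticks"], FREQUENCY_HZ)
vector_time = runtime_seconds(vector["clockticks"], FREQUENCY_HZ)

print(f"scalar: CPI {scalar_cpi:.2f}, {scalar_time:.1f} s")
print(f"vector: CPI {vector_cpi:.2f}, {vector_time:.1f} s")
```

In this example the CPI doubles from 0.75 to 1.5 while the running time is cut in half, because running time depends on total clockticks, not on the ratio.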
A high CPI just tells you that there is room for improvement at the architectural level. It doesn't tell you that there is no other way to improve the application.
The value of 0.5 to 0.75 is based on experience of what you can achieve in well-tuned, CPU-bound applications. In other words, if you already have a CPI of 0.5 for a function, don't be frustrated if you cannot improve on that. On the other hand, if you have a function with a CPI of 10, it is one of the hot functions in your application, and you have exhausted all the other means of improvement at the system and application level, then you should look into it.
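As a rough illustration of that triage, the sketch below ranks functions from a per-function profile. Everything here is hypothetical: the profile layout, the function names, the 10% "hot" cutoff, and the estimate of saved clockticks, which naively assumes the instruction count stays the same while CPI drops to the 0.75 target.

```python
def rank_tuning_candidates(profile, cpi_target=0.75, hot_share=0.10):
    """Return hot functions whose CPI exceeds the target, biggest potential win first.

    Each result is (name, cpi, share_of_clockticks, estimated_clockticks_saved).
    """
    total = sum(f["clockticks"] for f in profile)
    candidates = []
    for f in profile:
        cpi = f["clockticks"] / f["instructions"]
        share = f["clockticks"] / total
        # Only hot functions with a CPI above the target are worth a look.
        if share >= hot_share and cpi > cpi_target:
            saved = f["clockticks"] - f["instructions"] * cpi_target
            candidates.append((f["name"], cpi, share, saved))
    return sorted(candidates, key=lambda c: c[3], reverse=True)


# Made-up profile (counts in millions of events).
profile = [
    {"name": "solve", "clockticks": 5000, "instructions": 500},   # CPI 10, hot
    {"name": "pack",  "clockticks": 2500, "instructions": 5000},  # CPI 0.5, already good
    {"name": "log",   "clockticks": 500,  "instructions": 100},   # CPI 5, but not hot
]

ranked = rank_tuning_candidates(profile)
for name, cpi, share, saved in ranked:
    print(f"{name}: CPI {cpi:.1f}, {share:.0%} of clockticks, ~{saved:.0f}M ticks to gain")
```

Only `solve` shows up: `pack` is already at a CPI of 0.5, and `log` has a high CPI but is too small a share of the run to matter.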