Back when I was using VC++ 6, I was able to switch to the Intel compiler, set some flags, and get a 25%-30% gain without any additional work. Since VS2005, I have been seeing results similar to yours: the Microsoft compiler has simply gotten better. Additionally, I now find it harder to get a speed gain by reworking code with intrinsics. Again, the compilers have gotten better and are doing that work on their own.
At this point I find that, no matter which compiler I use, I really have to put in extra effort to eke out additional performance gains. The Intel compiler can still be very helpful there: I now tend to use the Intel-specific pragmas to guide the compiler toward better-informed decisions. I find this easier than working out the intrinsics myself, and it helps keep the code portable, since other compilers ignore pragmas they do not recognize. Intel also offers profile-guided optimization, but I have not tried it.