The irony of it all is that going back over the code let me cut the run time from 5.5 seconds to 4.0 seconds with 10.0.025, and from 9.2 seconds to 5.3 seconds with 11.1.51. That is a run-time reduction of roughly 27% and 42% respectively, for a piece of code which I had given up on optimizing! The performance regression with the newer compiler version remains, even though vectorization now takes place thanks to the restrict keyword and some rewriting of the code.
Thanks for the tips about the aliasing issue and the restrict keyword. I'll clean the code further and take this back to Intel support.
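
For anyone who finds this thread later, here is a minimal sketch of the aliasing issue in C. It is not the actual code discussed here: the function names `scale_add` and `scale_add_restrict` and their parameters are made up for illustration. The point is only that without `restrict` the compiler has to assume the output array may overlap the inputs, which can prevent it from vectorizing the loop, while `restrict` is a promise from the caller that the arrays do not overlap.

```c
#include <stddef.h>

/* Hypothetical example, not the code from this thread.
 * Without restrict, the compiler must assume that out may overlap a or b,
 * so a store to out[i] could change a value read in a later iteration.
 * That possible dependence can stop the loop from being vectorized. */
void scale_add(float *out, const float *a, const float *b,
               float s, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + s * b[i];
}

/* Same loop with C99 restrict: the caller promises that out, a and b
 * refer to non-overlapping memory, so the iterations are independent
 * and the compiler is free to vectorize. Calling this with overlapping
 * arrays is undefined behavior. */
void scale_add_restrict(float *restrict out,
                        const float *restrict a,
                        const float *restrict b,
                        float s, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + s * b[i];
}
```

Both versions compute the same result when the arrays are distinct; the difference is only in what the compiler is allowed to assume. Depending on the compiler and language mode, `restrict` may require C99 mode or a compiler switch to be recognized, and whether the loop is actually vectorized is best checked in the compiler's vectorization report.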