Presumably you're using MKL, which gains a big advantage from AVX on the newer CPU.
Maybe you haven't tried the settings which might optimize thread placement across the 4 cache channels of the Westmere CPU, e.g. KMP_AFFINITY="proclist=[1,3-5],verbose,explicit" with MKL_NUM_THREADS=4 and HyperThreading disabled (just guessing, based on a typical BIOS numbering of the cores). I suppose, with HT enabled, it might be proclist=[3,7,9,11] or some such. HT wasn't designed for simplicity, nor was there a consistent core numbering pattern for it in early BIOSes.
If you did try them, posting some specifics of your results might help you get more intelligent responses. The quirky cache setup of the 6-core Westmere probably contributes to its limited popularity (and to the fact that I have one, left to me as a retirement gift).
Also, you should check the MKL docs for any recommendations such as 32-byte data alignment (at least as important on older Core i7 CPUs as on the newer ones).
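
In case it helps, here is a minimal sketch of what 32-byte alignment could look like on the allocation side, using only standard C11 aligned_alloc. The helper names (round_up_to_alignment, alloc_aligned_doubles, is_aligned_32) are my own, made up for illustration, and not part of MKL. MKL also provides its own mkl_malloc/mkl_free pair that takes an alignment argument, so check the docs for which one they recommend with your version.

```c
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 32  /* bytes; the width of one 256-bit AVX register */

/* Round n up to the next multiple of ALIGNMENT, because C11
   aligned_alloc expects the size to be a multiple of the alignment. */
size_t round_up_to_alignment(size_t n)
{
    return (n + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
}

/* Hypothetical helper: allocate room for `count` doubles starting on a
   32-byte boundary. Returns NULL on failure; release with free(). */
double *alloc_aligned_doubles(size_t count)
{
    return aligned_alloc(ALIGNMENT, round_up_to_alignment(count * sizeof(double)));
}

/* Return 1 if p sits on a 32-byte boundary, 0 otherwise. */
int is_aligned_32(const void *p)
{
    return ((uintptr_t)p % ALIGNMENT) == 0;
}
```

You would allocate each matrix with alloc_aligned_doubles(rows * cols) in place of plain malloc, then pass the pointers to the MKL routines as usual. is_aligned_32 is only there as a quick sanity check on buffers you already have.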