I am currently running OpenBLAS on a dual-processor Xeon (E5-2680) server with 4 x 8 GB RAM, and its performance is worse than that of a third-generation Core i7 PC with 32 GB RAM, also running OpenBLAS. I would like to improve BLAS performance by switching to MKL BLAS, and would like to know how to install and configure it for performance.
MKL is included in the Intel compiler installers and is also available from the MKL community download site. It comes with full documentation. If you have further questions, the companion MKL forum might be appropriate.
From what little you have said, one might question whether you have paid attention to your settings of OMP_NUM_THREADS, OMP_PLACES, and OMP_PROC_BIND. The MKL default on Xeon is OMP_PLACES=cores.
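As a rough illustration of the three settings mentioned above, here is a minimal Python sketch that builds an environment with them set, so a benchmark can be launched under known threading conditions. The helper name openmp_env, the value 16 (which assumes 8 physical cores per socket on a two-socket machine), and the choice OMP_PROC_BIND=close are my assumptions, not something taken from the question; check your own core count and adjust.

```python
import os


def openmp_env(physical_cores, base_env=None):
    """Return a copy of the environment with the OpenMP threading variables set.

    physical_cores is an assumption supplied by the caller: the number of
    physical (not hyperthreaded) cores the BLAS library should use.
    """
    env = dict(os.environ if base_env is None else base_env)
    # One OpenMP thread per physical core.
    env["OMP_NUM_THREADS"] = str(physical_cores)
    # Treat each physical core as one place for thread pinning.
    env["OMP_PLACES"] = "cores"
    # Assumed binding policy; "spread" is the other common choice.
    env["OMP_PROC_BIND"] = "close"
    return env


if __name__ == "__main__":
    # 16 is a placeholder for a 2 x 8-core server; replace with your own count.
    env = openmp_env(16)
    for name in ("OMP_NUM_THREADS", "OMP_PLACES", "OMP_PROC_BIND"):
        print(f"{name}={env[name]}")
```

The returned dictionary can be passed as the env argument of subprocess.run when starting your benchmark, which makes it easy to repeat the same run with different thread counts or binding policies and compare timings.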
If you are comparing an AVX2 CPU against an AVX-only one, you should expect a large difference in performance for some BLAS functions.
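To see which of the two instruction sets each machine actually reports, something like the following sketch may help. It assumes a Linux system where /proc/cpuinfo contains a "flags" line; the function name simd_flags is hypothetical and only for illustration.

```python
import os


def simd_flags(cpuinfo_text):
    """Report whether the 'avx' and 'avx2' flags appear in cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags : fpu vme ... avx avx2 ...".
            flags = set(line.split(":", 1)[1].split())
            return {"avx": "avx" in flags, "avx2": "avx2" in flags}
    # No flags line found (for example, not a Linux x86 system).
    return {"avx": False, "avx2": False}


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux-specific location, assumed here
    if os.path.exists(path):
        with open(path) as f:
            print(simd_flags(f.read()))
```

Running this on both the server and the PC shows whether the two are on equal footing before you compare BLAS timings.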