Have you tried export MKL_DYNAMIC=TRUE?
It lets MKL choose an appropriate number of threads for the problem. As Tim noted, for functions such as DGEMV and DDOT, increasing the number of threads may not improve performance. If MKL_DYNAMIC is FALSE, MKL uses the number of threads you set instead of choosing one itself.
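A minimal sketch of the two settings, assuming a bash-like shell and that the thread count is set through MKL_NUM_THREADS (the value 8 is only a placeholder). With MKL_DYNAMIC=TRUE, MKL_NUM_THREADS acts as an upper bound that MKL may go below for small calls; with FALSE, MKL keeps to the requested count.

```shell
# Let MKL pick the thread count per call, up to MKL_NUM_THREADS.
export MKL_DYNAMIC=TRUE
export MKL_NUM_THREADS=8   # placeholder upper bound; adjust for your machine

# Alternative: make MKL use exactly MKL_NUM_THREADS threads,
# even for small DGEMV/DDOT calls where that may not help.
# export MKL_DYNAMIC=FALSE
```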
BTW, if Hyper-Threading Technology is enabled on the system, it is recommended to set the number of threads equal to the number of physical cores rather than logical processors. With two hardware threads per core, that is half the number of logical processors.
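One way to apply this on Linux, sketched under the assumption that util-linux `lscpu` is available: count the unique (core, socket) pairs to get the physical core count, then pass that to MKL through MKL_NUM_THREADS. If `lscpu` is missing, the sketch falls back to the logical CPU count.

```shell
# Logical processors currently online (includes Hyper-Threading siblings).
logical=$(getconf _NPROCESSORS_ONLN)

# Physical cores: unique (core, socket) pairs reported by lscpu.
# $(( )) strips any whitespace that wc may emit.
physical=$(( $(lscpu -p=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u | wc -l) ))

# Fall back to the logical count if lscpu gave nothing.
[ "$physical" -ge 1 ] || physical=$logical

echo "logical CPUs: $logical, physical cores: $physical"
export MKL_NUM_THREADS=$physical
```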