Yes, several production codes and demonstrations use Intel MPI together with Intel OpenMP under the MPI_THREAD_FUNNELED threading model. Currently supported versions of Intel MPI use OMP_NUM_THREADS by default to optimize thread placement on each node. MKL works well with either one MPI process per node or one MPI process per socket.
Starting with Intel MKL 10.2 and Intel MPI Library 4.0, the two libraries can balance the workload on a node between them. The total number of MKL threads across all MPI processes on a node will not exceed the number of cores on that node. You don't even need to set OMP_NUM_THREADS or MKL_NUM_THREADS.
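
To make the arithmetic concrete, here is a small sketch of the thread budget that a no-oversubscription rule implies. It assumes the cores of a node are split evenly among the MPI ranks placed on it; that is my reading for illustration, not a description of how Intel MPI or MKL compute it internally. The function name `mkl_threads_per_rank` and the 16-core node are hypothetical.

```python
def mkl_threads_per_rank(cores_per_node: int, ranks_per_node: int) -> int:
    """Threads each MPI rank could use if cores are split evenly (assumption)."""
    if cores_per_node < 1 or ranks_per_node < 1:
        raise ValueError("core and rank counts must be positive")
    # Integer division: any leftover cores stay idle.
    # With more ranks than cores, each rank still gets one thread,
    # so the node is oversubscribed in that case.
    return max(1, cores_per_node // ranks_per_node)


if __name__ == "__main__":
    # Hypothetical 16-core node with 1, 2, or 4 MPI ranks on it.
    for ranks in (1, 2, 4):
        threads = mkl_threads_per_rank(16, ranks)
        print(f"{ranks} rank(s) per node: {threads} thread(s) per rank")
```

If you do want to set the thread count by hand, a value computed this way is a reasonable starting point for OMP_NUM_THREADS or MKL_NUM_THREADS.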