I beg to differ. AVX2 largely overlaps with LRBni, and it can be extended up to 1024-bit operations. A 1024-bit instruction could then be executed over four cycles on the existing 256-bit units. That offers great opportunity for clock gating, and thus for massively reducing the power consumption of the out-of-order execution logic.
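To make the four-cycle idea concrete, here is a rough sketch in Python rather than real hardware or intrinsics. It assumes 32-bit lanes, so a 256-bit unit handles 8 lanes per cycle and a hypothetical 1024-bit register holds 32. The names `wide_add` and `instructions_needed` are made up for illustration.

```python
LANES_PER_UNIT = 8   # 256-bit execution unit / 32-bit lanes
LANES_PER_OP = 32    # hypothetical 1024-bit register / 32-bit lanes


def wide_add(a, b):
    """Add two 32-lane vectors on an 8-lane unit; return (result, cycles)."""
    assert len(a) == len(b) == LANES_PER_OP
    result = []
    cycles = 0
    # One instruction, sequenced as four passes through the same unit.
    for start in range(0, LANES_PER_OP, LANES_PER_UNIT):
        chunk_a = a[start:start + LANES_PER_UNIT]
        chunk_b = b[start:start + LANES_PER_UNIT]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        cycles += 1
    return result, cycles


def instructions_needed(total_lanes, lanes_per_instruction):
    """Instructions to fetch, decode and schedule for a given amount of work."""
    return -(-total_lanes // lanes_per_instruction)  # ceiling division
```

In this toy model the execution unit does the same amount of work either way, but the number of instructions that have to pass through fetch, decode and scheduling drops by a factor of four. That is where the clock gating opportunity would come from.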
In other words, future high-end desktop CPUs could closely compete with the MICs in terms of throughput and power consumption.
CPUs have unique advantages though: out-of-order execution allows the thread count to stay low, which means each thread gets a large share of cache space. This improves the hit rates, which in turn lowers bandwidth needs and power consumption further up the memory hierarchy. Lots of workloads also have large portions of serial code, for which the CPU's high frequency and low latency make it superior. So even with a lower peak performance, the CPU can outperform other architectures in practice.
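A back-of-the-envelope model of that argument, in the spirit of Amdahl's law. All numbers are invented for illustration and are not measurements of any real chip. The function names `run_time` and `cache_per_thread_kib` are hypothetical too.

```python
def run_time(serial_fraction, serial_speed, parallel_speed):
    """Relative time for one unit of work; speeds are relative, higher is faster."""
    serial_part = serial_fraction / serial_speed
    parallel_part = (1.0 - serial_fraction) / parallel_speed
    return serial_part + parallel_part


def cache_per_thread_kib(cache_kib, threads):
    """Average cache share per thread, assuming an even split."""
    return cache_kib / threads


# Hypothetical CPU: fast serial code (4), moderate peak throughput (8).
# Hypothetical throughput chip: slow serial code (1), double the peak (16).
cpu_time = run_time(0.2, serial_speed=4, parallel_speed=8)
mic_time = run_time(0.2, serial_speed=1, parallel_speed=16)
```

With 20% serial code the made-up CPU finishes first despite having half the peak throughput. With no serial code at all the throughput chip wins. The cache helper shows the other half of the argument: the same cache split over a few threads leaves each one far more space than when it is split over hundreds.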
The MICs will fend off GPUs in the HPC market for a while, but in the longer term a CPU with AVX-1024 will be an all-round better option.