By the way, frankly I believe there are more important ISA issues than a few encoding differences between vendors. Inferior extensions quickly become irrelevant and affect only a fraction of the decoder logic, while superior extensions influence the entire architecture. For instance, SSE started out with only two 64-bit execution ports. Nowadays we have three 128-bit execution ports.
Some of the critical things that involve the ISA are parallel data gather/scatter and transactional memory. As vectors become wider, getting data in from or out to different memory locations becomes a significant bottleneck. AVX with FMA can perform up to 16 single-precision floating-point operations each clock cycle, but reading and writing each element individually would take 64 clock cycles. And as the number of cores increases, acquiring locks becomes harder. Even performing just a couple of reads and writes atomically is painstakingly slow. So we could use some hardware support to speed things up.
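
To make the gather/scatter point concrete, here is a minimal sketch in plain C of what the code has to do today without hardware support. It assumes 8 single-precision lanes per 256-bit AVX register; the names gather_scalar, scatter_scalar and fma_lanes are made up for illustration and are not real intrinsics. The arithmetic step handles all 8 lanes as one unit (8 multiplies plus 8 adds, which is where the 16 operations come from), while the memory steps have to touch one element at a time.

```c
#include <stddef.h>

/* Assumption: one 256-bit AVX register holds 8 single-precision floats. */
#define LANES 8

/* Hypothetical helper: emulates a gather with one scalar load per lane. */
void gather_scalar(float dst[LANES], const float *base, const int idx[LANES])
{
    for (size_t i = 0; i < LANES; ++i)
        dst[i] = base[idx[i]];      /* one independent memory read per element */
}

/* Hypothetical helper: emulates a scatter with one scalar store per lane. */
void scatter_scalar(float *base, const int idx[LANES], const float src[LANES])
{
    for (size_t i = 0; i < LANES; ++i)
        base[idx[i]] = src[i];      /* one independent memory write per element */
}

/* Stand-in for a single vector FMA: out = a * b + c across all lanes.
   That is 8 multiplies and 8 adds, so 16 floating-point operations. */
void fma_lanes(float out[LANES], const float a[LANES],
               const float b[LANES], const float c[LANES])
{
    for (size_t i = 0; i < LANES; ++i)
        out[i] = a[i] * b[i] + c[i];
}
```

A typical use would be to call gather_scalar once per input operand, run fma_lanes, and then scatter_scalar the result back. With real vector hardware the middle step is a single instruction, so the element-by-element loads and stores around it are what end up dominating.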