>>Third, did you try to partially specialize the TDataSet class for SSE and AVX data types using manual
>>vectorization?
Partially, yes. But there are so many Intel intrinsic domains (MMX, SSE, SSE2, SSE4, AVX, AVX2, etc.) that fully supporting all of them is pointless. For example, nobody is interested in MMX or SSE at the moment.
Also, I consider the current state of the Intel intrinsic domains very messy, with lots of inconsistencies. In my thread devoted to the integration of the Watcom C++ compiler I have already expressed my point of view that direct usage of Intel intrinsics does not solve all performance problems.
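
To illustrate what a partial specialization with manual vectorization can look like, here is a minimal sketch. It is not the actual TDataSet code, which is not shown in this thread: the class name DataSetSketch, its Sum() method, and the fixed-size std::array storage are all assumptions made for the example. The primary template uses a plain scalar loop, and the partial specialization for float (with the size N left generic) uses SSE intrinsics when the compiler reports SSE support.

```cpp
#include <array>
#include <cstddef>

// Detect SSE support; without it, only the generic template below is compiled.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical stand-in for TDataSet (the real interface is an unknown here).
template <typename T, std::size_t N>
class DataSetSketch {
public:
    explicit DataSetSketch(const std::array<T, N>& values) : data_(values) {}

    // Generic scalar path, used for every type without a specialization.
    T Sum() const {
        T total = T();
        for (std::size_t i = 0; i < N; ++i) total += data_[i];
        return total;
    }

private:
    std::array<T, N> data_;
};

#if SKETCH_HAS_SSE
// Partial specialization: T is fixed to float, N remains a template parameter.
template <std::size_t N>
class DataSetSketch<float, N> {
public:
    explicit DataSetSketch(const std::array<float, N>& values) : data_(values) {}

    float Sum() const {
        __m128 acc = _mm_setzero_ps();
        std::size_t i = 0;
        // Accumulate four floats per iteration; unaligned loads keep it simple.
        for (; i + 4 <= N; i += 4) {
            acc = _mm_add_ps(acc, _mm_loadu_ps(data_.data() + i));
        }
        // Reduce the four lanes, then add the leftover tail elements.
        float lanes[4];
        _mm_storeu_ps(lanes, acc);
        float total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
        for (; i < N; ++i) total += data_[i];
        return total;
    }

private:
    std::array<float, N> data_;
};
#endif
```

Note that the vectorized path adds the elements in a different order than the scalar loop, so for general floating-point data the two results can differ in the last bits. An AVX variant would need its own specialization or tag, which is exactly how the number of code paths multiplies with each additional intrinsic domain.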