Does the 2.94 release address the issues with using SDE on Windows 7 64-bit? I tried v1.7 on that OS a month or two back, but to no avail. Thanks for any feedback.
Hi Tim, While I don't yet test my releases on Win7 specifically, Pin (upon which SDE is built) has been tested on Win7 for a while. I am not sure the UMS stuff is supported yet in Pin (checking) -- but everything else should work. If you encounter problems, please let us know.
In SDE 1.13 the FMA4 implementation is supported, but I've found some problems. First, the 4-operand assembly output in the mix.out instruction distributions has the middle 2 of the 4 operands swizzled. Second, the memory forms of the 4-operand FMAC instructions generate inaccurate results. I don't know if you're aware of this, but I wanted to forward it along anyway.
BTW, is there any plan to "re-support" 4-operand FMAC in SDE in the future? A flag to enable it, maybe?
I'm sure you realize that one gets different answers with MUL-THEN-ADD than one gets with a fused mul-add, because the fused form has no internal rounding step between the multiply and the add. Assuming that is not the issue you are encountering: there was one numerical accuracy bug that I fixed in the FMA implementation. One of our internal teams noticed it. That fix is present in the 2.94 release for the 3-operand FMA. It is the same FMA emulation routine that I used in the earlier 4-operand FMA implementation in the older SDE release you are using.
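To make the rounding point concrete, here is a minimal Python sketch. It does not reproduce the SDE emulation routine; the function names are hypothetical, and the fused result is modeled with exact rational arithmetic rounded once at the end, which is an assumption about how a fused mul-add behaves rather than a call into any hardware instruction.

```python
from fractions import Fraction

def mul_then_add(a, b, c):
    # Two roundings: a*b is rounded to double, then the sum is rounded again.
    return a * b + c

def fused_mul_add(a, b, c):
    # Model of a fused op (illustrative): compute a*b + c exactly,
    # then round to double a single time.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

# Inputs chosen so the exact product 1 - 2**-60 rounds to 1.0 in double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = mul_then_add(a, b, c)   # product rounds to 1.0, so the sum is 0.0
fused = fused_mul_add(a, b, c)    # keeps the -2**-60 term

print("mul-then-add:", unfused)
print("fused       :", fused)
```

With these inputs the unfused form loses the low-order term entirely, so a difference of this kind between the two forms is expected and is not by itself an emulation bug.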
The 4-operand FMA is not currently supported in Intel SDE. I would be interested in hearing if you found any numerical accuracy issues in the current release. Or just send along your test case inputs / outputs and I'll take a quick look.
I cannot comment on the 4-operand FMA situation. Intel is indicating support for a 3-operand FMA in a future implementation.
While developing ACML's dgemm and FMA implementation, I noted that the memory forms of the 4-operand FMA were inaccurate when I checked the matrix norms coming out of the routine. I also observed this with the FMA4 implementation in SDE 1.13 for L1 BLAS routines I wrote. It also failed when running SPEC06 binaries built with AVX-enabled compilers. I'll undoubtedly run the latest Intel binaries, as well as those from other compilers, through the latest SDE utility in the near future. Thanks for the prompt reply, Mark, and have a great day.
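For anyone wanting to try a similar norm-based check, a rough sketch in Python follows. This is not the ACML test harness; the helper names are hypothetical, the matrices are tiny placeholders, and the exact reference is built with rational arithmetic as an assumption about what "correct" means here.

```python
from fractions import Fraction
import math

def matmul(a, b):
    # Plain double-precision product: stands in for the routine under test.
    k, m = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)]
            for row in a]

def matmul_exact(a, b):
    # Exact rational product used as the reference answer.
    k, m = len(b), len(b[0])
    return [[sum(Fraction(row[p]) * Fraction(b[p][j]) for p in range(k))
             for j in range(m)] for row in a]

def relative_error(result, exact):
    # Frobenius norm of (result - exact) divided by the norm of exact.
    diff = sum((Fraction(r) - e) ** 2
               for rr, er in zip(result, exact) for r, e in zip(rr, er))
    ref = sum(e ** 2 for er in exact for e in er)
    return math.sqrt(diff / ref)

a = [[0.1, 0.2], [0.3, 0.4]]
b = [[0.5, 0.6], [0.7, 0.8]]

exact = matmul_exact(a, b)
good = matmul(a, b)
bad = [row[:] for row in good]
bad[0][0] += 1e-3                 # simulate an inaccurate result

print("healthy  :", relative_error(good, exact))
print("corrupted:", relative_error(bad, exact))
```

A healthy result should sit near double-precision rounding error, while an inaccurate code path shows up as a relative error many orders of magnitude larger.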