...
double dValueA = 55.55L;
double dValueB = 77.77L;
...
What is the fastest method in assembler to exchange the values of these two double-precision variables?
Best regards,
Sergey
Are you doing x87 math or SSE2 math?
Is this showing up as a bottleneck?
Pat
Are you doing x87 math or SSE2 math?
[SergeyK] x87: yes (because the solution has to be highly portable).
An SSE2 solution could also be considered.
Is this showing up as a bottleneck?
[SergeyK] Yes, and I need to make the exchange as fast as possible.
...
I'd like to provide some technical details. I don't need this for math; I need it in several sorting algorithms, such as MergeSort and QuickSort,
in cases where the 'double' data type is used.
In a generic form it looks like:
Here is the solution I currently implemented:
The solution with FLD-FSTP instructions is ~1.6x faster, and it improves the performance of the sorting algorithms.
Is it possible to make the exchange faster?
Best regards,
Sergey
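The code listings from the post above did not survive, so here is a minimal sketch of what a "generic form" exchange usually looks like, plus one portable alternative. The function names `swap_double_generic` and `swap_double_bits` are hypothetical, and the second variant (moving the raw 64-bit patterns through integers) is not something measured in this thread; whether it beats FLD-FSTP would need to be benchmarked on the target compiler and CPU.

```c
#include <stdint.h>
#include <string.h>

/* Assumed shape of the "generic form": exchange through a temporary. */
static void swap_double_generic(double *a, double *b)
{
    double tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Alternative sketch: exchange the raw 64-bit patterns.
 * memcpy is used so the code does not violate aliasing rules;
 * compilers typically reduce these calls to plain integer moves. */
static void swap_double_bits(double *a, double *b)
{
    uint64_t ua, ub;
    memcpy(&ua, a, sizeof ua);
    memcpy(&ub, b, sizeof ub);
    memcpy(a, &ub, sizeof ub);
    memcpy(b, &ua, sizeof ua);
}
```

Either function would be called from a sort routine as `swap_double_generic(&dValueA, &dValueB);`. For comparison, an FLD-FSTP exchange loads both values onto the x87 stack (`fld a`, `fld b`) and pops them back in the same operand order (`fstp a`, `fstp b`), which leaves the two values swapped.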
[SergeyK] Nothing, but my top priority is optimizing the source code in the first place.
That means the code must be highly optimized at the C/C++ level, sometimes with inline assembler, and
I can't always rely on the optimizations of a C/C++ compiler.
Are you trying to include the cases of misaligned data? If not, wouldn't 128-bit parallel moves be preferable?
[SergeyK] Could you provide more technical details with an example?
Thanks in advance.
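As one possible reading of the "128-bit parallel moves" suggestion, the sketch below exchanges two non-overlapping blocks of doubles two elements at a time with SSE2 loads and stores. The function name `swap_blocks` and the macro `HAVE_SSE2_SWAP` are hypothetical, and the scalar fallback is only there so the code still builds without SSE2. The unaligned `_mm_loadu_pd`/`_mm_storeu_pd` forms are used so misaligned data is handled; if both pointers are known to be 16-byte aligned, `_mm_load_pd`/`_mm_store_pd` could be substituted.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_SWAP 1
#endif

/* Exchange n doubles between x and y (assumed not to overlap). */
static void swap_blocks(double *x, double *y, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE2_SWAP
    /* Two doubles per iteration through 128-bit registers. */
    for (; i + 2 <= n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);
        __m128d vy = _mm_loadu_pd(y + i);
        _mm_storeu_pd(x + i, vy);
        _mm_storeu_pd(y + i, vx);
    }
#endif
    /* Scalar tail (or the whole range when SSE2 is unavailable). */
    for (; i < n; ++i) {
        double tmp = x[i];
        x[i] = y[i];
        y[i] = tmp;
    }
}
```

This only pays off when a sort moves more than one element at a time, for example when exchanging records or runs of elements; for a single pair of doubles there is nothing to parallelize.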
The solution with FLD-FSTP instructions is ~1.6x faster, and it improves the performance of the sorting algorithms.
Is it possible to make the exchange faster?
I've done a set of tests with 'Load-Shuffle-Store' intrinsic functions, like
but it is not as fast as the 'Fld-Fstp' based exchange. The final relative results of my tests are as follows:
Generic based Exchange - ~1.5x slower than Fld-Fstp
Fld-Fstp based Exchange - 1.0x
Shuffle based Exchange - ~2.5x slower than Fld-Fstp
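The intrinsics used in the 'Load-Shuffle-Store' test are missing from the post, so the following is only a guess at the general pattern, not the code that was measured. It assumes the two doubles are adjacent in memory (for example `a[i]` and `a[i+1]`); the function name `swap_adjacent` is hypothetical, and a scalar fallback is included so the sketch builds without SSE2.

```c
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_SHUFFLE 1
#endif

/* Exchange p[0] and p[1] in place. */
static void swap_adjacent(double *p)
{
#ifdef HAVE_SSE2_SHUFFLE
    __m128d v = _mm_loadu_pd(p);   /* load: v = { p[0], p[1] }             */
    v = _mm_shuffle_pd(v, v, 1);   /* shuffle: low = old high, high = old low */
    _mm_storeu_pd(p, v);           /* store the swapped pair back          */
#else
    double tmp = p[0];
    p[0] = p[1];
    p[1] = tmp;
#endif
}
```

When the two values live at unrelated addresses, as is common in QuickSort, this pattern does not apply directly and each value has to be loaded and stored separately, which may be part of why the shuffle variant measured slower.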