SSE and sorting

mrosenrosen · ‎02-24-2009

Is anyone aware of any sorting algorithms that use SSE. Not parallel sorting algorithms like Intel's TBB, but single threaded sorts that leverage the SSE functionality for comparison, swapping, etc.

Quoc-Thai_L_Intel · ‎02-25-2009

Quoting - mrosenrosen

Is anyone aware of any sorting algorithms that use SSE. Not parallel sorting algorithms like Intel's TBB, but single threaded sorts that leverage the SSE functionality for comparison, swapping, etc.

I could not find any single thread sorting algorithm for you but I have found a multi-threaded one. Here is the link to the paper: Efficient Implementation of Sorting on Multi-Core CPU Architecture.pdf (364.6k)

Here is the abstract of the paper:
Sorting a list of input numbers is one of the most fundamen-
tal problems in the field of computer science in general and
high-throughput database applications in particular. Al-
though literature is abound with various flavors of sorting
algorithms, different architectures call for customized im-
plementations to achieve high efficiency and faster sorting
times.
This paper presents an efficient implementation and de-
tailed analysis ofMergeSort on current CPU architectures.
Additionally, the paper demonstrates performance scalabil-
ity of proposed sorting algorithm with respect to certain
salient architectural features of modern chip multiproces-
sor (CMP) architectures, including SIMD width and core-
count. Measured performance of our algorithm on Intel Pen-
ryn quad-core processor system compares favorably with all
previously published results. Furthermore, based on our
cycle-level simulation of various hypothetical architectural
configurations, we demonstrate near-linear scalability of our
implementation with SIMD width scaling up to eight times
wider than current SSE width of 128-bits, and CMP core-
count scaling up to 32 cores.

Could you share why you only need a single thread sorting algorithm?

-Thai

TimP · ‎02-25-2009

Quoting - Quoc-thai Le (Intel)

Could you share why you only need a single thread sorting algorithm?

Or, why not SSE generated by a compiler?

levicki · ‎02-25-2009

Quoting - tim18

Or, why not SSE generated by a compiler?

Does LIBIRC have vectorized qsort()?