Here is the abstract of the paper: Sorting a list of input numbers is one of the most fundamen- tal problems in the field of computer science in general and high-throughput database applications in particular. Al- though literature is abound with various flavors of sorting algorithms, different architectures call for customized im- plementations to achieve high efficiency and faster sorting times. This paper presents an efficient implementation and de- tailed analysis ofMergeSort on current CPU architectures. Additionally, the paper demonstrates performance scalabil- ity of proposed sorting algorithm with respect to certain salient architectural features of modern chip multiproces- sor (CMP) architectures, including SIMD width and core- count. Measured performance of our algorithm on Intel Pen- ryn quad-core processor system compares favorably with all previously published results. Furthermore, based on our cycle-level simulation of various hypothetical architectural configurations, we demonstrate near-linear scalability of our implementation with SIMD width scaling up to eight times wider than current SSE width of 128-bits, and CMP core- count scaling up to 32 cores.
Could you share why you only need a single thread sorting algorithm?