is std:sort faster than ipp sort ippsSortDescend_64f_I

Alok_J_ · ‎08-26-2015

hi,

I am trying to benchmark std::sort on vectors of double vs ipp sort using (ippsSortDescend_64f_I). I am sorting 200 vectors of length 2000 elements each. My test program is attached in this post. I see that std::sort is consistently performing better ( at least 10 times faster) than ipp sort. Is this expected behavior? I would prefer to move to ipp assuming its better in performance but i am not able to prove it. What am I doing wrong in my test program?

#include <array>
#include <iostream>
#include <algorithm>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <sys/timeb.h>

#include <vector>
#include <chrono>

#include "ipp.h"

using namespace std;

const int SIZE = 2000000;
const int ITERS = 200;

//Chrono typedefs
typedef std::chrono::high_resolution_clock Clock;
typedef std::chrono::microseconds microseconds;

//////////////////////////////////// std ///////////////////////////////////

typedef vector<double> myList;

void initialize(myList & l, Ipp64f* ptr)
{
    double randomNum;
    for (int i = 0; i < SIZE; i++)
    {
        randomNum =  1.0 * rand() / (RAND_MAX / 2) - 1;
        l.push_back(randomNum);
        				ptr = randomNum;
    }
}

void test_sort()
{
    		array<myList, ITERS> list;
    		array<Ipp64f*, ITERS> ippList;

    		// allocate
    for(int i=0; i<ITERS;i++)
    {
      				list.reserve(SIZE);
      				ippList = ippsMalloc_64f(SIZE);
    		}

    		// initialize
    		for(int i=0;i<ITERS;i++)
    		{
        				initialize(list, ippList);
    		}
    
    		cout << "\n\nTest Case 1: std::sort\n";
    cout << "========================\n";

    		// sort vector
    		Clock::time_point t0 = Clock::now();
    for(int i=0; i<ITERS;i++)
    {
        std::sort(list.begin(), list.end());
    }
    		Clock::time_point t1 = Clock::now();
    		microseconds ms = std::chrono::duration_cast<microseconds>(t1 - t0);
    		std::cout << ms.count() << " micros" << std::endl;

    		////////////////////////////////// IPP ////////////////////////////////////////

   		 cout << "\n\nTest Case 2: ipp::sort\n";
    cout << "========================\n";
    
    		// sort ipp 
    		Clock::time_point t2 = Clock::now();
    		for(int i=0; i<ITERS;i++)
    {
      				ippsSortAscend_64f_I(ippList, SIZE);
    }
    		Clock::time_point t3 = Clock::now();
    		microseconds ms1 = std::chrono::duration_cast<microseconds>(t3 - t2);
    		std::cout << ms1.count() << " micros" << std::endl;

    		for(int i=0; i<ITERS;i++)
    		{
	        ippsFree( ippList );
    		}
}


///////////////////////////////////////////////////////////////////////////////////////

int main()
{
    srand (time(NULL));

    cout << "Test for sorting an array of structures.\n" << endl;
    cout << "Test case: \nSort an array of structs ("<<ITERS<<" iterations) with double of length "<<SIZE<<". \n";
    		IppStatus status=ippInit();
    		test_sort();
    return 0;
}

///////////////////////////////////////////////////////////////////////////////////////

Victor_D_ · ‎08-27-2015

Reading thru the code, it seems correct. One thing to double check is that the two algorithms are coming up with the same answers, since the inputs are the same. Another is to double-check that the resulting arrays are truly sorted, as well as the inputs are random. Lastly, try ipp radix sort, which should work on floating-point numbers (but double check that by checking that the results are sorted), in my past benchmarks it beat STL sort by about 3X.

Alok_J_ · ‎08-27-2015

Thanks for your response Victor. I had already displayed and checked that the vectors are initialized properly and they are sorted. The outputs from std::sort and ipp sort are matching. As per your suggestion, i changed to Radix sort and surprise... intel ipp sort was faster this time. Here is my output. I see an improvement of 2x though I was expecting more, but this is good enough proof. That raises questions, why regular sort is slower than std sort? My assumption was that ipp is generally better than std for performance. Guess this is not the case.

Test for sorting an array of structures.

Test case:
Sort an array of structs (20000 iterations) with double of length 2000.


Test Case 1: std::sort
========================
average time 100 micros


Test Case 2: ipp::sort
========================
average time 50 micros

Victor_D_ · ‎08-27-2015

Nice work! Several benchmarks from an older version of IPP http://www.drdobbs.com/architecture-and-design/algorithm-improvement-through-performanc/221600153?pgno=5

confirm your measurements as well (see table 11 for 32-bit sorting) where Intel sort is slower than STL sort and IPP Radix sort is faster. Sorting in general is not a computationally limited operation, but performance is purely limited by memory accesses, where SSE/SIMD instruction don't help much. Radix sort changes the internal algorithm completely and that's the reason for the win.

Alok_J_ · ‎08-27-2015

good article. Thanks.

Thomas_W_Intel · ‎08-28-2015

I'm moving this thread to the IPP forum, where it is more likely it will catch the attention of IPP support.