Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

C++: SIMD and std::vector

Scarpati_Alvarenga__
1,527 Views

Hello,

I have a software with std::vector in which there are a lot of insert and erase operations.

I would like to improve the performance of such software using SIMD, because naturally there are a lot of shifts of data for each insert/erase operation. I think that the shifts could be done in SIMD data-transfer instructions.

I would like to know the way to do this. Do I need to deal with alignment?

0 Kudos
3 Replies
jimdempseyatthecove
Honored Contributor III
1,526 Views

Depending on the CPU model, SIMD can load and store: 16 bytes, 32 bytes of 64 bytes.

If your vector is containing POD (Plain Old Data), then it may be possible that SIMD could be used.

On the other hand, when your vector contains a more complicated object with ctor, dtor and copy constructor, then it is not likely that SIMD could be used.

Please specify, in detail, what are the entries in your vector.

Jim Dempsey

0 Kudos
Scarpati_Alvarenga__
1,526 Views

jimdempseyatthecove (Blackbelt) wrote:

Depending on the CPU model, SIMD can load and store: 16 bytes, 32 bytes of 64 bytes.

If your vector is containing POD (Plain Old Data), then it may be possible that SIMD could be used.

On the other hand, when your vector contains a more complicated object with ctor, dtor and copy constructor, then it is not likely that SIMD could be used.

Please specify, in detail, what are the entries in your vector.

Jim Dempsey

I think that it is POD:

struct BookData
{
	typedef TEntry Entry;
	double price;
	ulong orderID;
	int quantity;
	int detail;

	BookData() = default;
	BookData(const BookData &) = default;
	BookData & operator =(const BookData &) = default;

	BookData(double price, ulong orderID, int quantity, int detail) :
		price(price),
		orderID(orderID),
		quantity(quantity),
		detail(detail) { }

	BookData(const Entry & entry) :
		price(entry.entry.price),
		orderID(entry.entry.orderID),
		quantity(entry.entry.quantity),
		detail(entry.entry.detail) { }

	BookData & operator = (const Entry & entry)
	{
		price = entry.entry.price;
		orderID = entry.entry.orderID;
		quantity = entry.entry.quantity;
		detail = entry.entry.detail;
		return *this;
	}
};

I am compiling the program using:

icc -xskylake-avx512 -Ofast a.cpp -lpthread -ldl -o stdvector_icc

and with gcc:

g++ -march=skylake-avx512 -Ofast a.cpp -lpthread -ldl -o stdvector_g++

The program compiled with g++ is faster than the program compiled with icc. But it is different than the various internet benchmarks show. Maybe I did something wrong ?

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,526 Views

You could roll your own. IOW encapsulate std::vector, for example call it fastvector which performs the insert and erase using the immintrin.h intrinsics.

insert would have to determine if capacity is insufficient
    if insufficient
         allocate new buffer with sufficient size
         copy old POD up to insertion point using intrinsics (have to handle alignment issues in front and back)
         copy insertion data
         copy remaining old data
         fixup vector header
         delete old buffer
   else
         copy old POD from last entry in reverse order to insertion point to a position of (former) last entry + # to be inserted using intrinsics (have to handle alignment issues in front and back)
         copy insertion data to insertion point  using intrinsics (have to handle alignment issues in front and back)
   endif

Caution, do not use sizeof(BookData) to determine the size of the entry into the vector. Instead use (&data[1] - &data[0]). While these numbers may be the same, this will protect against the compiler inserting a/some pad byte(s) between entries to meet alignment requirements.

You can figure out the erase based on what you have to do with insert.

Jim Dempsey  

0 Kudos
Reply