You could roll your own. IOW

Scarpati_Alvarenga__ · ‎07-15-2019

Hello,

I have a software with std::vector in which there are a lot of insert and erase operations.

I would like to improve the performance of such software using SIMD, because naturally there are a lot of shifts of data for each insert/erase operation. I think that the shifts could be done in SIMD data-transfer instructions.

I would like to know the way to do this. Do I need to deal with alignment?

jimdempseyatthecove · ‎07-17-2019

Depending on the CPU model, SIMD can load and store: 16 bytes, 32 bytes of 64 bytes.

If your vector is containing POD (Plain Old Data), then it may be possible that SIMD could be used.

On the other hand, when your vector contains a more complicated object with ctor, dtor and copy constructor, then it is not likely that SIMD could be used.

Please specify, in detail, what are the entries in your vector.

Jim Dempsey

Scarpati_Alvarenga__ · ‎07-17-2019

jimdempseyatthecove (Blackbelt) wrote:
Depending on the CPU model, SIMD can load and store: 16 bytes, 32 bytes of 64 bytes.
If your vector is containing POD (Plain Old Data), then it may be possible that SIMD could be used.
On the other hand, when your vector contains a more complicated object with ctor, dtor and copy constructor, then it is not likely that SIMD could be used.
Please specify, in detail, what are the entries in your vector.
Jim Dempsey

I think that it is POD:

struct BookData
{
	typedef TEntry Entry;
	double price;
	ulong orderID;
	int quantity;
	int detail;

	BookData() = default;
	BookData(const BookData &) = default;
	BookData & operator =(const BookData &) = default;

	BookData(double price, ulong orderID, int quantity, int detail) :
		price(price),
		orderID(orderID),
		quantity(quantity),
		detail(detail) { }

	BookData(const Entry & entry) :
		price(entry.entry.price),
		orderID(entry.entry.orderID),
		quantity(entry.entry.quantity),
		detail(entry.entry.detail) { }

	BookData & operator = (const Entry & entry)
	{
		price = entry.entry.price;
		orderID = entry.entry.orderID;
		quantity = entry.entry.quantity;
		detail = entry.entry.detail;
		return *this;
	}
};

I am compiling the program using:

icc -xskylake-avx512 -Ofast a.cpp -lpthread -ldl -o stdvector_icc

and with gcc:

g++ -march=skylake-avx512 -Ofast a.cpp -lpthread -ldl -o stdvector_g++

The program compiled with g++ is faster than the program compiled with icc. But it is different than the various internet benchmarks show. Maybe I did something wrong ?

jimdempseyatthecove · ‎07-19-2019

You could roll your own. IOW encapsulate std::vector, for example call it fastvector which performs the insert and erase using the immintrin.h intrinsics.

insert would have to determine if capacity is insufficient
    if insufficient
         allocate new buffer with sufficient size
         copy old POD up to insertion point using intrinsics (have to handle alignment issues in front and back)
         copy insertion data
         copy remaining old data
         fixup vector header
         delete old buffer
   else
         copy old POD from last entry in reverse order to insertion point to a position of (former) last entry + # to be inserted using intrinsics (have to handle alignment issues in front and back)
         copy insertion data to insertion point using intrinsics (have to handle alignment issues in front and back)
   endif

Caution, do not use sizeof(BookData) to determine the size of the entry into the vector. Instead use (&data[1] - &data[0]). While these numbers may be the same, this will protect against the compiler inserting a/some pad byte(s) between entries to meet alignment requirements.

You can figure out the erase based on what you have to do with insert.

Jim Dempsey

C++: SIMD and std::vector