Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

How to clamp/clip some data?

markzzz
Beginner
669 Views

Hi there,

I'd like to take advantage of the IPP library and vectorization to apply a clamp/clip to an array of doubles (or, better, to the element-wise sum of two arrays of doubles).

The current code is something like this:

double *pStart = mStartVoicesValues[voiceIndex];	
double *pMod = pModValues + voiceIndex * bufferSize;
double *pValue = mProcessedValues[voiceIndex];	

for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
	pValue[sampleIndex] = std::clamp(pStart[sampleIndex] + pMod[sampleIndex], 0.0, 1.0);
}

This is called often, with a variable blockSize. It adds pStart and pMod element by element and clips each sum to the range [0.0, 1.0]. The three arrays all have the same size. Note: I could also write the result back into pStart instead of using the third array pValue.

Which IPP functions do you suggest?
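As a baseline to compare any IPP-based version against, the loop above can be packaged as a small standalone function. This is only a sketch: the name add_clamp01 is made up for illustration and is not part of IPP, and whether the compiler vectorizes the loop depends on the compiler and its flags.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an IPP function):
// dst[i] = clamp(a[i] + b[i], 0.0, 1.0) for i in [0, n).
// dst may point to the same memory as a, so the result can be written in place.
inline void add_clamp01(const double* a, const double* b, double* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const double sum = a[i] + b[i];
        // Same result as std::clamp(sum, 0.0, 1.0), but does not require C++17.
        dst[i] = std::min(std::max(sum, 0.0), 1.0);
    }
}
```

Calling add_clamp01(pStart, pMod, pStart, blockSize) would cover the in-place variant mentioned above.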

Thanks

0 Kudos
6 Replies
Gennady_F_Intel
Moderator
669 Views

IPP doesn't offer exactly such a function, but you can use ippsAddProduct_64f(...) and then do the threshold computation by calling ippsThreshold_64fc_I(). Together these may give some speedup.

0 Kudos
markzzz
Beginner
669 Views

Gennady F. (Intel) wrote:

IPP doesn't offer exactly such a function, but you can use ippsAddProduct_64f(...) and then do the threshold computation by calling ippsThreshold_64fc_I(). Together these may give some speedup.

Thanks for the reply :) I believe you meant Add, not AddProduct.

So the result will be:

ippsAdd_64f(pStart, pMod, pValue, blockSize);
ippsThreshold_64f_I(pValue, blockSize, 0.0, ippCmpLess);
ippsThreshold_64f_I(pValue, blockSize, 1.0, ippCmpGreater);

Calling Threshold twice seems a bit redundant, doesn't it?

0 Kudos
Andrey_B_Intel
Employee
669 Views

markzzz wrote:

Calling Threshold twice seems a bit redundant, doesn't it?

Yes, there is a special function for this, ippsThreshold_LTValGTVal_64f_I, declared in ipps.h.

Thanks.
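To make it clearer why one call can replace the two single-sided thresholds, here is a scalar model of what a lower/upper threshold with replacement values does. It is a sketch based on the documented parameter order (level and value for "less than", then level and value for "greater than"), not Intel's implementation. The function name is hypothetical.

```cpp
#include <cassert>

// Scalar model only (an assumption about the behaviour, not IPP source code):
// elements below levelLT are replaced by valueLT, elements above levelGT are
// replaced by valueGT, and everything in between is left unchanged.
inline void threshold_lt_val_gt_val_model(double* srcDst, int len,
                                          double levelLT, double valueLT,
                                          double levelGT, double valueGT) {
    for (int i = 0; i < len; ++i) {
        if (srcDst[i] < levelLT) {
            srcDst[i] = valueLT;
        } else if (srcDst[i] > levelGT) {
            srcDst[i] = valueGT;
        }
    }
}
```

With levelLT = valueLT = 0.0 and levelGT = valueGT = 1.0 this reduces to a clamp to [0.0, 1.0], which is the case discussed in this thread.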

0 Kudos
markzzz
Beginner
669 Views

Funny :)

In the end, this:

ippsAdd_64f(mSmoothedValues, pMod, pValue, blockSize);
ippsThreshold_64f_I(pValue, blockSize, 0.0, ippCmpLess);
ippsThreshold_64f_I(pValue, blockSize, 1.0, ippCmpGreater);

seems faster than this:

ippsAdd_64f(mSmoothedValues, pMod, pValue, blockSize);
ippsThreshold_LTValGTVal_64f_I(pValue, blockSize, 0.0, 0.0, 1.0, 1.0);	

In the test I ran, the single-call version takes ~800 ms, versus ~500 ms for the two-call version.

Is that normal?

0 Kudos
Andrey_B_Intel
Employee
669 Views

Yes, it is a surprise for me too. It looks like ippsThreshold_LTValGTVal_64f_I is slower because it takes additional parameters for the replacement values and therefore does additional work, which affects performance. Could you please measure with a data volume large enough to exceed the L3 cache?
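One way to run such a measurement is to sweep buffers that are larger than the last-level cache, so every pass has to stream from memory. The harness below is a sketch under assumptions: time_sweeps and scalar_add_clamp01 are hypothetical names, the scalar kernel is only a stand-in to be replaced by the IPP calls under test, and the buffer size that exceeds L3 depends on the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct SweepResult {
    double ms;        // wall-clock time for all passes
    double checksum;  // sum of the output, read back so the work is observable
};

// Runs `passes` sweeps of kernel(a, b, dst, blockSize) over totalElems doubles.
// Any trailing elements that do not fill a whole block are skipped.
template <typename Kernel>
SweepResult time_sweeps(Kernel kernel, std::size_t totalElems,
                        std::size_t blockSize, int passes) {
    std::vector<double> a(totalElems, 0.5), b(totalElems, 0.75), dst(totalElems, 0.0);
    const auto t0 = std::chrono::steady_clock::now();
    for (int p = 0; p < passes; ++p) {
        for (std::size_t off = 0; off + blockSize <= totalElems; off += blockSize) {
            kernel(a.data() + off, b.data() + off, dst.data() + off, blockSize);
        }
    }
    const auto t1 = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (double v : dst) sum += v;
    return { std::chrono::duration<double, std::milli>(t1 - t0).count(), sum };
}

// Stand-in kernel (plain C++, not IPP): dst[i] = clamp(a[i] + b[i], 0, 1).
inline void scalar_add_clamp01(const double* a, const double* b,
                               double* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = std::min(std::max(a[i] + b[i], 0.0), 1.0);
    }
}
```

For example, time_sweeps(scalar_add_clamp01, 8u * 1024u * 1024u, 256, 10) uses 64 MiB per buffer, which is assumed here to exceed L3 on typical desktop CPUs, while a totalElems of 4096 keeps everything in cache. Comparing the two cases shows whether the difference between the threshold variants survives once memory bandwidth dominates.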

0 Kudos
markzzz
Beginner
669 Views

Andrey Bakshaev (Intel) wrote:
Could you please measure with a data volume large enough to exceed the L3 cache?

Why would this make a difference? I mean, L3 is the last link in the chain. I would hope the data (if it is cached at all) sits in L1 or L2. Anyway, here is the whole test code:

#include <iostream>
#include <chrono>
#include <algorithm>
#include "ipp.h"

constexpr int voiceSize = 16;
constexpr int bufferSize = 256;

class Param
{
public:
	double mMin, mRange;

	Ipp64f mSmoothedValues[bufferSize];
	Ipp64f *pModulationVoicesValues;
	Ipp64f mProcessedVoicesValues[voiceSize][bufferSize];

	Param(double min, double max) : mMin { min }, mRange{ max - min } { }

	inline void AddModulation(int voiceIndex, int blockSize) {
		Ipp64f *pMod = pModulationVoicesValues + voiceIndex * bufferSize;
		Ipp64f *pValue = mProcessedVoicesValues[voiceIndex];

		// add modulation
		ippsAdd_64f(mSmoothedValues, pMod, pValue, blockSize);
		//ippsThreshold_64f_I(pValue, blockSize, 0.0, ippCmpLess);
		//ippsThreshold_64f_I(pValue, blockSize, 1.0, ippCmpGreater);
		ippsThreshold_LTValGTVal_64f_I(pValue, blockSize, 0.0, 0.0, 1.0, 1.0);
	}
};

class MyPlugin
{
public:
	Ipp64f gainModValues[voiceSize][bufferSize];
	Ipp64f offsetModValues[voiceSize][bufferSize];
	Ipp64f pitchModValues[voiceSize][bufferSize];
	
	Param mGain{ 0.0, 1.0 };
	Param mOffset{ -900.0, 900.0 };
	Param mPitch{ -48.0, 48.0 };

	MyPlugin() {
		// link mod arrays to params
		mGain.pModulationVoicesValues = gainModValues[0];
		mOffset.pModulationVoicesValues = offsetModValues[0];
		mPitch.pModulationVoicesValues = pitchModValues[0];

		// fancy data for smooth at audio rate
		for (int sampleIndex = 0; sampleIndex < bufferSize; sampleIndex++) {
			mGain.mSmoothedValues[sampleIndex] = 0.5;
			mOffset.mSmoothedValues[sampleIndex] = 0.5;
			mPitch.mSmoothedValues[sampleIndex] = 0.5;
		}

		// fancy data for mod at audio rate
		for (int voiceIndex = 0; voiceIndex < voiceSize; voiceIndex++) {
			for (int sampleIndex = 0; sampleIndex < bufferSize; sampleIndex++) {
				double value = (sampleIndex / ((double)bufferSize - 1)) * 2.0 - 1.0;

				gainModValues[voiceIndex][sampleIndex] = value;
				offsetModValues[voiceIndex][sampleIndex] = value;
				pitchModValues[voiceIndex][sampleIndex] = value;
			}
		}
	}
	~MyPlugin() { }

	void Process(int blockSize) {
		// voices
		for (int voiceIndex = 0; voiceIndex < voiceSize; voiceIndex++) {
			// add modulation
			mGain.AddModulation(voiceIndex, blockSize);
			mOffset.AddModulation(voiceIndex, blockSize);
			mPitch.AddModulation(voiceIndex, blockSize);
		}
	}
};

int main() {
	std::chrono::high_resolution_clock::time_point pStart;
	std::chrono::high_resolution_clock::time_point pEnd;
	MyPlugin myPlugin;

	// audio host call
	long long numProcessing = 1024 * 50;
	long long counterProcessing = 0;
	pStart = std::chrono::high_resolution_clock::now();
	while (counterProcessing++ < numProcessing) {
		// variable blockSize (i.e. it can vary)
		int blockSize = 256;

		// process data
		myPlugin.Process(blockSize);
	}
	pEnd = std::chrono::high_resolution_clock::now();
	std::cout << "execution time: " << std::chrono::duration_cast<std::chrono::milliseconds>(pEnd - pStart).count() << " ms" << std::endl;
}

(Comment/uncomment either the two ippsThreshold_64f_I calls or the single ippsThreshold_LTValGTVal_64f_I call to test the difference.)
