Hi there,
I'd like to take advantage of the IPP library and vectorization to apply a clamp/clip to an array of doubles (or, better, to the element-wise sum of two arrays of doubles).
The current code is something like this:
double *pStart = mStartVoicesValues[voiceIndex];
double *pMod = pModValues + voiceIndex * bufferSize;
double *pValue = mProcessedValues[voiceIndex];
for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
    pValue[sampleIndex] = std::clamp(pStart[sampleIndex] + pMod[sampleIndex], 0.0, 1.0);
}
It is called often, with a variable blockSize. It adds pStart and pMod element-wise and clips each sum to the range [0.0, 1.0]. The three arrays are, of course, the same size. Note: I could also write the result back into pStart instead of using the third array, pValue.
Which IPP functions do you suggest?
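To check any IPP-based replacement against the original loop, a plain C++17 reference can help. This is a minimal sketch; the function names `addClamp` and `addClampInPlace` are hypothetical and are not IPP API. The in-place variant corresponds to the "write the result back into pStart" option.

```cpp
#include <algorithm>  // std::clamp (C++17)
#include <cassert>

// Hypothetical reference for the original loop:
// dst[i] = clamp(a[i] + b[i], 0.0, 1.0) for i in [0, len).
void addClamp(const double *a, const double *b, double *dst, int len) {
    assert(len >= 0);
    for (int i = 0; i < len; ++i)
        dst[i] = std::clamp(a[i] + b[i], 0.0, 1.0);
}

// Hypothetical in-place variant: the clamped sum overwrites a
// (e.g. pStart), so no third array is needed.
void addClampInPlace(double *a, const double *b, int len) {
    assert(len >= 0);
    for (int i = 0; i < len; ++i)
        a[i] = std::clamp(a[i] + b[i], 0.0, 1.0);
}
```

For example, with a = {0.5, 0.5, 0.5, 0.5} and b = {-1.0, -0.25, 0.25, 1.0}, both functions produce {0.0, 0.25, 0.75, 1.0}.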
Thanks
IPP doesn't offer exactly such a function, but you could use ippsAddProduct_64f(...) and then do the threshold computation by calling ippsThreshold_64fc_I(). Together, these may give some speedup.
Gennady F. (Intel) wrote: IPP doesn't offer exactly such a function, but you could use ippsAddProduct_64f(...) and then do the threshold computation by calling ippsThreshold_64fc_I(). Together, these may give some speedup.
Thanks for the reply :) I believe you meant Add, not AddProduct (and the real-valued ippsThreshold_64f_I rather than the complex _64fc variant, since the data are plain doubles).
So the result will be:
ippsAdd_64f(pStart, pMod, pValue, blockSize);
ippsThreshold_64f_I(pValue, blockSize, 0.0, ippCmpLess);
ippsThreshold_64f_I(pValue, blockSize, 1.0, ippCmpGreater);
Calling Threshold twice seems a bit redundant, doesn't it?
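To make the semantics of the two calls concrete, here is a portable model in standard C++, assuming the documented behavior of ippsThreshold_64f_I: with ippCmpLess, elements below `level` are replaced by `level`; with ippCmpGreater, elements above `level` are replaced by `level`. `CmpOp`, `thresholdInPlace` and `clampTwoPass` are hypothetical stand-ins, not IPP. The model also shows where the redundancy lies: each pass reads and writes the whole buffer, so the two-call version traverses pValue twice.

```cpp
#include <cassert>

// Hypothetical stand-in for IppCmpOp (only the two cases used here).
enum class CmpOp { Less, Greater };

// Model of one ippsThreshold_64f_I pass: elements below (Less) or
// above (Greater) `level` are replaced by `level`.
void thresholdInPlace(double *p, int len, double level, CmpOp op) {
    assert(len >= 0);
    for (int i = 0; i < len; ++i) {
        if (op == CmpOp::Less ? p[i] < level : p[i] > level)
            p[i] = level;
    }
}

// Two full passes, as in the code above: clamp from below, then from above.
void clampTwoPass(double *p, int len) {
    thresholdInPlace(p, len, 0.0, CmpOp::Less);
    thresholdInPlace(p, len, 1.0, CmpOp::Greater);
}
```

Applied to {-0.5, 0.0, 0.3, 1.0, 1.5}, `clampTwoPass` yields {0.0, 0.0, 0.3, 1.0, 1.0}, the same as clamping each element to [0.0, 1.0].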
markzzz wrote: Calling Threshold twice seems a bit redundant, doesn't it?
Yes, there is a dedicated function for this, ippsThreshold_LTValGTVal_64f_I, declared in ipps.h.
Thanks.
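A single-pass model of what ippsThreshold_LTValGTVal_64f_I computes, assuming its documented behavior (elements below levelLT become valueLT, elements above levelGT become valueGT, everything else is unchanged). The function name is hypothetical and the `levelLT <= levelGT` check is an assumed precondition of this sketch. With (0.0, 0.0, 1.0, 1.0) it is exactly a clamp to [0.0, 1.0] in one pass, but because the replacement values are free parameters, the general function cannot simply be a min/max pair.

```cpp
#include <cassert>

// Hypothetical single-pass model of ippsThreshold_LTValGTVal_64f_I:
//   x < levelLT -> valueLT
//   x > levelGT -> valueGT
//   otherwise   -> x unchanged
void thresholdLTValGTValInPlace(double *p, int len,
                                double levelLT, double valueLT,
                                double levelGT, double valueGT) {
    assert(len >= 0 && levelLT <= levelGT);  // assumed precondition
    for (int i = 0; i < len; ++i) {
        if (p[i] < levelLT)
            p[i] = valueLT;
        else if (p[i] > levelGT)
            p[i] = valueGT;
    }
}
```

Called as `thresholdLTValGTValInPlace(pValue, blockSize, 0.0, 0.0, 1.0, 1.0)`, it mirrors the IPP call used later in the thread.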
Funny :)
In the end, this:
ippsAdd_64f(mSmoothedValues, pMod, pValue, blockSize);
ippsThreshold_64f_I(pValue, blockSize, 0.0, ippCmpLess);
ippsThreshold_64f_I(pValue, blockSize, 1.0, ippCmpGreater);
seems faster than this:
ippsAdd_64f(mSmoothedValues, pMod, pValue, blockSize);
ippsThreshold_LTValGTVal_64f_I(pValue, blockSize, 0.0, 0.0, 1.0, 1.0);
In my test, the single ippsThreshold_LTValGTVal_64f_I call took ~800 ms, versus ~500 ms for the two ippsThreshold_64f_I calls.
Is that normal?
Yes, it is a surprise for me too. It looks like ippsThreshold_LTValGTVal_64f_I does additional work because it takes separate replacement values, and that affects performance. Could you please measure with a volume of data larger than the L3 cache?
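One way to run that measurement is a small harness that times a clamp variant over a large buffer, restoring the unclamped input before each repetition so every run does the same work. This is a minimal sketch: `timeClampMs` and `clampMinMax` are hypothetical names, the input pattern is arbitrary, and `clampMinMax` is only a portable stand-in for the IPP calls under test. It uses std::chrono::steady_clock, which is monotonic and therefore suited to interval timing.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Portable stand-in for a clamp variant (hypothetical); swap in the IPP calls.
void clampMinMax(double *p, int len) {
    for (int i = 0; i < len; ++i)
        p[i] = std::min(std::max(p[i], 0.0), 1.0);
}

// Hypothetical harness: average milliseconds per call of clampFn over n
// doubles. The copy that restores the input is done outside the timed region.
template <typename ClampFn>
double timeClampMs(ClampFn clampFn, std::size_t n, int reps) {
    assert(n > 0 && reps > 0);
    std::vector<double> input(n), work(n);
    for (std::size_t i = 0; i < n; ++i)
        input[i] = double(i % 256) / 255.0 * 3.0 - 1.0;  // values in [-1, 2]
    double totalMs = 0.0;
    for (int r = 0; r < reps; ++r) {
        work = input;  // untimed reset to unclamped data
        auto t0 = std::chrono::steady_clock::now();
        clampFn(work.data(), static_cast<int>(n));
        auto t1 = std::chrono::steady_clock::now();
        totalMs += std::chrono::duration<double, std::milli>(t1 - t0).count();
    }
    return totalMs / reps;
}
```

To compare the IPP variants, pass lambdas such as `[](Ipp64f *p, int len) { ippsThreshold_LTValGTVal_64f_I(p, len, 0.0, 0.0, 1.0, 1.0); }` with, for example, n = 8 * 1024 * 1024 (64 MiB of doubles). That size is an assumption; check it against the L3 size of the machine under test.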
Andrey Bakshaev (Intel) wrote:
Could you please measure with a volume of data larger than the L3 cache?
Why would that make a difference? L3 is the last link in the cache chain; I'd expect the data (if cached at all) to sit in L1 or L2. Anyway, here is the complete test code:
#include <iostream>
#include <chrono>
#include <algorithm>
#include "ipp.h"
constexpr int voiceSize = 16;
constexpr int bufferSize = 256;
class Param
{
public:
    double mMin, mRange;
    Ipp64f mSmoothedValues[bufferSize];
    Ipp64f *pModulationVoicesValues;
    Ipp64f mProcessedVoicesValues[voiceSize][bufferSize];

    Param(double min, double max) : mMin { min }, mRange{ max - min } { }

    inline void AddModulation(int voiceIndex, int blockSize) {
        Ipp64f *pMod = pModulationVoicesValues + voiceIndex * bufferSize;
        Ipp64f *pValue = mProcessedVoicesValues[voiceIndex];

        // add modulation
        ippsAdd_64f(mSmoothedValues, pMod, pValue, blockSize);

        // variant A: two passes (clamp below 0.0, then above 1.0)
        //ippsThreshold_64f_I(pValue, blockSize, 0.0, ippCmpLess);
        //ippsThreshold_64f_I(pValue, blockSize, 1.0, ippCmpGreater);

        // variant B: single pass
        ippsThreshold_LTValGTVal_64f_I(pValue, blockSize, 0.0, 0.0, 1.0, 1.0);
    }
};
class MyPlugin
{
public:
    Ipp64f gainModValues[voiceSize][bufferSize];
    Ipp64f offsetModValues[voiceSize][bufferSize];
    Ipp64f pitchModValues[voiceSize][bufferSize];

    Param mGain{ 0.0, 1.0 };
    Param mOffset{ -900.0, 900.0 };
    Param mPitch{ -48.0, 48.0 };

    MyPlugin() {
        // link mod arrays to params
        mGain.pModulationVoicesValues = gainModValues[0];
        mOffset.pModulationVoicesValues = offsetModValues[0];
        mPitch.pModulationVoicesValues = pitchModValues[0];

        // fancy data for smooth at audio rate
        for (int sampleIndex = 0; sampleIndex < bufferSize; sampleIndex++) {
            mGain.mSmoothedValues[sampleIndex] = 0.5;
            mOffset.mSmoothedValues[sampleIndex] = 0.5;
            mPitch.mSmoothedValues[sampleIndex] = 0.5;
        }

        // fancy data for mod at audio rate
        for (int voiceIndex = 0; voiceIndex < voiceSize; voiceIndex++) {
            for (int sampleIndex = 0; sampleIndex < bufferSize; sampleIndex++) {
                double value = (sampleIndex / ((double)bufferSize - 1)) * 2.0 - 1.0;
                gainModValues[voiceIndex][sampleIndex] = value;
                offsetModValues[voiceIndex][sampleIndex] = value;
                pitchModValues[voiceIndex][sampleIndex] = value;
            }
        }
    }
    ~MyPlugin() { }

    void Process(int blockSize) {
        // voices
        for (int voiceIndex = 0; voiceIndex < voiceSize; voiceIndex++) {
            // add modulation
            mGain.AddModulation(voiceIndex, blockSize);
            mOffset.AddModulation(voiceIndex, blockSize);
            mPitch.AddModulation(voiceIndex, blockSize);
        }
    }
};
int main() {
    std::chrono::high_resolution_clock::time_point pStart;
    std::chrono::high_resolution_clock::time_point pEnd;
    MyPlugin myPlugin;

    // audio host call
    long long numProcessing = 1024 * 50;
    long long counterProcessing = 0;

    pStart = std::chrono::high_resolution_clock::now();
    while (counterProcessing++ < numProcessing) {
        // variable blockSize (i.e. it can vary)
        int blockSize = 256;

        // process data
        myPlugin.Process(blockSize);
    }
    pEnd = std::chrono::high_resolution_clock::now();
    std::cout << "execution time: " << std::chrono::duration_cast<std::chrono::milliseconds>(pEnd - pStart).count() << " ms" << std::endl;
}
(Comment/uncomment either the two ippsThreshold_64f_I calls or the single ippsThreshold_LTValGTVal_64f_I call to compare the variants.)
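To put numbers on the cache question, the working set of the test program can be computed from its declarations: each Param owns mSmoothedValues (bufferSize doubles) and mProcessedVoicesValues (voiceSize × bufferSize doubles), and reads one modulation array of voiceSize × bufferSize doubles owned by MyPlugin. The sketch below only redoes that arithmetic; the name `numParams` is introduced here for illustration.

```cpp
#include <cstddef>

// Sizes copied from the test program above.
constexpr std::size_t voiceSize = 16;
constexpr std::size_t bufferSize = 256;
constexpr std::size_t numParams = 3;  // mGain, mOffset, mPitch (illustrative name)

// Per Param: mSmoothedValues + mProcessedVoicesValues + its modulation array.
constexpr std::size_t doublesPerParam =
    bufferSize + 2 * voiceSize * bufferSize;  // 256 + 8192 = 8448

static_assert(sizeof(double) == 8, "assumes 8-byte doubles");
constexpr std::size_t workingSetBytes =
    numParams * doublesPerParam * sizeof(double);  // 202,752 bytes

static_assert(doublesPerParam == 8448, "per-Param doubles");
static_assert(workingSetBytes == 202752, "total working set in bytes");
```

That is about 198 KiB, reused 51,200 times (1024 * 50 iterations), so the data stays hot: too large for a typical 32 to 48 KiB L1 data cache, but within the L2 of many recent x86 cores. The benchmark therefore measures in-cache compute. With data larger than L3, the two-pass version has to stream the buffer through memory twice, so the ranking of the two variants may well change.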