Hi there,
I'd like to take advantage of the IPP library and vectorization to apply a clamp/clip to an array of doubles (or better, to the element-wise sum of two arrays of doubles).
The current code is something like this:
double *pStart = mStartVoicesValues[voiceIndex];
double *pMod = pModValues + voiceIndex * bufferSize;
double *pValue = mProcessedValues[voiceIndex];
for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
    pValue[sampleIndex] = std::clamp(pStart[sampleIndex] + pMod[sampleIndex], 0.0, 1.0);
}
It is called often, with a variable blockSize. It adds pStart and pMod element by element and clips each result to the range [0.0, 1.0]. The three arrays all have the same size. Note: I could also write the result back into pStart instead of using the third array pValue.
Which IPP function do you suggest?
Thanks
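For reference, the operation described above can be sketched as a standalone function in plain C++, which is handy as a baseline when comparing against any IPP-based version. The name addClamp and its default bounds are hypothetical, not part of IPP. Writing the clamp as std::min/std::max keeps the loop body branch-free, which compilers can often auto-vectorize at higher optimization levels, though that is not guaranteed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical baseline helper (not an IPP function):
// out[i] = clamp(a[i] + b[i], lo, hi) for i in [0, n).
// `out` may be the same pointer as `a`, which covers the in-place variant.
void addClamp(const double* a, const double* b, double* out,
              std::size_t n, double lo = 0.0, double hi = 1.0)
{
    assert(a != nullptr && b != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        const double sum = a[i] + b[i];
        // Clip below at lo first, then above at hi.
        out[i] = std::min(std::max(sum, lo), hi);
    }
}
```

Calling addClamp(pStart, pMod, pValue, blockSize) would reproduce the loop from the question; passing pStart as the output pointer gives the in-place variant.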
IPP doesn't offer exactly such a function, but you could use ippsAddProduct_64f(...) and then do the threshold computation by calling ippsThreshold_64fc_I(). Together these may give some speedup.
Gennady F. (Intel) wrote: IPP doesn't offer exactly such a function, but you could use ippsAddProduct_64f(...) and then do the threshold computation by calling ippsThreshold_64fc_I(). Together these may give some speedup.
Thanks for the reply :) I believe you meant Add, not AddProduct.
So the result will be:
ippsAdd_64f(pStart, pMod, pValue, blockSize);
ippsThreshold_64f_I(pValue, blockSize, 0.0, ippCmpLess);
ippsThreshold_64f_I(pValue, blockSize, 1.0, ippCmpGreater);
Calling Threshold twice seems a bit redundant, doesn't it?
markzzz wrote: Calling Threshold twice seems a bit redundant, doesn't it?
Yes, there is a special function for this, ippsThreshold_LTValGTVal_64f_I, in ipps.h.
Thanks.
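To make the relationship between the two approaches concrete, here is a scalar model in plain C++ of what the two thresholding styles compute. These are hypothetical reference helpers that mirror my understanding of the documented behavior (two one-sided passes versus one pass with separate levels and replacement values); they are not the IPP implementations and say nothing about their speed.

```cpp
#include <cassert>
#include <cstddef>

// Model of two in-place one-sided passes:
// first raise values below lo to lo, then lower values above hi to hi.
void thresholdTwoPass(double* p, std::size_t n, double lo, double hi)
{
    assert(p != nullptr && lo <= hi);
    for (std::size_t i = 0; i < n; ++i)
        if (p[i] < lo) p[i] = lo;
    for (std::size_t i = 0; i < n; ++i)
        if (p[i] > hi) p[i] = hi;
}

// Model of a single in-place pass with separate replacement values:
// values below levelLT become valueLT, values above levelGT become valueGT.
void thresholdOnePass(double* p, std::size_t n,
                      double levelLT, double valueLT,
                      double levelGT, double valueGT)
{
    assert(p != nullptr && levelLT <= levelGT);
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] < levelLT)      p[i] = valueLT;
        else if (p[i] > levelGT) p[i] = valueGT;
    }
}
```

With valueLT equal to levelLT and valueGT equal to levelGT, both models produce the same clamped output, so the choice between the corresponding IPP calls should only affect speed, not results.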
Funny :)
In the end, this:
ippsAdd_64f(mSmoothedValues, pMod, pValue, blockSize);
ippsThreshold_64f_I(pValue, blockSize, 0.0, ippCmpLess);
ippsThreshold_64f_I(pValue, blockSize, 1.0, ippCmpGreater);
seems faster than this:
ippsAdd_64f(mSmoothedValues, pMod, pValue, blockSize);
ippsThreshold_LTValGTVal_64f_I(pValue, blockSize, 0.0, 0.0, 1.0, 1.0);
In the test I ran, the second version takes ~800 ms versus ~500 ms for the first.
Is that expected?
Yes, it is a surprise for me too. It looks like ippsThreshold_LTValGTVal_64f_I does additional work because of its extra replacement-value parameters, and that affects performance. Could you please measure with a volume of data large enough to exceed the L3 cache?
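A measurement over a larger data volume could be sketched as below. This is only an illustration under assumptions: timeAddClampMs is a hypothetical helper, the add+clamp loop is plain C++ standing in for whichever IPP calls are being compared, and the element count needed to exceed L3 depends on the machine.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical timing helper: runs one add+clamp pass over `count` doubles
// per array and returns the elapsed time in milliseconds.
// `checksum` receives the sum of the outputs so the work stays observable.
double timeAddClampMs(std::size_t count, double& checksum)
{
    assert(count > 0);
    std::vector<double> a(count, 0.5), b(count), out(count);
    // Same ramp in [-1, 1] as the modulation data in the test program.
    for (std::size_t i = 0; i < count; ++i)
        b[i] = (static_cast<double>(i % 256) / 255.0) * 2.0 - 1.0;

    const auto t0 = std::chrono::steady_clock::now();
    // Swap this loop for the IPP calls under test.
    for (std::size_t i = 0; i < count; ++i)
        out[i] = std::min(std::max(a[i] + b[i], 0.0), 1.0);
    const auto t1 = std::chrono::steady_clock::now();

    checksum = 0.0;
    for (double v : out) checksum += v;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

For example, a count of 8 * 1024 * 1024 gives three 64 MiB arrays, which is larger than the L3 cache on many desktop CPUs (an assumption to check against the actual hardware).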
Andrey Bakshaev (Intel) wrote:
Could you please measure with big enough volume of data > L3 cache?
Why would this make a difference? I mean, L3 is the last link in the cache chain. I'd hope the data (if it is cached at all) stays in L1 or L2. Anyway, here is the whole test code:
#include <iostream>
#include <chrono>
#include <algorithm>
#include "ipp.h"

constexpr int voiceSize = 16;
constexpr int bufferSize = 256;

class Param
{
public:
    double mMin, mRange;
    Ipp64f mSmoothedValues[bufferSize];
    Ipp64f *pModulationVoicesValues;
    Ipp64f mProcessedVoicesValues[voiceSize][bufferSize];

    Param(double min, double max) : mMin { min }, mRange{ max - min } { }

    inline void AddModulation(int voiceIndex, int blockSize) {
        Ipp64f *pMod = pModulationVoicesValues + voiceIndex * bufferSize;
        Ipp64f *pValue = mProcessedVoicesValues[voiceIndex];

        // add modulation
        ippsAdd_64f(mSmoothedValues, pMod, pValue, blockSize);
        //ippsThreshold_64f_I(pValue, blockSize, 0.0, ippCmpLess);
        //ippsThreshold_64f_I(pValue, blockSize, 1.0, ippCmpGreater);
        ippsThreshold_LTValGTVal_64f_I(pValue, blockSize, 0.0, 0.0, 1.0, 1.0);
    }
};

class MyPlugin
{
public:
    Ipp64f gainModValues[voiceSize][bufferSize];
    Ipp64f offsetModValues[voiceSize][bufferSize];
    Ipp64f pitchModValues[voiceSize][bufferSize];

    Param mGain{ 0.0, 1.0 };
    Param mOffset{ -900.0, 900.0 };
    Param mPitch{ -48.0, 48.0 };

    MyPlugin() {
        // link mod arrays to params
        mGain.pModulationVoicesValues = gainModValues[0];
        mOffset.pModulationVoicesValues = offsetModValues[0];
        mPitch.pModulationVoicesValues = pitchModValues[0];

        // fancy data for smooth at audio rate
        for (int sampleIndex = 0; sampleIndex < bufferSize; sampleIndex++) {
            mGain.mSmoothedValues[sampleIndex] = 0.5;
            mOffset.mSmoothedValues[sampleIndex] = 0.5;
            mPitch.mSmoothedValues[sampleIndex] = 0.5;
        }

        // fancy data for mod at audio rate
        for (int voiceIndex = 0; voiceIndex < voiceSize; voiceIndex++) {
            for (int sampleIndex = 0; sampleIndex < bufferSize; sampleIndex++) {
                double value = (sampleIndex / ((double)bufferSize - 1)) * 2.0 - 1.0;
                gainModValues[voiceIndex][sampleIndex] = value;
                offsetModValues[voiceIndex][sampleIndex] = value;
                pitchModValues[voiceIndex][sampleIndex] = value;
            }
        }
    }
    ~MyPlugin() { }

    void Process(int blockSize) {
        // voices
        for (int voiceIndex = 0; voiceIndex < voiceSize; voiceIndex++) {
            // add modulation
            mGain.AddModulation(voiceIndex, blockSize);
            mOffset.AddModulation(voiceIndex, blockSize);
            mPitch.AddModulation(voiceIndex, blockSize);
        }
    }
};

int main() {
    std::chrono::high_resolution_clock::time_point pStart;
    std::chrono::high_resolution_clock::time_point pEnd;
    MyPlugin myPlugin;

    // audio host call
    long long numProcessing = 1024 * 50;
    long long counterProcessing = 0;
    pStart = std::chrono::high_resolution_clock::now();
    while (counterProcessing++ < numProcessing) {
        // variable blockSize (i.e. it can vary)
        int blockSize = 256;

        // process data
        myPlugin.Process(blockSize);
    }
    pEnd = std::chrono::high_resolution_clock::now();
    std::cout << "execution time: " << std::chrono::duration_cast<std::chrono::milliseconds>(pEnd - pStart).count() << " ms" << std::endl;
}
(Comment/uncomment the two ippsThreshold_64f_I calls or the single ippsThreshold_LTValGTVal_64f_I call to compare the two versions.)
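As a side note on the cache question, the amount of data this test touches can be worked out from its array sizes. The sketch below assumes Ipp64f is an 8-byte double and counts only the arrays declared in the test program; the constant and function names are hypothetical.

```cpp
#include <cstddef>

// Sizes taken from the test program above.
constexpr std::size_t kVoiceSize  = 16;
constexpr std::size_t kBufferSize = 256;
constexpr std::size_t kParamCount = 3;   // mGain, mOffset, mPitch

static_assert(sizeof(double) == 8, "sketch assumes 8-byte doubles");

// One smoothed-values array per parameter.
constexpr std::size_t smoothedBytes()
{
    return kBufferSize * sizeof(double);
}

// One [voiceSize][bufferSize] array (modulation input or processed output).
constexpr std::size_t perVoiceArrayBytes()
{
    return kVoiceSize * kBufferSize * sizeof(double);
}

// Per parameter: smoothed values + modulation input + processed output.
constexpr std::size_t workingSetBytes()
{
    return kParamCount * (smoothedBytes() + 2 * perVoiceArrayBytes());
}

static_assert(workingSetBytes() == 202752, "3 * (2 KiB + 2 * 32 KiB) = 198 KiB");
```

At about 198 KiB in total, the data is re-read on every Process call and is likely to stay in the L2 or L3 cache on many CPUs, so this test mostly measures in-cache behavior rather than memory bandwidth.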