Hi,
I'm using IPP, dynamically linked, with Clang 11.0.0.
Hardware:
- Processor Name: 6-Core Intel Core i5
- Processor Speed: 3 GHz
- Number of Processors: 1
- Total Number of Cores: 6
- L2 Cache (per Core): 256 KB
- L3 Cache: 9 MB
I might be missing something, but should this really be slower than doing the same work with simple std:: code?
If yes, how can I make it faster?
#include <ipp.h>
#include <chrono>
#include <cstddef>
#include <iostream>

constexpr int kAmount = 12000;   // value used in my tests (see below)

struct Vect3DArray {
    Ipp64f* x_;
    Ipp64f* y_;
    Ipp64f* z_;
    Vect3DArray(int size) {
        // ippsMalloc_64f takes a length in elements (it multiplies by sizeof(Ipp64f) itself)
        x_ = ippsMalloc_64f(size);
        y_ = ippsMalloc_64f(size);
        z_ = ippsMalloc_64f(size);
    }
    ~Vect3DArray() {
        ippFree(x_);
        ippFree(y_);
        ippFree(z_);
    }
};

int main() {
    Vect3DArray vectArray(kAmount);
    Vect3DArray dstVectArray(kAmount);
    Ipp64f* sums = ippsMalloc_64f(kAmount);

    for (std::size_t i = 1; i < kAmount; ++i) {
        vectArray.x_[i] = i * 2.5;   // indexed element stores
        vectArray.y_[i] = i * 3.3;
        vectArray.z_[i] = i * 4.7;
    }

    auto start = std::chrono::high_resolution_clock::now();
    ippsMul_64f(vectArray.x_, vectArray.x_, dstVectArray.x_, static_cast<int>(kAmount));  // x*x
    ippsMul_64f(vectArray.y_, vectArray.y_, dstVectArray.y_, static_cast<int>(kAmount));  // y*y
    ippsMul_64f(vectArray.z_, vectArray.z_, dstVectArray.z_, static_cast<int>(kAmount));  // z*z (note: not used below)
    ippsAdd_64f(dstVectArray.x_, dstVectArray.y_, sums, kAmount);  // sums = x*x + y*y
    ippsAdd_64f(sums, vectArray.z_, sums, kAmount);                // sums += z (the unsquared z)
    ippsSqr_64f_I(sums, kAmount);                                  // sums = sums*sums
    ippsDiv_64f_I(sums, vectArray.x_, kAmount);                    // x /= sums, in place
    ippsDiv_64f_I(sums, vectArray.y_, kAmount);                    // y /= sums
    ippsDiv_64f_I(sums, vectArray.z_, kAmount);                    // z /= sums
    auto end = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    std::cout << "#" << duration << std::endl;

    ippFree(sums);
}
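For reference, the plain std:: version I'm comparing against looks roughly like this (a sketch, not necessarily the exact code I benchmarked). Note that the scalar loop fuses everything into a single pass over the data, so it touches memory far less often than the chain of ipps calls above:

#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

constexpr std::size_t kAmount = 12000;   // same size as in the IPP test

int main() {
    std::vector<double> x(kAmount), y(kAmount), z(kAmount), sums(kAmount);
    for (std::size_t i = 1; i < kAmount; ++i) {
        x[i] = i * 2.5;
        y[i] = i * 3.3;
        z[i] = i * 4.7;
    }
    auto start = std::chrono::high_resolution_clock::now();
    for (std::size_t i = 0; i < kAmount; ++i) {
        // mirrors the IPP sequence above: s = x*x + y*y + z, then s = s*s,
        // then divide x, y and z in place by s
        double s = x[i] * x[i] + y[i] * y[i] + z[i];
        s *= s;
        x[i] /= s;
        y[i] /= s;
        z[i] /= s;
        sums[i] = s;
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "#"
              << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count()
              << std::endl;
}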
- Tags:
- Development Tools
- General Support
- Intel® Integrated Performance Primitives
- Parallel Computing
- Vectorization
What is the value of kAmount? Do the vectors fit in the L1 cache? If not, try doing the various operations on chunks that do fit in the L1 cache rather than on the whole vectors, as in the sketch below.
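For example, something like this (a sketch reusing the variable names from the code in the question; the chunk size is just a starting point to tune, not an IPP recommendation):

// Same pipeline of ipps calls, applied block by block so each block's
// intermediates stay hot in cache between calls.
// kChunk = 512 doubles = 4 KB per array; roughly 7 arrays are touched per
// block, about 28 KB, which fits a typical 32 KB L1d. Tune for your CPU.
constexpr int kChunk = 512;

for (int off = 0; off < static_cast<int>(kAmount); off += kChunk) {
    const int n = std::min(kChunk, static_cast<int>(kAmount) - off);   // needs <algorithm>
    ippsMul_64f(vectArray.x_ + off, vectArray.x_ + off, dstVectArray.x_ + off, n);
    ippsMul_64f(vectArray.y_ + off, vectArray.y_ + off, dstVectArray.y_ + off, n);
    ippsMul_64f(vectArray.z_ + off, vectArray.z_ + off, dstVectArray.z_ + off, n);
    ippsAdd_64f(dstVectArray.x_ + off, dstVectArray.y_ + off, sums + off, n);
    ippsAdd_64f(sums + off, vectArray.z_ + off, sums + off, n);
    ippsSqr_64f_I(sums + off, n);
    ippsDiv_64f_I(sums + off, vectArray.x_ + off, n);
    ippsDiv_64f_I(sums + off, vectArray.y_ + off, n);
    ippsDiv_64f_I(sums + off, vectArray.z_ + off, n);
}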
Regards,
Adriaan van Os
Hi, thanks for the reply.
The vectors don't fit in L1, as kAmount was 12000 in my tests (quick arithmetic below).
What kind of improvement can I expect to obtain in the best-case scenario?
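For reference, the arithmetic (assuming a typical 32 KB L1 data cache per core on this class of CPU):
- 12000 elements × 8 bytes per Ipp64f = 96,000 bytes ≈ 94 KB per array, so a single array already exceeds L1d;
- the code touches about 7 arrays (x, y, z, the three dst arrays, and sums), roughly 656 KB in total, which also exceeds the 256 KB L2 per core but fits easily in the 9 MB L3.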
Alexandre F.
Well, that depends on a lot of factors, the most important being cache usage. It could also be that the system (CPU) is doing some background work, et cetera.
Note that the first call to IPP may be "very slow" (like 1 millisecond) due to library initialization. So, keep that call out of the timing.
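One way to keep it out is an untimed warm-up call before starting the clock, for example (a sketch; ippInit() is optional and just forces the CPU-specific dispatch up front):

// Untimed warm-up so library initialization / dispatch is not measured
ippInit();                                            // optional explicit init
ippsMul_64f(vectArray.x_, vectArray.x_, dstVectArray.x_, static_cast<int>(kAmount));  // result discarded
auto start = std::chrono::high_resolution_clock::now();
// ... the ipps calls to be timed, as in the original code ...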
Based on limited tests I did, the speed improvement with Float32 is typically 3x (that number is probably better on a CPU with wider vector registers, such as AVX-512). With Float64 the speed improvement is disappointing (typically up to 50%, or at most 100%). In my limited tests, some ipps Float64 functions were slower than their vDSP https://developer.apple.com/documentation/accelerate/vdsp?language=objc counterparts. Again, that may be better on a CPU with wider vector registers, such as AVX-512.
In my opinion, with Float64 it pays more to make your code threaded (I mean explicitly with pthreads, not semi-automatically with OpenMP). But then it depends on how stupid (sorry) the thread synchronisation is. Use "lock-free" synchronisation, never critical sections; they spoil everything.
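To illustrate the kind of split that needs no synchronisation at all, here is a sketch using std::thread instead of the raw pthreads mentioned above, reusing the Vect3DArray struct from the question; each thread owns a disjoint block of the arrays, so the only synchronisation is the final join:

#include <algorithm>
#include <thread>
#include <vector>

// Each thread runs the same ipps pipeline on its own disjoint block of the
// arrays, so there is nothing to lock; the threads only have to be joined.
void runThreaded(Vect3DArray& v, Vect3DArray& dst, Ipp64f* sums, int total) {
    const int numThreads =
        static_cast<int>(std::max(1u, std::thread::hardware_concurrency()));
    const int block = (total + numThreads - 1) / numThreads;
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        const int off = t * block;
        const int n = std::min(block, total - off);
        if (n <= 0) break;
        workers.emplace_back([&v, &dst, sums, off, n] {
            ippsMul_64f(v.x_ + off, v.x_ + off, dst.x_ + off, n);
            ippsMul_64f(v.y_ + off, v.y_ + off, dst.y_ + off, n);
            ippsMul_64f(v.z_ + off, v.z_ + off, dst.z_ + off, n);
            ippsAdd_64f(dst.x_ + off, dst.y_ + off, sums + off, n);
            ippsAdd_64f(sums + off, v.z_ + off, sums + off, n);
            ippsSqr_64f_I(sums + off, n);
            ippsDiv_64f_I(sums + off, v.x_ + off, n);
            ippsDiv_64f_I(sums + off, v.y_ + off, n);
            ippsDiv_64f_I(sums + off, v.z_ + off, n);
        });
    }
    for (auto& w : workers) w.join();
}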
Regards,
Adriaan van Os
Also note that Clang has two built-in vectorizers ( https://www.llvm.org/docs/Vectorizers.html ) that you can turn on and off for comparison.
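For example (command lines only, assuming the source file is called main.cpp; IPP include and link options omitted):

# default: Clang's loop and SLP vectorizers are enabled at -O2/-O3
clang++ -O3 main.cpp -o with_vec
# both auto-vectorizers turned off, for comparison
clang++ -O3 -fno-vectorize -fno-slp-vectorize main.cpp -o no_vec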
Regards,
Adriaan van Os
