I want to use IPPS to do vector arithmetic. I found the following function:

IppStatus ippsMul_16s_Sfs(const Ipp16s* pSrc1, const Ipp16s* pSrc2, Ipp16s* pDst, int len, int scaleFactor);

However, what if the product of the two numbers I am multiplying requires 32 bits? I want something like:

IppStatus ippsMul_16s32s(const Ipp16s* pSrc1, const Ipp16s* pSrc2, Ipp32s* pDst, int len);

Do these functions exist, or am I just not looking in the right place?

Thanks!

3 Replies

IppStatus ippsMul_16s32s_Sfs(const Ipp16s* pSrc1, const Ipp16s* pSrc2, Ipp32s* pDst, int len, int scaleFactor);

That should work for multiplying two vectors. But what about the MulC variant? Do I have to copy the 16-bit vector into a 32-bit vector and then use the 32s variant?

Also, a more general question: how expensive is copying vectors? And how do in-place functions compare with out-of-place ones? Is the performance difference significant?

But I meant to comment on the functions I found:

When there is a scale factor on the end, does this affect performance much? I don't really need it, but there isn't a version without the scale factor. If I set it to 1, does this really do the same thing as not having a scale factor?

Also, there is a sub function that looks like this:

IppStatus ippsSub_16s32f(const Ipp16s* pSrc1, const Ipp16s* pSrc2, Ipp32f* pDst, int len);

I don't really want a floating point representation because I am just working with integers. Why isn't there a 32s version?

If you need to process data at a higher bit depth, it is often faster to convert once at the beginning of the algorithm, process the 32-bit data, and convert back to 16-bit at the end.

Every function with different input/output bit depths has to convert each value (with SSE2, 8 values at a time) to the higher bit depth before doing the operation. That adds latency and therefore reduces performance.

Of course, some functions with mixed input/output bit depths are missing, but even Intel needs time to react to customer requests and implement them. Maybe they will show up eventually :)
