Hi,

kramulous

Beginner


05-04-2011
05:41 AM


AffineTransform Variant

I've been speeding up some rendering code of mine and have reduced the problem to a function that takes 98% of the time.

[cpp]
// temp1 = a*x + b*y + c*z + d
ippsMulC_32f(dataset.xpoints, a, temp1, n);
ippsAddProductC_32f(dataset.ypoints, b, temp1, n);
ippsAddProductC_32f(dataset.zpoints, c, temp1, n);
ippsAddC_32f_I(d, temp1, n);

// temp2 = e*x + f*y + g*z + h
ippsMulC_32f(dataset.xpoints, e, temp2, n);
ippsAddProductC_32f(dataset.ypoints, f, temp2, n);
ippsAddProductC_32f(dataset.zpoints, g, temp2, n);
ippsAddC_32f_I(h, temp2, n);

// temp3 = 1 / (i*x + j*y + k*z + l)
ippsMulC_32f(dataset.xpoints, i, temp3, n);
ippsAddProductC_32f(dataset.ypoints, j, temp3, n);
ippsAddProductC_32f(dataset.zpoints, k, temp3, n);
ippsAddC_32f_I(l, temp3, n);
ippsDivCRev_32f_I(1.0, temp3, n);

// Final per-point mapping
for (int p = 0; p < n; p++) {
    xtrans[p] = ca + ca*temp1[p]*vd*temp3[p];
    ytrans[p] = cb - cb*temp2[p]*vda*temp3[p];
}
[/cpp]

Eventually I have to get rid of the temporary arrays (temp1, temp2, temp3): memory requirements will not allow them for much longer.

Now, I see the closest contender for what I need is "ippmAffineTransform3DH_mva_32f", but it does:

X' = t_{11}*X + t_{12}*Y + t_{13}*Z + t_{14}

Y' = t_{21}*X + t_{22}*Y + t_{23}*Z + t_{24}

Z' = t_{31}*X + t_{32}*Y + t_{33}*Z + t_{34}

W = t_{41}*X + t_{42}*Y + t_{43}*Z + t_{44}

X' = X'/W

Y' = Y'/W

Z' = Z'/W

But I need

X' = t_{11}*X + t_{12}*Y + t_{13}*Z + t_{14}

Y' = t_{21}*X + t_{22}*Y + t_{23}*Z + t_{24}

W = t_{41}*X + t_{42}*Y + t_{43}*Z + t_{44},

and then the following vector transformation:

X' = X'/W

Y' = Y'/W

So I guess my question is this: are there any internal optimisations that will flag t_{31}, t_{32}, t_{33} and t_{34} as zero and not include them in the calculation?

It would be really, really nice to sort this last bit out so I can get balanced parallel (multi socket/core) and full vectorisation.

Currently getting 2 billion points rendering at 20 frames per second at 9600x4800 resolution (20x1920x1200). The interaction is just a fraction too slow for fluid manipulation. By the end of the year the resolution is expected to increase 6x more, so milking every bit of performance is essential.

Pointers from the experts?

1 Reply


kramulous

Beginner


05-05-2011
02:05 PM


Never mind ... found a way to force full vectorisation anyway. Works well.
