Intel® ISA Extensions
Use hardware-based isolation and memory encryption to provide more code protection in your solutions.
1135 Discussions

Optimization of sine function's taylor expansion

Bernard
Valued Contributor I
37,408 Views

Hello!
While coding in assembly various series expansions of many functions i tried to optimize my code mainly by using rcpps instruction instead of divaps.When performing tests oncode which calculates sine function by taylor series i did a few measurements as explained here :"http://software.intel.com/en-us/forums/showthread.php?t=52482" and the result was between 10-12 cycles per first term of sine expansion(i used 14 terms).
I would like to ask you how can i rewrite this code in order to gain speed of execution improvment.
[bash]movups xmm0,argument movups xmm1,argument mulps xmm1,xmm1 mulps xmm1,xmm0 mov ebx,OFFSET coef movups xmm2,[ebx] rcpps xmm3,xmm2 ;rcpps used instead of divaps mulps xmm1,xmm3 subps xmm0,xmm1[/bash]

0 Kudos
1 Solution
bronxzv
New Contributor II
37,217 Views

calls 1e6 times fastsin() the result in millisecond is 63

so it's 63 ns per iteration or ~ 120 clocks on your CPU, it does't match your previous reports IIRC

if you keep only the polynomial (get rid of the strange domain check) you should begin to see timings nearer than mine

View solution in original post

0 Kudos
342 Replies
bestofshayari
Beginner
2,035 Views

I have the following trig functions, but I am wondering if there is a faster algorithm that I could implement:

  1. static const double SINMIN=0.0009999998333333;
  2. static const double COSMIN=0.9999995000000417;
  3. static const double TANMIN=0.0010000003333334;
  4. static const double E=2.718281828459045235360;
  5. static const double PI=3.14159265358979323846;
  6. static const double MINVAL=0.01;
  7. double sin(double ax)
  8. {
  9. double x=aabs(ax);
  10. if (x==MINVAL)
  11. {
  12. switch (quadrant(ax))
  13. {
  14. case 1:
  15. case 2:
  16. return SINMIN;
  17. case 3:
  18. case 4:
  19. return -SINMIN;
  20. }
  21. }
  22. else
  23. return (SINMIN*cos(x-MINVAL)+COSMIN*sin(x-MINVAL));
  24. }
  25. double cos(double x)
  26. {
  27. if (x==MINVAL)
  28. {
  29. switch (quadrant(ax))
  30. {
  31. case 1:
  32. case 4:
  33. return COSMIN;
  34. case 2:
  35. case 3:
  36. return -COSMIN;
  37. }
  38. }
  39. else
  40. return (COSMIN*cos(x-MINVAL)+SINMIN*sin(x-MINVAL));
  41. }
  42. double tan(double x)
  43. {
  44. if (x==MINVAL)
  45. {
  46. switch (quadrant(ax))
  47. {
  48. case 1:
  49. case 3:
  50. return TANMIN;
  51. case 2:
  52. case 4:
  53. return -TANMIN;
  54. }
  55. }
  56. else
  57. return ((TANMIN+tan(x-MINVAL))/(1.0000000000000-(TANMIN*tan(x-MINVAL))));
  58. }
JAVASCRIPT CODE NEW
0 Kudos
Bernard
Valued Contributor I
2,035 Views

I have the following trig functions, but I am wondering if there is a faster algorithm that I could implement


Hi!
What mathematical formulas are your functions based on?
Did you use library trigonometric functions?
0 Kudos
SergeyKostrov
Valued Contributor II
2,035 Views
Hi Iliya,

Quoting iliyapolak
...What mathematical formulas are your functions based on?


Trigonometric functions implemented in the test-case( Post #290 )are based on fundamental trigonometric identities:

sin(A+B) = sin(A)*cos(B) + cos(A)*sin(B)

cos(A-B) = sin(A)*sin(B) + cos(A)*cos(B)

tan((A+B)/2) = ( sin(A) + sin(B) )/ ( cos(A) + cos(B) )

By the way, that methodis used in Digital Signal Processing to generate tables of values forsin, cos and tan ( all based on Linear Interpolation )
forembedded systems with very constrained Read Only Memory (ROM )resources.

I havemy owntest-case and I'll post it later...

Best regards,
Sergey

0 Kudos
Bernard
Valued Contributor I
2,035 Views

Trigonometric functions implemented in the test-case( Post #290 )are based on fundamental trigonometric identities


Yes that's true,but in his post he is looking for the the fastest method for trigoformulas calculation.
0 Kudos
SergeyKostrov
Valued Contributor II
2,035 Views
I have the following trig functions, but I am wondering if there is a faster algorithm that I could implement...


Please take a look at results of my tests in a Post #279:

http://software.intel.com/en-us/forums/showpost.php?p=190992

There are two groups of test results and results inevey setareordered fromthe best to the worst.

Best regards,
Sergey

0 Kudos
Bernard
Valued Contributor I
2,035 Views
@Sergey Does your summary project is still relevant?
0 Kudos
SergeyKostrov
Valued Contributor II
2,035 Views
Hi Iliya, . >>...Does your summary project is still relevant? . I just completed a quick review and the most important piece of information that I wanted to use is removed and this is "A Post Number". I really don't understand why developers of new IDZ website removed it. How are we going to refer to some post? Any ideas? . Best regards, Sergey
0 Kudos
Bernard
Valued Contributor I
2,035 Views
Hi Sergey >>...I just completed a quick review and the most important piece of information that I wanted to use is removed and this is "A Post Number". I really don't understand why developers of new IDZ website removed it. How are we going to refer to some post? Any ideas? Yes I know that.Developers removed 64 posts from this thread.I have never anticipated such a loss of our accumulated knowledge. Can we write some petition to the webmaster?
0 Kudos
SergeyKostrov
Valued Contributor II
2,035 Views
>>...Yes I know that.Developers removed 64 posts from this thread.I have never anticipated such a loss of our accumulated knowledge... . I didn't count have many posts are lost but a damage to some source code examples is significant because some of them are re-formatted to almost unreadable state. . >>...Can we write some petition to the webmaster? . I think Yest but is it going to work? I don't think so. He/She won't try to do anything to recover these lost posts. On my account I lost almost 500 posts and access to my private folders with screenshots, doc and zip archives. They don't want to change a color of font from light grey to black in the new editor. I can't read clearly and need ro move my head as closer as possible to my display. Is it difficult to change the color? No, however it is not done in 4 weeks!
0 Kudos
Bernard
Valued Contributor I
2,032 Views
>>... didn't count have many posts are lost but a damage to some source code examples is significant because some of them are re-formatted to almost unreadable state. Cannot understand the reason behind the forum redesign.Why they needed to do this?Everything was working ok, it was very strange decision not so easily understandable by us - forum users. >>>...I think Yest but is it going to work? I don't think so. He/She won't try to do anything to recover these lost posts. On my account I lost almost 500 posts and access to my private folders with screenshots, doc and zip archives. They don't want to change a color of font from light grey to black in the new editor. I can't read clearly and need ro move my head as closer as possible to my display. Is it difficult to change the color? No, however it is not done in 4 weeks! I feel sorry for you Sergey, for loss of your posts , code and docs.Do you have any backup of your code examples which were posted and have been lost?
0 Kudos
SergeyKostrov
Valued Contributor II
2,032 Views
>>...Do you have any backup of your code examples which were posted and have been lost? . Yes, I have but I will need some time to identify, find and extract these test-cases. I have hundreds and hundreds of different tests in my own collection and it always a little challenge if something needs to be recovered. Unfortunately, i don't have time to re-post them.
0 Kudos
Bernard
Valued Contributor I
2,032 Views
It is good that you have a backup. @Sergey slightly off topic question to you.Now I'am working on multithreaded version of my SpecialFunctions library my methods will be mostly arrray - filling functions, but I think that simply using multithreaded java programming to fill an array is kinda boring.I would like to ask you what type of problems(presumably from math and/or physics) could be used to sharpen my multithreading programming skills :)
0 Kudos
SergeyKostrov
Valued Contributor II
2,032 Views
>>...what type of problems(presumably from math and/or physics) could be used to sharpen my multithreading programming skills... . At the beginning I would consider a very simple case like addition of two large matrices. Depending on a number of CPUs the addition process could be done in parallel. Another example is sorting large data sets and a Merge sorting algorithm is a perfect candidate for R&D.
0 Kudos
Bernard
Valued Contributor I
2,032 Views
Hi Sergey! >>>...At the beginning I would consider a very simple case like addition of two large matrices>>> Thank you for your answer.I'am already testing double-loop arrays filled with sum of various sine function(fastsin beign used) with the varying frequency.My CPU is core i3 installed in laptop and from the tests i can see that for "large pool" of threads more than 4 I can gain slightly improved performance.I think that main reason behind the mediocre performance under heavily floating-point load is the hyper-threading technology.
0 Kudos
SergeyKostrov
Valued Contributor II
2,032 Views
>>...Cannot understand the reason behind the forum redesign.Why they needed to do this? . Here is my understanding of the situation: . - 1. It was hard to maintain the Old System - 2. Intel wanted to make the upgrade before IDF 2012 event in San Francisco ( in order to announce about a new IDZ website ) - 3. As soon as some new versions of Intel software are released, like Intel C++ compiler v13 or IPP library, all posts must be placed in the New System . Best regards, Sergey
0 Kudos
Bernard
Valued Contributor I
2,032 Views
...>>>By the way, you're not the first developer who experienced a problem with HTT when a FPU is used. Could you provide some real performance numbers?....>>> Yes I expected such a behaviour from the HT processor.Simply provididng discreet logic as gp-registers with own APIC and sharing FPU beetwen physical and logical cores will not accelerate even easily "threadable" floating-point data. Soon I will post some results from my testing.But bear in mind that everything is written in java , and Java System.nano() timing method is binary translated to RDTSC instruction with all its consequences.
0 Kudos
SergeyKostrov
Valued Contributor II
2,032 Views
>>...I think that main reason behind the mediocre performance under heavily floating-point load is the hyper-threading technology. . By the way, you're not the first developer who experienced a problem with HTT when a FPU is used. Could you provide some real performance numbers? . Best regards, Sergey . PS: Also, you could consider some simple algorithms like Matrix Transpose, finding a Min or Max values in a data set, scalar additions, etc. So, just use your imagination and almost all iterative algorithms ( with a 'FOR' statement ) could be implemented in parallel.
0 Kudos
Thomas_W_Intel
Employee
2,032 Views
iliyapolak wrote:

He/She won't try to do anything to recover these lost posts.

If you or Sergey are missing specific post or attachments, please raise the topic (e.g. here in the forum or by a pm to me). I cannot promise anything but, personally, I was missing an article and a blog and both could be recovered.
0 Kudos
SergeyKostrov
Valued Contributor II
2,032 Views
>>...Soon I will post some results from my testing.But bear in mind that everything is written in java , and Java System.nano() timing >>method is binary translated to RDTSC instruction with all its consequences. . Thank you in advance, Iliya!
0 Kudos
SergeyKostrov
Valued Contributor II
2,032 Views
>>...If you or Sergey are missing specific post or attachments, please raise the topic (e.g. here in the forum or by a pm to me)... . Thomas, . Iliya and I noticed that lots ( hundreds! ) of posts are deleted. As I told in my case it is about 500. Also, I don't have any longer access to a 'Files' section of the Old System where I kept hundreds of jpg-files ( screenshots ), docs and zip archives. Unfortunately, my archive ( 10-month job on the ISN! ) was damaged significantly!
0 Kudos
Bernard
Valued Contributor I
2,032 Views
Hi Sergey! I have preliminary java multithreading tests results.I tested fastsin() method multiplied by java random.float() method to insert peudo-random noise into monotonic increasing data.The tested algorithm was simple bubble sort which sorted 1D array filled with 1000 and 1000000 elements. Access to the main sorting loop was controlled by synchronized statement , multiple threads were waiting for each other completion.Timingt method was java.nanoTime() which is binary translated to RDTSC instruction. Here are the results: 1a.Bubble sort 1000 elements three threads(main and 2 spawned threads) with synchronized{} statement : ~13.235 miliseconds. 1b.Bubble sort 1000 elements three threads(main and 2 spawned threads)without synchronized{} statement:~12.678 miliseconds. 2a.Bubble sort 1000 elements five threads(main and 4 spawned threads)without synchronized{} statement:~7.603 miliseconds. 2b.Bubble sort 1000 elements five threads(main and 4 spawned threads)with synchronized{} statement: ~10.551 miliseconds. 3a.Bubble sort 1000 elements nine threads(main and 8 spawned threads)with synchronized{}statement: ~10.233 miliseconds. 3b.Bubble sort 1000 elements nine threads(main and 8 spawned threads)without synchronized{}statement:~9.664 miliseconds. 4a.Bubble sort 1e4 elements five threads(main and 4 spawned threads)without synchronized{}statement: ~71.133 miliseconds. 4b.Bubble sort 1e4 elements five threads(main and 4 spawned threads)with synchronized{}statement:~173.134 miliseconds.
0 Kudos
Reply