Hi,
I have a question about how to optimize some code I am working on. The heavy part of the code is a function that computes some quantities: no loops, just a lot of floating-point computation (over 2000 lines). This function will be called millions of times with different inputs, so it is critical to overall performance. The computation also contains quite a lot of if-else checks.
I ran my program in VTune, and the clock ticks per instruction I got was 4.5. According to VTune's manual, that seems to be a very bad number. Are there any tips on how to improve it?
One idea I have is to break long expressions into several short ones, for example,
a=b*c*e*f;
becomes,
a1=b*c;
a2=e*f;
a=a1*a2;
so that the CPU is more likely to execute more instructions per clock cycle. Is this true? I would really appreciate any advice.
Best,
Ben
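The two forms in the question above can be sketched as C functions. In C, b*c*e*f is evaluated left to right as ((b*c)*e)*f, so each multiply depends on the previous result, and a compiler is generally not free to regroup floating-point operations unless a relaxed floating-point option is enabled. The split form exposes two independent multiplies. Whether this changes the measured clock ticks per instruction depends on the compiler, its options, and the processor, so treat it as something to measure rather than a guaranteed gain. The function names here are hypothetical, and the two forms can differ in the last bit because the rounding order differs.

```c
/* Hypothetical helper names; a sketch of the two forms from the post. */

/* Evaluated left to right as ((b*c)*e)*f: each multiply waits on the
   result of the previous one (a chain of three dependent multiplies). */
double product_chained(double b, double c, double e, double f)
{
    return b * c * e * f;
}

/* b*c and e*f do not depend on each other, so the processor may overlap
   them; only the final multiply has to wait for both. */
double product_split(double b, double c, double e, double f)
{
    double a1 = b * c;
    double a2 = e * f;
    return a1 * a2;
}
```

Timing both versions inside the real function, with the same compiler options used for the production build, is the only reliable way to tell whether the split helps.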
3 Replies
Ben,
Be sure to look at the Intel Math Kernel Library, which provides highly optimized, thread-safe math routines for High-Performance Computing (HPC) science, engineering, and financial applications that require maximum performance on Intel processors.
If you go to our URL you can purchase it or kick the tires on an eval copy. It sounds like you might find a definite use for it, based on your posting!
cheers
jdg
Message Edited by jdgallag on 03-28-2006 04:47 PM
According to what you have said, you should be using VTune to look for mispredicted branches, as well as finding the sections of code responsible for high ticks per instruction. Of course, reading the docs should have told you that, and given you hints about how to narrow down your search for problems. By not allowing looping, you severely limit your options for improving performance.
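If the profile does point at mispredicted branches, one thing sometimes worth trying is to rewrite a simple if-else that only picks between two values as a conditional expression, which a compiler may turn into a select, min/max, or conditional-move instruction instead of a jump. The sketch below is an illustration only: the clamp functions are hypothetical and not taken from the original code, and the compiler is free to generate the same machine code for both versions.

```c
/* Hypothetical example: clamp x to the range [lo, hi]. */

/* Branching form: two data-dependent branches that can mispredict
   when the inputs are irregular. */
double clamp_branchy(double x, double lo, double hi)
{
    if (x < lo) {
        return lo;
    } else if (x > hi) {
        return hi;
    }
    return x;
}

/* Select form: each step only chooses between two already-known values,
   which gives the compiler the option of emitting branch-free code. */
double clamp_select(double x, double lo, double hi)
{
    double t = (x < lo) ? lo : x;
    return (t > hi) ? hi : t;
}
```

Checking the generated assembly, or re-running the VTune measurement, shows whether the branch was actually removed.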
Hi Jdg,
Thanks a lot for the reply. Yes, I did take a look at the Intel Math Kernel Library, but what I found is that those functions were designed exclusively for vector operations. In my case, all the computations are scalar. :-( Is my understanding correct?
Best,
Ben
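One possible way to connect the scalar routine with the earlier remarks about looping and vector operations: since the function is called millions of times with different inputs, the inputs could be collected into arrays and processed in a loop. The sketch below assumes the calls are independent of each other, which the thread does not confirm. Here compute_one is a hypothetical stand-in for the real 2000-line function, and compute_batch is likewise an invented name.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real scalar routine. */
static double compute_one(double x, double y)
{
    return x * y + 0.5 * x;
}

/* Batched form: the same arithmetic applied to n independent inputs.
   A loop like this gives the compiler (or an array-based library call)
   a chance to process several inputs at once. */
void compute_batch(const double *x, const double *y, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = compute_one(x[i], y[i]);
    }
}
```

Heavy if-else logic inside the real function may limit how much a compiler can vectorize such a loop, so this is a direction to experiment with rather than a guaranteed improvement.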