Hi,
This is the first time I'm using VTune, to tune a fairly complex piece of C code.
All it basically does is "calculate x, add it to an unsigned char, and clip the result to 255" for a large number of pixels.
Because of the complexity of the code, it's hardly possible to optimize it :-/
VTune reports that a lot of time is spent modifying the data itself:
READ/WRITE are basically pointer-access wrapper macros, and clip255 is a simple clipping function.
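The post does not include the source, so the following is only a minimal sketch of the kind of loop described above. READ, WRITE, clip255 and compute_x are hypothetical stand-ins (plain pointer access, a two-sided clamp in case x can be negative, and an arbitrary per-pixel value), not Clemens's actual code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the pointer-access wrapper macros. */
#define READ(p)     (*(p))
#define WRITE(p, v) (*(p) = (v))

/* Hypothetical clip255: clamp an int to the 0..255 range of an unsigned char. */
static inline int clip255(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* Hypothetical per-pixel value "x"; the real calculation is not shown in the post. */
static inline int compute_x(size_t i)
{
    return (int)(i % 64);
}

/* Add x to every pixel and clip the result to 255. */
void add_and_clip(unsigned char *pixels, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int sum = READ(&pixels[i]) + compute_x(i);
        WRITE(&pixels[i], (unsigned char)clip255(sum));
    }
}
```

Written this way, clip255 would typically compile to compare-and-branch or cmov instructions, although the exact output depends on the compiler and optimization level.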
Any ideas why so many cycles are spent here?
Furthermore, is this really the assembly generated from the C code, or does VTune mix things up?
I can only read assembly a little, but I'd expect clip255 to generate some kind of conditional operation, such as a cmov or a compare+jump, and I don't see anything like that in the code.
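One possible reason for not seeing a cmov or compare+jump is that a clamp can also be computed without branches, using flag or mask arithmetic, and a compiler may generate something along these lines. The sketch below uses the hypothetical names clip255_branchless and add_sat_u8 and assumes the value being clipped is non-negative and small (an unsigned char plus a non-negative x).

```c
#include <stdint.h>

/* Branch-free clamp to 255; assumes v is a non-negative sum such as pixel + x. */
static inline uint32_t clip255_branchless(uint32_t v)
{
    /* (v > 255u) is 0 or 1; negating it in unsigned arithmetic gives 0 or 0xFFFFFFFF. */
    uint32_t mask = 0u - (uint32_t)(v > 255u);
    /* If v <= 255 the mask is 0 and v passes through; otherwise all bits are set and the AND leaves 255. */
    return (v | mask) & 255u;
}

/* Add a non-negative amount to an unsigned char, saturating at 255. */
static inline unsigned char add_sat_u8(unsigned char pixel, uint32_t x)
{
    return (unsigned char)clip255_branchless((uint32_t)pixel + x);
}
```

Whether a given compiler emits this kind of pattern, a cmov, or a branch depends on the compiler and optimization level; comparing VTune's listing with the compiler's own assembly output (for example `gcc -S`) is one way to check whether the two match.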
Thank you in advance, Clemens
Hi Clemens,
First, I doubt this is caused by the compiler optimization options; your assembly code looks plausible.
Second, could it be caused by your function calls (READ and WRITE), which are implemented as macros? In that case, the clockticks attributed to "addl -60(%ebp), %edx" would be incorrect.
Can this problem be reproduced with other function-call statements?
I suggest you investigate the macro issue, or submit a new issue at https://premier.intel.com if you can provide us with a test case.
Thanks, Peter