Alright, after a lot of effort I have rewritten YetiSim from scratch with new components and a lot of template programming. Here are the timings from a run under gprof (I have yet to get VTune to work without crashing). The SimExecutionObject::execute entry that you see taking 27% of the time is the beef, so the more processing time spent there, the better. BTW, this is on a dual quad-core Xeon. I'm working on getting the code onto a 128-processor Itanium2 monster, but I'm having compilation issues.
Question is... the measurement for tbb::internal::start_for.... what exactly is that? I mean, I know what it does in the code... but does this mean that the overhead of running parallelization takes 18% of execution time?
I will be trying "Thread Checker" soon... it keeps giving me a floating point exception when I run it... so I'll have to play with it later today.
Any suggestions on general analysis of parallel code for performance would be appreciated. This was compiled with Intel C++ compiler 10.1, with -O2 and -lirc -ltbb -ltbbmalloc flags.
AJ
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 27.61     18.28    18.28  3624777     0.01     0.01  SimExecutionObject
 18.44     30.49    12.21                             tbb::internal::start_for<:CONCURRENT_VECTOR>
 15.92     41.03    10.54 30523145     0.00     0.00  boost::iterator_facade<:TRANSFORM_ITERATOR><:INTERNAL_EDGE_TO_VERTEX_EDGE_PAIR><:INTERNAL_EDGE>
 15.57     51.34    10.31  3435987     0.00     0.00  Clock::tick()
  7.37     56.22     4.88  3325636     0.00     0.00  lessThanMinute(Clock&)
  4.44     59.16     2.94 20885637     0.00     0.00  boost::function1
  1.60     60.22     1.06                             tbb::task_scheduler_init::~task_scheduler_init()
  1.28     61.07     0.85  3355148     0.00     0.00  boost::function1
  1.27     61.91     0.84                             tcc::ptr_vector
  1.13     62.66     0.75  3485931     0.00     0.00  SimLink
  1.08     63.37     0.72  3577582     0.00     0.00  SimLink
  1.08     64.09     0.72  3451658     0.00     0.00  boost::detail::function::void_function_obj_invoker1<:_BI::BIND_T>
  0.89     64.68     0.59  3314952     0.00     0.00  boost::detail::function::function_invoker1
  0.86     65.25     0.57  7097027     0.00     0.00  SimNode
  0.20     65.38     0.13                             main
  0.17     65.49     0.11  1000000     0.00     0.00  tcc::execution_graph
  0.16     65.59     0.11                             __gnu_cxx::new_allocator<:_BI::BIND_T>
  0.15     65.69     0.10  1000000     0.00     0.00  _ZN14GraphExecutionI5ClockN3tcc15internal_vertexI7SimNodeI7SimLinkIS0_EENS1_18ptr_vector_defaultEEEEC9ERS0_RS8_
  0.14     65.78     0.09  1000000     0.00     0.00  GraphExecution
aj.guillon@gmail.com:
Question is... the measurement for tbb::internal::start_for.... what exactly is that? I mean, I know what it does in the code... but does this mean that the overhead of running parallelization takes 18% of execution time?
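It may be that the compiler inlined the operator() of your body object into the execute() method of the parallel_for task (start_for). In that case the 18% reported for tbb::internal::start_for includes the time spent in your loop body, so it is not necessarily all parallelization overhead.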
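To illustrate the inlining effect, here is a minimal sketch that does not use TBB at all. StartForLike is a hypothetical, simplified stand-in for a parallel_for task (it is not TBB's actual start_for), and SumSquares, run_sum_squares and the YETI_NOINLINE macro are likewise made-up names. The macro relies on the GCC-style noinline attribute, which I am assuming your compiler accepts. With the attribute in place, a profiler should be able to report the body separately from execute(); without it, the optimizer is free to fold the body into execute() and the time shows up under the task.

```cpp
#include <cstddef>
#include <vector>

// Assumption: GCC-compatible compilers accept this attribute; elsewhere the
// macro expands to nothing and the body may be inlined as usual.
#if defined(__GNUC__) || defined(__clang__)
#define YETI_NOINLINE __attribute__((noinline))
#else
#define YETI_NOINLINE
#endif

// Hypothetical stand-in for a parallel_for task, NOT TBB's real start_for.
// It only mirrors the shape: execute() forwards to the user's body.
template <typename Body>
struct StartForLike {
    std::size_t begin;
    std::size_t end;
    Body body;

    // If body.operator() is inlined here, a profiler such as gprof charges
    // the body's time to execute() rather than to the body itself.
    void execute() { body(begin, end); }
};

// Hypothetical loop body: sums the squares of in[b..e) into *out.
struct SumSquares {
    const std::vector<double>* in;
    double* out;

    // Marked noinline so it keeps its own entry in the profile.
    YETI_NOINLINE void operator()(std::size_t b, std::size_t e) const {
        double sum = 0.0;
        for (std::size_t i = b; i < e; ++i) {
            sum += (*in)[i] * (*in)[i];
        }
        *out = sum;
    }
};

// Runs the body over the whole vector through the task-like wrapper.
double run_sum_squares(const std::vector<double>& v) {
    double result = 0.0;
    StartForLike<SumSquares> task{0, v.size(), SumSquares{&v, &result}};
    task.execute();
    return result;
}
```

Comparing a profile built with the attribute against one built without it should show whether the start_for line is mostly your own work or mostly scheduling cost.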