Two questions:
1. I ran Quantify on a very large program of mine and saw one section of memory allocation with "new" that caused heap contention. On an 8-core machine, only one CPU core was being utilized at 100% while the rest were idle (I saw that using the mpstat command).
I replaced that "new" and almost all other "new"s with scalable_allocator (and replaced the corresponding deletes), but the result was the same: only one core was being utilized at 100%. I thought the load would get distributed across multiple cores.
I am using Boost threads. I understand scalable_allocator is supposed to create local heaps for each thread.
The question is: am I wrong to assume that scalable_allocator would distribute CPU load? Is there a way to measure/know that scalable_allocator is improving performance (I'm assuming I'm using the wrong measuring tool)? Would VTune Analyzer help?
2. I'm a bit queasy about object instances instantiated and allocated with scalable_allocator at runtime. Would it be correct if I do something like:
b = scalable_allocator<Base>().allocate( 1 );
::new(b) K();
And the destruction is done like:
scalable_allocator<Base>().destroy(b);
scalable_allocator<Base>().deallocate(b, 1);
Instantiating with a child-class type and allocating with a parent-class type, and then destroying and deallocating with the parent class type? Is this the right way to do it?
(If you need to see the program, it's below. It seems to work fine, but I just needed a confirmation.)
#include <iostream>
#include <new>
#include "tbb/scalable_allocator.h"
using namespace std;
using namespace tbb;
class Base
{
public:
    Base() {cout<<"Base()"<<endl;}
    virtual ~Base() {cout<<"Base ~"<<endl;}
};
class K: public Base
{
public:
    K() {cout<<"K()"<<endl;}
    ~K() {cout<<"K ~"<<endl;}
};
class V: public Base
{
public:
    V() {cout<<"V()"<<endl;}
    ~V() {cout<<"V ~"<<endl;}
};
class A: public Base
{
public:
    Base* b;
    bool someCondition;
    A():someCondition(true)
    {
        if (!someCondition)
        {
            b = scalable_allocator<Base>().allocate( 1 );
            ::new(b) K();
        }
        else
        {
            b = scalable_allocator<Base>().allocate( 1 );
            ::new(b) V();
        }
    }
    ~A()
    {
        scalable_allocator<Base>().destroy(b);
        scalable_allocator<Base>().deallocate(b, 1);
    }
};
int main()
{
A* a = new A;
a->~A();
}
Outputs:
Base()
Base()
V()
V ~
Base ~
Base ~
9 Replies
1. If only one thread does the allocating, the scalable allocator is not going to redistribute that load to other threads.
2. Don't be queasy.
1. Let's say there's one piece of code which uses "new" that multiple threads can access. If I replace this "new" with scalable_allocator, shouldn't it scale? This is the kind of situation I have, and I'm assuming that the memory will now be allocated in the individual heap of every thread. Is that right?
2. Thanks :) I presume I was doing it right.
1. Multiple threads calling scalable_allocator will work well. One thread calling scalable_allocator will still be one thread doing the work.
2. Actually what I meant was that your code seems needlessly complicated, and I'm not sure what it's about. Instead, you could just redirect operators new and delete to the C interface (make sure to get all signatures), or replace malloc as shown in the Tutorial.
Something like the following should work:
[cpp]// copy&paste for retargeting C++ new/delete
#include <new>
#include "tbb/scalable_allocator.h"

void* operator new(std::size_t size) throw(std::bad_alloc) {
    if (void* ptr = scalable_malloc(size)) return ptr;
    else throw std::bad_alloc();
}
void* operator new(std::size_t size, const std::nothrow_t&) throw() {
    return scalable_malloc(size);
}
void operator delete(void* ptr) throw() { scalable_free(ptr); }
void operator delete(void* ptr, const std::nothrow_t&) throw() { scalable_free(ptr); }
void* operator new[](std::size_t size) throw(std::bad_alloc) {
    if (void* ptr = scalable_malloc(size)) return ptr;
    else throw std::bad_alloc();
}
void* operator new[](std::size_t size, const std::nothrow_t&) throw() {
    return scalable_malloc(size);
}
void operator delete[](void* ptr) throw() { scalable_free(ptr); }
void operator delete[](void* ptr, const std::nothrow_t&) throw() { scalable_free(ptr); }[/cpp]
I second Raf that overriding the new and delete operators seems a more natural way to go.
The code you showed has a problem actually:
[cpp]A():someCondition(true)
{
    if (!someCondition)
    {
        b = scalable_allocator<Base>().allocate( 1 );
        ::new(b) K();
    }
    else
    {
        b = scalable_allocator<Base>().allocate( 1 );
        ::new(b) V();
    }
}[/cpp]
Here you allocate memory for one object of Base type but then construct an object of a derived type in that place. It only works if the derived classes do not add any new data members, which is not the common case. More likely the constructor will corrupt data in neighboring memory, either another allocated object or service structures used by the allocator.
Perhaps it would be helpful if those redirections were mentioned in the Tutorial, or even provided as a macro?
We provide run-time replacement of the standard memory allocation routines (including global new and delete) through the tbbmalloc_proxy library and the corresponding header. And it is described in the Tutorial.
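For reference, a typical way to enable the proxy on Linux, a configuration fragment rather than code; the exact library name and version suffix vary by TBB release and platform, so treat these lines as illustrative:

```shell
# Run-time replacement without rebuilding: preload the proxy library
LD_PRELOAD=libtbbmalloc_proxy.so.2 ./my_app

# Or link the proxy in at build time
g++ my_app.cpp -o my_app -ltbbmalloc_proxy
```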
I think that typical C++ objects suffer less space overhead than bigger allocations, so I would prefer to selectively redirect new/delete but not malloc/free, or at least to be able to test this assumption.
I think advanced programmers who may want to do something like what you described don't need a tutorial for how to do it :)
Okay, so my feeling queasy is justified :) Thanks Alexey and Raf...looks like overloading is the way to go.