Two questions:
1. I ran Quantify on a very large program of mine and saw one section of memory allocation with "new" that caused heap contention. On an 8-core machine, only one CPU core was being utilized at 100% while the rest were idle (I saw that using the mpstat command).
I replaced that "new" and almost all other "new"s with scalable_allocator (and replaced the corresponding deletes), but the result was the same: only one core was being utilized at 100%. I thought the load would get distributed across multiple cores.
I am using Boost threads. I understand scalable_allocator is supposed to create local heaps for each thread.
The question is: am I wrong to assume that scalable_allocator would distribute CPU load? Is there a way to measure/know that scalable_allocator is improving performance (I'm assuming I'm using the wrong measuring tool)? Would VTune Analyzer help?
2. I'm a bit queasy about object instances instantiated and allocated with scalable_allocator at runtime. Would it be correct if I do something like:
b = scalable_allocator<Base>().allocate( 1 );
::new(b) K();
And the destruction is done like:
scalable_allocator<Base>().destroy(b);
scalable_allocator<Base>().deallocate(b, 1);
Instantiating with a child-class type and allocating with a parent-class type, and then destroying and deallocating with the parent class type? Is this the right way to do it?
(If you need to see the program, it's below. It seems to work fine, but I just needed a confirmation.)
#include <iostream>
#include <new>
#include "tbb/scalable_allocator.h"
using namespace std;
using namespace tbb;
class Base
{
public:
    Base() {cout<<"Base()"<<endl;}
    virtual ~Base() {cout<<"Base ~"<<endl;}
};
class K: public Base
{
public:
    K() {cout<<"K()"<<endl;}
    ~K() {cout<<"K ~"<<endl;}
};
class V: public Base
{
public:
    V() {cout<<"V()"<<endl;}
    ~V() {cout<<"V ~"<<endl;}
};
class A: public Base
{
public:
    Base* b;
    bool someCondition;
    A():someCondition(true)
    {
        if (!someCondition)
        {
            b = scalable_allocator<Base>().allocate( 1 );
            ::new(b) K();
        }
        else
        {
            b = scalable_allocator<Base>().allocate( 1 );
            ::new(b) V();
        }
    }
    ~A()
    {
        scalable_allocator<Base>().destroy(b);
        scalable_allocator<Base>().deallocate(b, 1);
    }
};
int main()
{
A* a = new A;
a->~A();
}
Outputs:
Base()
Base()
V()
V ~
Base ~
Base ~
9 Replies
1. If only one thread does the allocating, the scalable allocator is not going to redistribute that load to other threads.
2. Don't be queasy.
1. Let's say there's one piece of code which uses "new" that multiple threads can access. If I replace this "new" with scalable_allocator, shouldn't it scale? This is the kind of situation I have, and I'm assuming that the memory will now be allocated in the individual heap of every thread. Is that right?
2. Thanks :) I presume I was doing it right.
1. Multiple threads calling scalable_allocator will work well. One thread calling scalable_allocator will still be one thread doing the work.
2. Actually what I meant was that your code seems needlessly complicated, and I'm not sure what it's about. Instead, you could just redirect operators new and delete to the C interface (make sure to get all signatures), or replace malloc as shown in the Tutorial.
Something like the following should work:
[cpp]// copy&paste for retargeting C++ new/delete
#include <new>
#include "tbb/scalable_allocator.h"

void* operator new(std::size_t size) throw(std::bad_alloc) {
    if (void* ptr = scalable_malloc(size)) return ptr;
    else throw std::bad_alloc();
}
void* operator new(std::size_t size, const std::nothrow_t&) throw() {
    return scalable_malloc(size);
}
void operator delete(void* ptr) throw() { scalable_free(ptr); }
void operator delete(void* ptr, const std::nothrow_t&) throw() { scalable_free(ptr); }
void* operator new[](std::size_t size) throw(std::bad_alloc) {
    if (void* ptr = scalable_malloc(size)) return ptr;
    else throw std::bad_alloc();
}
void* operator new[](std::size_t size, const std::nothrow_t&) throw() {
    return scalable_malloc(size);
}
void operator delete[](void* ptr) throw() { scalable_free(ptr); }
void operator delete[](void* ptr, const std::nothrow_t&) throw() { scalable_free(ptr); }[/cpp]
I second Raf that overriding the new and delete operators seems a more natural way to go.
The code you showed has a problem actually:
[cpp]A():someCondition(true)
{
    if (!someCondition)
    {
        b = scalable_allocator<Base>().allocate( 1 );
        ::new(b) K();
    }
    else
    {
        b = scalable_allocator<Base>().allocate( 1 );
        ::new(b) V();
    }
}[/cpp]
Here you allocate memory for one object of Base type but then construct an object of a derived type in that place. It only works if the derived classes do not add any new data members, which is not the common case. More likely the constructor will corrupt data in neighboring memory, either another allocated object or service structures used by the allocator.
Perhaps it would be helpful if those redirections were mentioned in the Tutorial, or even provided as a macro?
We provide run-time replacement of the standard memory allocation routines (including global new and delete) through the tbbmalloc_proxy library and the corresponding header. And it is described in the Tutorial.
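For reference, a typical way to enable the proxy on Linux, a configuration fragment rather than code; the exact library name and version suffix vary by TBB release and platform, so treat these lines as illustrative:

```shell
# Run-time replacement without rebuilding: preload the proxy library
LD_PRELOAD=libtbbmalloc_proxy.so.2 ./my_app

# Or link the proxy in at build time
g++ my_app.cpp -o my_app -ltbbmalloc_proxy
```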
I think that typical C++ objects suffer less space overhead than bigger allocations, so I would prefer to selectively redirect new/delete but not malloc/free, or at least to be able to test this assumption.
I think advanced programmers who may want to do something like what you described don't need a tutorial for how to do it :)
Okay, so my feeling queasy is justified :) Thanks Alexey and Raf...looks like overloading is the way to go.