Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

boost::thread pool vs tbb parallel_for

kdin
Beginner

 

Hi guys. I'm testing Intel TBB and I would appreciate any comments.

"To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner."

So why does a boost::thread pool outperform Intel TBB here?

Intel TBB is a task-oriented model that knows the hardware it runs on, so it should know the best way to schedule the work. I don't understand why Intel TBB performs worse than the boost thread pool.

PS: I have a Core i7 running Windows 7 Pro, and the Intel TBB test creates 4 threads to execute the tasks.

Thank you very much for your time.


class Engine
{
public:
    Engine() : m_v( Engine::Empty ) {}
    Engine( const Engine& eng ) : m_v( eng.m_v ){}
    Engine( std::vector< std::string >& v ) : m_v( v ){}

    void operator()( tbb::blocked_range< size_t >& r ) const
    { // parallel_for
        std::vector< std::string >& v = m_v;

        for( size_t iIndex = r.begin(); iIndex != r.end(); ++iIndex ) 
            Verify( v[ iIndex ] );
    }

    void Verify( std::string& str ) const
    {

       ...

    }

    std::vector< std::string >& m_v;

    void Start()
    {
        boost::thread_group grp;

        for( int iIndex = 0; iIndex < 10; iIndex++ ) //creating 10 threads...
        {
            grp.create_thread( boost::bind( &Engine::WorkThread, this, iIndex ) );
        }

        grp.join_all();

    }

    void WorkThread( int iIdx ) // Each thread takes a slice of the vector: thread 0 handles m_v[0]..m_v[99], thread 1 handles m_v[100]..m_v[199], ...
    {
        int iStart = ( iIdx * 100 );
        int iEnd = iStart + 99 + 1;

        for( int iIndex = iStart; iIndex < iEnd; iIndex++ )
            Verify( m_v[ iIndex ] );

    }

...

    void ParallelApply( std::vector< std::string >& v ) // low performance(Intel TBB)
    {
        DWORD dwStart = GetTickCount();

        tbb::parallel_for( tbb::blocked_range< size_t >( 0, v.size() - 1 ), Engine( v ) ); 

        DWORD dwEnd = GetTickCount();

        std::cout << "(" << dwEnd - dwStart << ")" << "Elapsed" << std::endl; // ~1000 milliseconds
    }

    void ThreadLevelApply( std::vector< std::string >& v ) // high performance(boost::threading pool)
    {
        DWORD dwStart = GetTickCount();

        Engine eng( v );
        eng.Start();

        DWORD dwEnd = GetTickCount();

        std::cout << "(" << dwEnd - dwStart << ")" << "Elapsed" << std::endl; // ~500 milliseconds
    }

 

Vladimir_P_1234567890

Hello kdin,

I have a few questions

kdin wrote:
ps: I have a Core i7 Windows Pro and Intel TBB testing creates 4 threads to execute task.

1. Do you have a 2-core low-voltage Core i7 in a laptop? Does it run with the maximum-performance power scheme?
2. Do you compare 10 boost threads vs 4 TBB threads? Have you tried comparing 10 vs 10? (See the sketch after this list.)
3. Do you create one vector for the boost version and a vector per blocked range for TBB? That might be a bottleneck if the range is small.
4. Did you try tbb::concurrent_vector instead of std::vector?
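
For reference, an apples-to-apples boost run could size its pool from the hardware and split the vector evenly. This is only a sketch built around the Engine class posted above; StartMatched and ApplyRange are hypothetical helpers, not code from this thread:

[cpp]
    void StartMatched()
    {
        // One boost thread per hardware thread, matching TBB's default.
        unsigned nThreads = boost::thread::hardware_concurrency();
        if( nThreads == 0 ) nThreads = 4; // fallback if the query is unavailable

        size_t chunk = m_v.size() / nThreads;

        boost::thread_group grp;
        for( unsigned i = 0; i < nThreads; ++i )
        {
            size_t iStart = i * chunk;
            size_t iEnd = ( i + 1 == nThreads ) ? m_v.size() : iStart + chunk;
            grp.create_thread( boost::bind( &Engine::ApplyRange, this, iStart, iEnd ) );
        }
        grp.join_all();
    }

    void ApplyRange( size_t iStart, size_t iEnd ) // like WorkThread, but with explicit bounds
    {
        for( size_t iIndex = iStart; iIndex < iEnd; ++iIndex )
            Verify( m_v[ iIndex ] );
    }
[/cpp]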

thank you,
--Vladimir

 

RafSchietekat
Valued Contributor III

2. Or 4 and 4? TBB should benefit from only running one thread per hardware thread, not suffer.

3. I don't see a vector per blocked_range, only references, so that doesn't seem to be it, either.

4. tbb::concurrent_vector does not provide better random access.

Aiming for absolutely trivial here:

"for( size_t iIndex = r.begin(); iIndex != r.end(); ++iIndex ) -> "for( size_t iIndex = r.begin(), iIndex_end = r.end(); iIndex != iIndex_end; ++iIndex ) ".

It's probably just a red herring, but you never know...

What also might help is to run the whole test at least twice, to warm up TBB's thread pool. That only looks like cheating, but in a real program that's an essential part of what makes the difference with boost::thread_group instances that are created and destroyed all the time: tbb::parallel_for only has to dispatch tasks to threads that already exist.
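
For what it's worth, with the TBB versions contemporary with this thread, one way to create the worker pool up front (so the first timed run doesn't pay the thread-creation cost) is to hold a tbb::task_scheduler_init object in main. A minimal sketch, not part of the posted code:

[cpp]
#include "tbb/task_scheduler_init.h"

int _tmain(int argc, _TCHAR* argv[])
{
    // While this object is alive the TBB worker threads exist and are reused,
    // so every parallel_for call dispatches to an already-running pool.
    tbb::task_scheduler_init init; // default: one worker per hardware thread

    // ... the rest of the test loop as in the posted code ...
    return 0;
}
[/cpp]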

Although none of that really explains why TBB should be slower here, only why it isn't much faster.

My money is on the red herring. :-)

(Added) Oh, one more thing: perhaps try to set a grainsize, perhaps 100 or so, just to see what happens.

kdin
Beginner

Thanks to Vladimir Polin and Raf Schietekat for the answers.

See comments added below.

1. Do you have a 2-core low-voltage Core i7 in a laptop? Does it run with the maximum-performance power scheme?

A. Yes, the power settings in Windows 7 are set to maximum performance. The processor is an Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz, a mobile part with 2 cores and 4 hardware threads.

2. Do you compare 10 boost threads vs 4 TBB threads? Have you tried comparing 10 vs 10?

A. Thread switching with 10 boost threads is very expensive on Windows. Intel TBB says "To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner." In an efficient manner... so the Intel TBB solution (4 threads) should be better than 10 boost threads.

3. Do you create one vector for the boost version and a vector per blocked range for TBB? That might be a bottleneck if the range is small.

A. In my test the std::vector has 1000 items. In the boost version each thread handles 100 items, with context switches (which are very expensive for the operating system). In the Intel TBB version each thread handles 250 items (1000 / 4) without switching.

4. Did you try tbb::concurrent_vector instead of std::vector?

A. My test is just to evaluate Intel TBB; there is no concurrent access to the std::vector.

int _tmain(int argc, _TCHAR* argv[])
{
     /* initialize random seed: */
    srand (time(NULL));

    Engine eng;

    std::vector< std::string > vNumbers;

    for( int iIndex = 0; iIndex < 1000; iIndex++ )
    {
        std::string str = eng.Generate();
        vNumbers.push_back( str );
    }

    char ch = 0;

    while( scanf( "%c", &ch ) )
    {
        if( ch == 10 )
            continue;
        
        if( ch == 0x31 )
            ParallelApply( vNumbers );
        else
            ThreadLevelApply( vNumbers );

    }
}

Raf, I ran the app repeatedly without stopping it; the Intel TBB thread IDs did not change between executions.

    void Verify( std::string& str ) const
    {
        std::stringstream out;
        std::string str_ = GenerateDig( str );

        out << "c:\\logs\\" << GetCurrentThreadId() << ".txt";

        std::ofstream ofs;
        ofs.open( out.str().c_str(), std::ofstream::out | std::ofstream::app );

        ofs << str << std::setw(10);
        if( str == str_ )
            ofs << "OK";
        else
            ofs << "NOK";

        ofs << std::endl;
        ofs.close();

    }

 

 

Vladimir_P_1234567890

kdin wrote:

2. Do you compare 10 boost threads vs 4 TBB threads? Have you tried comparing 10 vs 10?

A. Thread switching with 10 boost threads is very expensive on Windows. Intel TBB says "To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner." In an efficient manner... so the Intel TBB solution (4 threads) should be better than 10 boost threads.

OK then, could you try 4 boost threads :) to compare apples to apples, as Raf suggested?

kdin
Beginner

Hi Polin, thanks again for your reply.

So now we have 4 boost threads vs Intel TBB (4 threads), and the results: the 4 boost threads are still faster than Intel TBB. The tests were done in release mode.

In this scenario, why use Intel TBB? The full app code is below so you can check it.

Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(1139)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(1623)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(2044)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(2480)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(4415)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(2652)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(2496)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(2371)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(2886)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(2247)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(967)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(686)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(702)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(718)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(702)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(702)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(686)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(718)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(733)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(796)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(717)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...

****************************app code in visual studio 2008**********************************************

// tbb.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include "tbb/tbb.h"
#include "tbb/concurrent_queue.h"
#include "boost/array.hpp"
#include <array>
#include <iostream>
#include <fstream>
#include <iomanip>  
#include <boost/thread.hpp>
#include <conio.h>
#include <ctype.h>

class Engine
{
public:
    Engine() : m_v( Engine::Empty ) {}
    Engine( const Engine& eng ) : m_v( eng.m_v ){}
    Engine( std::vector< std::string >& v ) : m_v( v ){}

    void operator()( std::string& str ) const 
    { // parallel_do
        Verify( str );
    }

    void operator()( tbb::blocked_range< size_t >& r ) const
    { // parallel_for
        std::vector< std::string >& v = m_v;

        for( size_t iIndex = r.begin(); iIndex != r.end(); ++iIndex ) 
            Verify( v[ iIndex ] );
    }

    void Verify( std::string& str ) const
    {
        std::stringstream out;
        std::string str_ = GenerateDig( str );

        out << "c:\\logs\\" << GetCurrentThreadId() << ".txt";

        std::ofstream ofs;
        ofs.open( out.str().c_str(), std::ofstream::out | std::ofstream::app );

        ofs << str << std::setw(10);
        if( str == str_ )
            ofs << "OK";
        else
            ofs << "NOK";

        ofs << std::endl;
        ofs.close();

    }

    static std::vector< std::string > Empty;

    std::vector< std::string >& m_v;

    std::string Generate() const
    {
        std::stringstream sGen;

        sGen << std::setfill('1') << std::setw(9) << rand();

        std::string str = GenerateDig( sGen.str() );

        return str;
    }

    std::string GenerateDig( const std::string& str ) const // const ref, so the temporary passed from Generate() binds legally
    {
        std::stringstream sOut;
        int    iSum = 0, iTimes = 0, iRest = 0, iDig = 0;

        sOut << str.substr( 0, 9 );

        do
        {
            std::string strAux = sOut.str(); // take a copy; str() returns a temporary

            iSum = 0;

            for( int iCount = 0; iCount < ( 9 + iTimes ); iCount++ )
                iSum += ( ( strAux[ iCount ] - 0x30 ) * ( 10 + iTimes - iCount ) );

            iRest = iSum % 11;
            iDig = 11 - iRest;
            if( iDig > 9 ) 
                iDig = 0;

            sOut << iDig;

        }while( !iTimes++ );

        return sOut.str();

    }

    void Start()
    {
        boost::thread_group grp;

        for( int iIndex = 0; iIndex < 4; iIndex++ )
        {
            grp.create_thread( boost::bind( &Engine::WorkThread, this, iIndex ) );
        }

        grp.join_all();

    }

    void WorkThread( int iIdx )
    {
        int iStart = ( iIdx * 250 );
        int iEnd = iStart + 249 + 1;

        for( int iIndex = iStart; iIndex < iEnd; iIndex++ )
            Verify( m_v[ iIndex ] );

    }

};

/* static */ std::vector< std::string > Engine::Empty;

void ParallelApply( std::vector< std::string >& v )
{
    DWORD dwStart = GetTickCount();

    tbb::parallel_for( tbb::blocked_range< size_t >( 0, v.size() - 1 ), Engine( v ) );

    DWORD dwEnd = GetTickCount();

    std::cout << "(" << dwEnd - dwStart << ")ms " << "Elapsed" << std::endl;
}

void ThreadLevelApply( std::vector< std::string >& v )
{
    DWORD dwStart = GetTickCount();

    Engine eng( v );
    eng.Start();

    DWORD dwEnd = GetTickCount();

    std::cout << "(" << dwEnd - dwStart << ")ms " << "Elapsed" << std::endl;
}

int _tmain(int argc, _TCHAR* argv[])
{
     /* initialize random seed: */
    srand (time(NULL));

    Engine eng;

    std::vector< std::string > vNumbers;

    for( int iIndex = 0; iIndex < 1000; iIndex++ )
    {
        std::string str = eng.Generate();
        vNumbers.push_back( str );
    }

    char ch = 0;

    std::cout << "Press (1)Intel TBB or (2)Boost Thread...";

    while( ch = _getch() )
    {
        
        if( ch == 0x31 )
        {
            std::cout << "Choose (1)Intel TBB test...";
            ParallelApply( vNumbers );
        }
        else
        {
            std::cout << "Choose (2)Boost Thread test...";
            ThreadLevelApply( vNumbers );
        }

        std::cout << "Press (1)Intel TBB or (2)Boost Thread...";

    }

    return 0;
}

 

 

RafSchietekat
Valued Contributor III

It's a very strange and inefficient thing to do, opening and closing a file for each element, but I don't see anything obvious that would work against TBB here. It also means that hoisting end() from the for loop would not improve anything as it might for a simpler loop body.

Note that you should pass size() into that blocked_range, not size() - 1. Could you also pass a grainsize of 250 here to see what happens (this will yield chunks of 250 each, starting from 1000: 1000->500->250, and 250 is no longer divisible), although it probably won't do much because there's so much work per element.
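
Concretely, those two changes to ParallelApply would look something like this (a sketch of the suggestions above, not code from the original post):

[cpp]
void ParallelApply( std::vector< std::string >& v )
{
    DWORD dwStart = GetTickCount();

    // blocked_range is half-open, so pass v.size(), not v.size() - 1
    // (the original call silently skipped the last element). The third
    // argument is the suggested grainsize of 250, which limits how finely
    // the range may be split.
    tbb::parallel_for( tbb::blocked_range< size_t >( 0, v.size(), 250 ), Engine( v ) );

    DWORD dwEnd = GetTickCount();

    std::cout << "(" << dwEnd - dwStart << ")ms " << "Elapsed" << std::endl;
}
[/cpp]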

It's a mystery to me...

 

kdin
Beginner

Hi Raf.

Yes, thinking in tasks instead of threads is good, but not at the cost of performance. So why use Intel TBB? Just for the thread-safe containers like concurrent_vector? I don't think so.

Thanks for your time.

 

RafSchietekat
Valued Contributor III

kdin wrote:

So why use Intel TBB?

I'm not worried in general, but this is quite peculiar and interesting. You've apparently managed to find a use case or parameters of operation that happen to be problematic, or something else we've overlooked so far. I might even do some testing of my own based on the code you've provided to find out why, although not right away. Other than actually applying my suggestions above, I would certainly disable the heavy I/O to see what difference that makes: if anything like this is needed for production code, my first instinct would be to at least apply thread-local storage like tbb::enumerable_thread_specific to open the files lazily and only close them after parallel_for has ended, or rather send the data to a separate I/O thread or use a pipeline instead.
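
As an illustration of that thread-local-storage idea, here is a rough sketch that keeps one log file open per worker thread instead of opening and closing a file for every element. It reuses the file-naming scheme from the posted Verify(); the names g_logs, Log and CloseLogs are hypothetical, not from this thread:

[cpp]
#include "tbb/enumerable_thread_specific.h"

// One lazily opened log stream per thread (exemplar NULL = not opened yet).
typedef tbb::enumerable_thread_specific< std::ofstream* > LogFiles;
LogFiles g_logs( static_cast< std::ofstream* >( NULL ) );

void Log( const std::string& line )
{
    std::ofstream*& f = g_logs.local();      // this thread's slot
    if( f == NULL )
    {
        std::stringstream name;
        name << "c:\\logs\\" << GetCurrentThreadId() << ".txt";
        f = new std::ofstream( name.str().c_str(),
                               std::ofstream::out | std::ofstream::app );
    }
    *f << line << std::endl;                  // no per-element open/close
}

void CloseLogs() // call once, after parallel_for has finished
{
    for( LogFiles::iterator it = g_logs.begin(); it != g_logs.end(); ++it )
    {
        ( *it )->close();
        delete *it;
    }
    g_logs.clear();
}
[/cpp]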

Vladimir_P_1234567890

I've run the sample with both 4 and 10 threads and do not see any difference. I'm wondering why my results are about 100x faster for TBB and 50x faster for boost. I suppose it is because of the I/O operations.

Config:
Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz processor
dual channel memory + SSD

--Vladimir

[bash]
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(31)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(31)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(32)ms Elapsed

Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(31)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(31)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(16)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(16)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(15)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(15)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(16)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(16)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(46)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(47)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(47)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(31)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(31)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(31)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(31)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(31)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(16)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(15)ms Elapsed
[/bash]


Vladimir_P_1234567890

OK, I've found the reason for the 50x: I did not have a c:\logs folder, so the I/O streams were not actually writing anything. :-)

Now it writes to the files. Both algorithms use 4 threads, and as you can see, both are I/O bound.

[bash]

Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(125)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(141)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(109)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(109)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(93)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(110)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(94)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(94)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(94)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(93)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(78)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(93)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(78)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(109)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(125)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(93)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(94)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(109)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(94)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(110)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(156)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(140)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(109)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(93)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(78)ms Elapsed

[/bash]

--Vladimir

RafSchietekat
Valued Contributor III

That's more like it! Was that Windows as well, or Linux? Best to vary only one thing at a time, and I'm on OS X with an SSD, so...

Maybe we still need another comparison point in the original environment: sequential execution. Perhaps the boost::thread pool was better because it's worse? ;-)

Vladimir_P_1234567890

Windows 7 + Visual Studio 2013.

 

kdin
Beginner

Hi guys! Thank you for taking the time to answer my question.

I used the I/O operation just to simulate an expensive step, like a database query. Without it I can't compare Intel TBB and boost threads, because the elapsed times look very much alike.

Intel TBB vs 10 boost threads

Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(16)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(0)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(0)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(0)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(0)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(16)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(0)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(0)ms Elapsed

Intel TBB vs 4 boost threads

Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(0)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(0)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(0)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (1)Intel TBB test...(0)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(15)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(0)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(16)ms Elapsed
Press (1)Intel TBB or (2)Boost Thread...Choose (2)Boost Thread test...(0)ms Elapsed

 
