Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

What makes TSX so slow?

Robin_L_
Beginner

Sorry for my poor English. I am also a C++ beginner, so my code may look a little strange.

I am currently learning about transactional memory and have tried both hardware and software transactional memory. I used Java and Clojure for STM, and TBB for HTM (my CPU supports the TSX instruction set).
I used the classic bank model to test the two TM models: I create bank account objects and transfer money from one account to another on many threads. When an account has insufficient funds, the transfer throws an exception, which is printed to the console. I created 40,000 transfer tasks, and the thread pool size is 8.

However, the HTM version is much slower than the STM version: the Java version takes about 2000 ms, whereas the C++ version takes 6800 ms.

I use parallel_for and speculative_spin_mutex from TBB to control concurrency and to make use of TSX.
Here is the code for my HTM version.

    #define random(x) (rand()%x)
    using namespace std;
    void transfer(Account*, Account*, int);
    const int AccountsSIZE = 100;
    Account* accounts[AccountsSIZE];
    
    int main() {
    
	    tbb::parallel_for(0, AccountsSIZE, 1, [=](int i) {
	    	accounts[i] = new Account(1000);
	    });
    
	    srand((int)time(0));
	    const int TransferTIMES = 40000;
    
	    tbb::parallel_for(0, TransferTIMES, 1, [=](int i) {

	    	try{
	    		transfer(accounts[random(AccountsSIZE)],     accounts[random(AccountsSIZE)], random(1000));
	    	}
	    	catch (const std::exception& e)
	    	{
	    		cerr << e.what() << endl;
	    	}
	    	
	    });
    
	    cout << accounts[0]->getBalance() << endl;
	    int total_balance = 0;
	    for (size_t i = 0; i < AccountsSIZE; i++)
	    {
		    total_balance += (accounts[i]->getBalance());
	    }
	    cout << total_balance << endl;
    	tbb::parallel_for(0, AccountsSIZE, 1, [=](int i) {
		    delete accounts[i];
	    });

    	return 0;
    }
    //Transfer money from one to another account
    void transfer(Account* from, Account* to, int amount) {
	    tbb::speculative_spin_mutex::scoped_lock lock(mutex); //Use speculative_spin_mutex to make a critical section that can be optimized by TSX
	    if ((from->getBalance())<amount)
	    {
		    throw std::invalid_argument("Illegal amount!");
	    }
	    else {
		
	    	from->setBalance((from->getBalance()) - amount);
	    	to->setBalance((to->getBalance()) + amount);
	    }
    }

Here is the profiler output for my code. However, I do not really understand it, and I cannot tell which part makes my code so slow, since the design is almost the same as in the Java and Clojure versions.

Alexei_K_Intel
Employee

Hi Robin,

If you call rand() in parallel, you may want to look at the following questions on SO:

I suppose that some implementations of rand() use internal locking, which may slow down the application.

Regards, Alex

Robin_L_
Beginner

Thank you, Alex. Following your suggestion, I used the Boost random library to generate thread-safe random numbers. But now there is another problem: the program seems to run fastest when I set task_scheduler_init to 2. That is strange, because my CPU is an i7 6700K. When task_scheduler_init is increased to 8, performance drops and then stays constant for a long time. Once task_scheduler_init exceeds 40000, performance drops again.

In this example, I created 1 million transfer tasks. With task_scheduler_init set to 2, the whole program takes 8 seconds; with task_scheduler_init set to 8, it takes 10 seconds.

    #include <tbb/spin_rw_mutex.h>
    #include <iostream>
    #include <stdexcept>
    #include "tbb/task_scheduler_init.h"
    #include "tbb/task.h"
    #include "boost/random.hpp"
    #include <ctime>
    #include <tbb/parallel_for.h>
    using namespace tbb;
    tbb::speculative_spin_rw_mutex mu;
    class Account {
    private:
        int balance;
    public:
        Account(int ba) {
            balance = ba;
        }
        int getBalance() {
            return balance;
        }
        void setBalance(int ba) {
            balance = ba;
        }
    };

    //Transfer function. Uses speculative_spin_rw_mutex to guard the critical section
    void transfer(Account* from, Account* to, int amount) {
        speculative_spin_rw_mutex::scoped_lock lock(mu);
        if ((from->getBalance())<amount)
        {
            throw std::invalid_argument("Illegal amount!");
        }
        else {
            from->setBalance((from->getBalance()) - amount);
            to->setBalance((to->getBalance()) + amount);
        }
    }
    const int AccountsSIZE = 100;



    Account* accounts[AccountsSIZE];
    //Random number generator and distributions (one generator shared by all threads)
    boost::random::mt19937 gener(time(0));
    boost::random::uniform_int_distribution<> distIndex(0, AccountsSIZE - 1);
    boost::random::uniform_int_distribution<> distAmount(1, 1000);
    /*
    Runs all the money transfers
    */
    void all_transfer_task() {
        task_scheduler_init init(8);//Set the number of threads the scheduler may use
        /*
        Initialize accounts
        */
        parallel_for(0, AccountsSIZE, 1, [=](int i) {
            accounts[i] = new Account(1000);
        });

        const int TransferTIMES = 10000000;
        //All transfer tasks
        parallel_for(0, TransferTIMES, 1, [&](int i) {

            try {
                transfer(accounts[distIndex(gener)], accounts[distIndex(gener)], distAmount(gener));
            }
            catch (const std::exception& e)
            {
                //cerr << e.what() << endl;
            }
            //std::cout << distIndex(gener) << std::endl;
        });

        std::cout << accounts[0]->getBalance() << std::endl;
        int total_balance = 0;
        for (size_t i = 0; i < AccountsSIZE; i++)
        {
            total_balance += (accounts[i]->getBalance());
        }
        std::cout << total_balance << std::endl;
        parallel_for(0, AccountsSIZE, 1, [=](int i) {
            delete accounts[i];
        });
    }

 
