Software Archive

Database connection pooling library using cilk plus

amandeepgautam
Beginner
Hi,
    I am querying a database with around 10,000 tables (the database is quite big), and as of now my code for querying it is serial. To speed up the process I intend to open multiple connections to the database and query it in parallel, i.e. embed some parallelism in the Cilk code for my project. I cannot think of a way to implement this in Cilk Plus.
    I thought of doing it in the following manner (but I don't know of anything that would help me):
    Maintain a main queue into which all other threads put requests, plus a shared counter that tracks how many connections are open at any given time. If the count has reached the maximum number of connections, the request has to wait; otherwise a connection is granted.
    Please suggest a way to implement this, because the data extraction part is proving to be a real bottleneck in the process.
Any help is appreciated!
Thanks.
Georg_Z_Intel
Employee
Hello,

Without a reference to work with, it's hard to recommend a specific solution. The idea of Intel® Cilk™ Plus is to add task-level parallelism with minimal changes to an existing serial implementation.

Basically, I can think of two major approaches to your problem:

  1. You know the number of possible connections up-front and can guarantee them (e.g. by reserving resources):
    Then a divide-and-conquer pattern could help, using a "cilk_for" loop over all queries (see the sketch after this list). For that you partition the total number of queries into chunks whose size is defined as

        #pragma cilk grainsize = min(n/c, n/(__cilkrts_get_nworkers()-1))

    where n is the number of queries and c the maximum number of possible connections. As a result, only as many tasks are executed in parallel as there are connections or workers available. Having more connections than workers limits the parallelism to the number of workers (you would need to scale your system then). On the other hand, with more workers than possible connections, you're limited by the interface to the database.
  2. In case you don't know the number of connections up-front, you can implement something like a worker pattern:
    In a sequential (main) loop over all queries, you obtain a connection and spawn off a task that does a single query. In case no connection is available, make sure to block until there is one. When spawning off the task, provide the acquired connection to it. When the task is done it can free the connection and thereby unblock the main loop if it is waiting for a free connection.
    The same scalability considerations from the divide and conquer approach apply here, too.
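
Just for illustration (untested), a sketch of the first approach could look like the following. Here open_connection, close_connection and run_query are placeholders for your own database layer, and it assumes you can reserve at least one connection per worker:

#include <cilk/cilk.h>
#include <cilk/cilk_api.h>   /* __cilkrts_get_nworkers(), __cilkrts_get_worker_number() */
#include <algorithm>
#include <vector>

/* Hypothetical placeholders for the database layer. */
struct Connection;
Connection *open_connection();
void close_connection(Connection *);
void run_query(Connection *, int query_id);

/* Assumes c (max. connections) >= number of Cilk workers, so one
   connection can be reserved per worker up-front. */
void run_all_queries(int n /* queries */, int c /* max. connections */)
{
    int workers = __cilkrts_get_nworkers();

    std::vector<Connection *> pool(workers);
    for (int w = 0; w < workers; ++w)
        pool[w] = open_connection();

    /* Chunk size as discussed above; never below 1. */
    int gs = std::max(1, std::min(n / c, n / (workers > 1 ? workers - 1 : 1)));

    #pragma cilk grainsize = gs
    cilk_for (int i = 0; i < n; ++i) {
        /* Each worker reuses "its" pre-opened connection. */
        run_query(pool[__cilkrts_get_worker_number()], i);
    }

    for (int w = 0; w < workers; ++w)
        close_connection(pool[w]);
}

With that grainsize the loop creates roughly one chunk per connection, so no more queries run concurrently than the database can accept.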
Regarding the overhead:
The first solution has the least overhead and minimizes work stealing (which is expensive). However, if you don't know whether and how many connections are possible, you need to use something like the second solution. Unfortunately that one has more overhead because of increased work stealing. On the other hand, it should not add too much in real life because the connections are the bigger bottleneck here.

In all cases, you might want to use a reducer, e.g. "reducer_list_append", shared among all tasks. Each task can append its result to it, and after synchronization the order of the results is the same as in a sequential execution.
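
As a rough sketch, assuming a hypothetical run_query_and_fetch() that returns the result of one query as a string, the reducer could be used like this:

#include <cilk/cilk.h>
#include <cilk/reducer_list.h>
#include <list>
#include <string>

std::string run_query_and_fetch(int query_id);   /* hypothetical helper */

std::list<std::string> collect_results(int n)
{
    /* Each task appends its result; get_value() yields the results in the
       same order a sequential execution would have produced. */
    cilk::reducer_list_append<std::string> results;

    cilk_for (int i = 0; i < n; ++i) {
        results.push_back(run_query_and_fetch(i));
    }

    return results.get_value();
}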

Those are just basic hints, of course. There are certainly more possible solutions, but the two approaches above should give you some ideas already.

Best regards,

Georg Zitzlsberger
Pablo_H_Intel
Employee
I think Georg has the right basic idea, but I strongly disagree with some of the particulars and tradeoffs.

You should absolutely NOT be setting the grain size explicitly so as to divide the queries evenly among the workers.  A work-stealing scheduler depends on "parallel slackness" to ensure good CPU utilization.  Parallel slackness is defined as the ratio of expressed parallelism to available parallel resources.  The grainsize that Georg recommends has a parallel slackness of approximately 1, which means that if any set of queries takes significantly longer than the others (very likely, I would think), then the other CPUs will be idled until the longer set completes.  Ideally, a parallel program will have a slackness of 10 or better, meaning that the work is divided into at least 10 times as many tasks as there are cores to execute them.
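
To put rough, purely illustrative numbers on that: with 10000 queries, 8 workers and 8 connections, a grainsize of 10000/8 = 1250 expresses only 8 tasks, a slackness of about 1, so one slow chunk can idle the other 7 CPUs; a grainsize of 1 expresses 10000 tasks, a slackness of 10000/8 = 1250.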

In this case, however, the ideal grain size is probably 1, because the cost of the query and its processing will likely dwarf the overhead of the parallel loop.  In fact, you can tolerate quite a lot of locking and/or atomic operations.  To make sure that you don't oversubscribe the connection list, I suggest the following approach:

while (there are queries to process)
{
    get a connection (block if none available)
    get the next query
    spawn the query on that connection
}

Spawning within a loop (as above) is usually not recommended because it increases the amount of expensive stealing, as Georg said.  In this case, however, the cost of a steal is still cheap compared to the work being done per spawn, making this a reasonable approach.  The advantage of this loop is that it automatically balances the CPU and connection resources, using both as efficiently as possible.  If you create at least as many connections as there are CPUs, then the loop should never block (unless other concurrent tasks are using the same pool of connections).
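
For illustration only, here is a rough sketch of how that loop might look in Cilk Plus.  The ConnectionPool class, open_connection, close_connection and run_query are placeholders, not part of Cilk Plus or of any particular database library; the pool is just a small blocking wrapper (C++11 threads) around however many connections you are able to open:

#include <cilk/cilk.h>
#include <condition_variable>
#include <mutex>
#include <vector>

/* Hypothetical database helpers. */
struct Connection;
Connection *open_connection();
void close_connection(Connection *);
void run_query(Connection *, int query_id);

/* Minimal blocking pool of pre-opened connections. */
class ConnectionPool {
    std::mutex m;
    std::condition_variable cv;
    std::vector<Connection *> free_conns;
public:
    explicit ConnectionPool(int max_conns) {
        for (int i = 0; i < max_conns; ++i)
            free_conns.push_back(open_connection());
    }
    ~ConnectionPool() {
        for (Connection *c : free_conns)
            close_connection(c);
    }
    Connection *get() {                        /* blocks until a connection is free */
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this]{ return !free_conns.empty(); });
        Connection *c = free_conns.back();
        free_conns.pop_back();
        return c;
    }
    void put(Connection *c) {
        { std::lock_guard<std::mutex> lk(m); free_conns.push_back(c); }
        cv.notify_one();
    }
};

static void query_task(ConnectionPool *pool, Connection *conn, int query_id)
{
    run_query(conn, query_id);
    pool->put(conn);                           /* frees the connection, unblocking the loop */
}

void run_all_queries(int n, int max_conns)
{
    ConnectionPool pool(max_conns);
    for (int i = 0; i < n; ++i) {
        Connection *conn = pool.get();         /* get a connection (block if none available) */
        cilk_spawn query_task(&pool, conn, i); /* grain size of 1: one task per query */
    }
    cilk_sync;                                 /* wait for the queries still in flight */
}

Because each task returns its connection itself, the main loop blocks only when all max_conns connections are busy, which gives exactly the throttling described above.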

-Pablo