About scalability...

aminer10 · ‎10-22-2013

Hello,

I come to an interresting subject, i want to talk about the scalability on multicores of the parallel hashtables and databases engines... i am wondering how can you say that a parallel
hashtable can scale on multicores if the random key acesses are cache unfriendly and the parallel hashtable is memory bound ? i think since the parallel hashtable is memory bound and the random accesses are cache unfriendly that means that the parallel hashtable will not scale on multicores, when you are testing your parallel hashtable, you have also to test the cache misses on the data, cause i think that the data is part of the hashtable application, i mean the parallel hashtable concept is highly interconnected with the way that the data is accessed randomly in a cache unfriendly manner and the hashtables concept is highly interconnected with the way the hashtable data is accessed in a memory bound manner, so this is why i say that parallel hashtables are not scalable on multicores, it's the same for database engines , since they are disk bound, so they are not scalable, and even if they are in-memory databases they will be memory bound and cache unfreindly, so databases are not scalable either on multicores.

We have to be smart please , i have come to an interresting subject...

If you look at those scalable FIFO queues...

http://www.google.ca/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&ved=0CFsQFjAE&url=http%3A%2F%2Fwww.cosy.sbg.ac.at%2Fresearch%2Ftr%2F2012-04_Kirsch_Lippautz_Payer.pdf&ei=FaBlUs-qIbTc4APonIHADQ&usg=AFQjCNHAhL-eCnvuCuNg8MSFhAfkRM73Ng&sig2=uy9PIKocjN9eXPAQQN8QlQ

http://www.google.ca/url?sa=t&rct=j&q=&esrc=s&source=web&cd=6&ved=0CGYQFjAF&url=http%3A%2F%2Fwww.cs.tau.ac.il%2F~shanir%2Fnir-pubs-web%2FPapers%2FSPAA2005.pdf&ei=FaBlUs-qIbTc4APonIHADQ&usg=AFQjCNGxX7ZAmgALfgRyL8WaDKjiYyw2uA&sig2=xAGjnkLOQf5bMhS4I2Wfbg

and this one:

https://www.google.ca/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CDgQFjAB&url=https%3A%2F%2Frepository.ist.ac.at%2F124%2F1%2Fmain-spaa2013.pdf&ei=g6BlUqj7MJfi4AOp1oDYCg&usg=AFQjCNH9eKu7ZeXpaoxUhvTk7aW-akWrVg&sig2=lPv-uvllOBClf0xZjBSExA

Researchers will say that those FIFO queues are scalable, but researchers are only testing pushes() and pops(), they are not testing the cache misses on the data inside the FIFO queue, i mean we can not look at the scalability ON MULTICORES from only the viewpoint of the FIFO queue algotihms and appllication, because the FIFO queue concept is higly interconnected with how the data inside the FIFO queue is served in a memory bound manner and in a cache-unfrienly manner, cause i think that the producers threads will write the datas on records or objects and push them on the FIFO queue, and the producers will have to have cache misses
on those datas , and the producers will access the data inside the FIFO queue in a memory bound manner, so this is why i say that those FIFO queues above are not scalable in pratice.

Hope you have understood what i mean.

I have tested my Parallel archiver(you will find it here: http://pages.videotron.com/aminer/ )
that is using parallel compression and it is scaling but not perfectly, so why database engines and hahstables are not scalable, since they are disk bound or/and memory bound and your Parallel archiver is scaling but not linearly and perfectly ? cause you have to understand that parallel compression can be expensive, it's the compression that is expensive and that can scale on multicores, but the memory data tranfers are not scalable, it's the same for game engines
they will not gain on scalabilty on memory transfers , but on rendering or compression or something like that that scales on multicores, mathematic algorithms that dont use big memory's data transfers can scale also on multicores, but database engines and hashtables are memory bound or/and disk bound etc. so they are not scalable on multicores.

We have to be smart please, can you say Parallel compression is only the compression excluding the memory data transfers in the parallel compression that is not scalable ?

Can you say a FIFO queue is only the push and pop excluding the memory data tranfers inside the queue that is not scalable ?

I think no, the memory's data transfer that is not scalable is part of the parallel compression and the FIFO queue.

This is why i have said that those FIFO queues that we call scalable
are in fact not scalable.

You will say that those scalable FIFO queues on multicores that i have presented to you are scalable in theory , this is why we call them scalable, but i don't agree, cause when you say scalable it means scalable also in pratice, but we are far from resolving the problem
of the non scalability on multicores of the data memory transfer, i don't mean data cache memory transfer, but data memory transfer that is not scalable, so as i said since we can not exclude the data memory transfer from the FIFO queue application or concept, as we can not exclude
the data memory tranfer from the parallel compression, hence we can not say that scalable FIFO queues on multicores are scalable, cause as i have explained to youl i think those scalable FIFO queues are not scalable, it's the same for parallel hashtables and database engines, we can not exclude the data memory tranfer that is not scalable from the hashtables applications or concept or from the database engines applications or concept, so since they are memory bound and/or disk bound and cache unfriendly that means they are not scalable on multicores.

Question:

Amine, you have said that those scalable FIFO queues and parallel hashtables and database engines are not scalable on multicores , is there any hope ?

Answer:

For databases engines you have to use distributed databases , and distribute your data over a network of computers, and we are lucky with databases, cause databases also look like RWLocks, cause in many databases scenarios, since writes transactions are much fewer than read transactions , so the replication part that is expensive will take less time, this is why distributed databases are the answer for scalability.

Question:

Amine, I think you have forgot something with distributed databases ?

Answer:

You are right, there is still a big problem with distributed databases and that's very sad, cause distributed databases suffers from the same problem as the problem of contention in a multicore scenario, cause if there is a lot of writes transactions at the same time(as in the case of contention on multicores) if we must ensure a realtime replication of the database, the database can suffer from high degradation in the throuhput, you will say that to resolve this problem we must use many many more computers , but this is too costly i think, so that's sad.

Thank you,
Amine Moulay Ramdane.