Recently I tried to replace the stock zlib with IPP zlib 7.0.6 on 64-bit Linux to speed up a project that uses HBase 0.99.2. However, I observed that compression slowed down by about 30%. I measured the time the "deflate" function takes (inside the Hadoop native library), and it is indeed slower than with stock zlib; almost all of the slowdown happens inside the "deflate" calls.
I also wrote a couple of standalone test programs that invoke zlib. In those cases, IPP shows a good improvement over the stock library. The slowdown seems to happen only when it is used with HBase.
I don't know what could cause IPP zlib to be slower than the stock one. Does anyone have any ideas? Thanks.
That's an interesting observation. Could you tell us what CPU the HBase machine has?
Two other things would be helpful to know:
- Are you sure that you call ippInit() (or ippStaticInit() in IPP 7.x) in your HBase version of IPP zlib?
- What is the average size of the buffers that are deflated in HBase?
Thanks for your reply.
The CPU on which my project is running is a Xeon E5-2670 v2.
Yes, I modified the Hadoop native library a little so that it calls ippInit when it is loaded, and I verified that it is calling the e9 functions, which are for AVX.
I think in Hadoop/HBase the maximum buffer that can be sent to deflate in each round is 64K (I didn't change that). I logged the input data size for all deflate calls. Most of them are from 64K to 128K, and they were compressed in two rounds.
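For readers who want to reproduce the call pattern described above, here is a minimal sketch in Python using the standard zlib module. It is not the Hadoop native code: the function name deflate_in_rounds and the logging list are hypothetical, and the 64K limit is simply the figure quoted in this post. It feeds one input to a single deflate stream in rounds of at most 64K and records the size of each round.

```python
import zlib

# Assumed per-round limit, taken from the figure quoted in the post above.
CHUNK_SIZE = 64 * 1024


def deflate_in_rounds(data, chunk_size=CHUNK_SIZE):
    """Compress `data` with one deflate stream, feeding at most
    `chunk_size` bytes per round. Returns (compressed_bytes, round_sizes)."""
    compressor = zlib.compressobj()
    pieces = []
    round_sizes = []  # input size handed to the compressor in each round
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        round_sizes.append(len(chunk))
        pieces.append(compressor.compress(chunk))
    pieces.append(compressor.flush())  # emit whatever is still buffered
    return b"".join(pieces), round_sizes


if __name__ == "__main__":
    payload = bytes(range(256)) * 400  # 100K input, between 64K and 128K
    compressed, sizes = deflate_in_rounds(payload)
    print("rounds:", sizes, "compressed bytes:", len(compressed))
```

An input between 64K and 128K ends up in two rounds, which matches the sizes logged above.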
What version of Hadoop, and, more importantly, what version of open-source ZLIB are you talking about? I ask because both Intel and CloudFlare have recently invested in ZLIB as open source.
I think I have found out why. IPP zlib doesn't like the pattern of the data produced by the random data generator in the HBase benchmark, and it runs slower than stock zlib on that data. I tried some other datasets and found that IPP does give some improvement, to varying degrees. I used a different random data generator in my standalone tests, which is why they gave different results. I did expect that different datasets would affect the absolute performance of the two libraries, but I didn't expect the relative performance to be affected as well.
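In case it helps anyone comparing libraries, here is a rough sketch of how the dataset dependence can be checked. It uses Python's standard zlib module, which runs whatever zlib the interpreter was built against, so it does not compare IPP with stock zlib by itself. The two generators (random_bytes, low_entropy_bytes) and the measure helper are hypothetical stand-ins, not the HBase benchmark generator. The point is only that timings and ratios should be collected per dataset before drawing conclusions.

```python
import random
import time
import zlib


def random_bytes(n, seed=0):
    """High-entropy input: uniformly random bytes (hypothetical generator)."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


def low_entropy_bytes(n, seed=0):
    """Low-entropy input: random picks from a tiny vocabulary."""
    rng = random.Random(seed)
    words = [b"alpha", b"beta", b"gamma", b"delta"]
    out = bytearray()
    while len(out) < n:
        out += rng.choice(words) + b" "
    return bytes(out[:n])


def measure(data, repeats=5):
    """Return (best wall-clock seconds, compressed size / original size)."""
    best = float("inf")
    compressed = b""
    for _ in range(repeats):
        start = time.perf_counter()
        compressed = zlib.compress(data)
        best = min(best, time.perf_counter() - start)
    return best, len(compressed) / len(data)


if __name__ == "__main__":
    size = 128 * 1024
    datasets = {"random": random_bytes(size),
                "low-entropy": low_entropy_bytes(size)}
    for name, data in datasets.items():
        seconds, ratio = measure(data)
        print(f"{name}: {seconds * 1e3:.3f} ms, ratio {ratio:.3f}")
```

Running the same inputs against each library build would show whether the relative ranking changes with the dataset, which is what I saw here.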