We're migrating our embedded system from an Intel(R) Core(TM) i7-4700EQ CPU @ 2.40GHz to an Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz. During our tests we noticed that, post migration, calls into the Intel IPP library experience sporadic degradation in performance, taking 3-4 times as long as normal.
Specifically, the call to ippsFIRMR_32f, which normally takes around 30 microseconds, takes 90-100 microseconds during these degraded periods. The degradation continues for a while (around 20-25 milliseconds), after which the system returns to normal performance, i.e., around 25-30 microseconds per call.
We also saw similar lapses in performance in other IPP calls around the same time, but for now we've decided to focus our efforts on this one call to understand the issue better.
We've confirmed that this happens even when the system doesn't have any other CPU-intensive threads or processes running.
We've also confirmed that this never happens with the older CPU model (i7-4700EQ), which gives steady, uniform performance even with other threads running under normal load.
The configurations of the two systems are otherwise identical.
What are some potential reasons for this varied performance with the new CPU? What are some ways this can be avoided or mitigated?
I can add more info (sample source code that reproduces the problem, etc.) if needed.
Thanks for the detailed information.
Please share with us a small reproducer and steps you used to compile and run it on your environment.
Also, please let us know which version of IPP you are using. If you are using IPP 2018, please try updating to the latest IPP version available and share an updated reproducer with us. Refer to the link below to download the latest IPP version.
Thanks for getting back. Apologies for my delayed response; I was still looking into this issue yesterday. It turns out that the discrepancy between the two processors can be traced to a driver call. The driver is an internal one, developed in house, that reads data from an FPGA. Both systems run CentOS 7.7, and the driver function that causes the discrepancy is wait_event_interruptible.
The driver reads data from a DMA buffer that's populated by the FPGA. The driver handles the FPGA interrupts, updates the number of bytes available to read, and notifies the reader using wake_up_interruptible. On the read call, the driver waits using wait_event_interruptible and copies the available data from the DMA buffer when the wait function returns.
For reasons that are still not clear, making this call in the driver spills over and affects the performance of the new CPU only, even after the driver code has finished running, while the old CPU carries on fine.
We're still investigating. If there are specific paths/tests that you'd like us to try, please let us know.
I wanted to add some more information to this investigation. It turns out that the slowdowns in performance are not related to the IPP library itself, but can be replicated with any set of floating-point calculations. We replaced the IPP function calls with our own dummy filter that mimics the floating-point operations, and we still experience similar sporadic slowdowns.
We also isolated two cores (i.e., excluded them from irqbalance) and dedicated them to our threads, and the problem persists. As mentioned before, the one way we've found to stop this from happening is to not make the wait_event_interruptible call in the driver.
The system also experiences similar issues if, instead of reading the data (with the wait call), the data is set to zeroes and a delay of 18-20 ms is introduced with a sleep call.
Is it possible that making the call puts the CPU in some 'sleep' state and it sometimes takes a while to become 'active' again? The recovery also seems gradual: while a single IPP/dummy-filter call normally takes 30-35 microseconds (during good performance), once the performance hurdle is hit the same calls take 90-100 microseconds. This continues for the next 24-26 ms, during which we make another 200 or so such calls, and then performance slowly returns to the normal 30-35 microseconds.
Thanks for the detailed information.
Please send us your environment details (compiler version, IPP version, and the details of the system on which you are seeing these issues). Also, share with us a reproducer and the steps you followed to reproduce this behavior.
We are a bit confused: are you running this on the CPU or on the FPGA?
Please send us the above details so that we can get a good insight into your issue.
Please find the reproducer attached. We're building it with g++ (GCC) 4.8.5 20150623 in C++11 mode. The command line used to build is:
g++ --std=c++11 -O3 TestFIRFilter.cpp -lpthread -o fir
As I mentioned earlier, this is not related to IPP, but rather to the fact that the CPU cores sometimes enter a sleep/wait state. When woken up, the system sometimes takes longer to perform the same set of computations than the version that never sleeps.
The program's being run on an Intel CPU - Intel(R) Core(TM) i7-8850H.
I've attached two samples of the output, res-fir.txt and res-fir-sleep.txt, which provide a comparison of the outputs from the same program; the only difference between the two runs is that one sleeps and the other doesn't.
The system on which we see the problem is an embedded system running CentOS 7.7. Because these programs are the only ones running on it, there are no other processes of significance.
Please let me know if there's any more information required.