I have a program which implements an iterative algorithm. The iteration requires some pre-calculated components (arrays), which are either calculated by the program on-the-fly before going into the iteration, or are loaded from a backup file generated in a previous run. Note that these components are bit-identical, regardless of whether they are loaded from backup or calculated on-the-fly.
The above implies that, once the calculation is finished, the RAM demand at the iteration stage of the program must be the same whether the components were calculated on-the-fly or loaded from backup.
However, I noticed a huge difference in memory requirements during iteration depending on whether the components were calculated on-the-fly or loaded, and that difference seems to depend on which OpenMP library is used.
I noticed the following memory usage during iteration:
From the above I inferred that my code should not be leaking any memory due to programming errors. Since libgomp works for "calculated", I presume that there are also no data race conditions. In addition, the final calculation results are exactly the same for all six possible configurations, which further suggests the absence of data races.
To investigate, I put the relevant section of the code, where the component calculation is done, in a loop to check whether there is a memory build-up (memory leak), which is not the case. So while the "libiomp5+calculated" approach needs more memory, that memory is NOT increasing if the calculation is repeated within a loop.
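For reference, the loop test described above can be sketched roughly as follows. This is only a minimal illustration, not my actual code: `compute_component` is a hypothetical stand-in for the real component calculation, and the helper names are made up for this post. It assumes Linux, where the resident set size can be read from the `VmRSS` line of `/proc/self/status`.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Parse the "VmRSS:" line from a /proc/<pid>/status style stream.
// Returns the value in kB, or -1 if the line is missing or malformed.
long parse_vmrss_kb(std::istream& in) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            if (fields >> kb) return kb;
            return -1;
        }
    }
    return -1;
}

// Resident set size of the current process in kB (Linux only, -1 elsewhere).
long current_rss_kb() {
    std::ifstream status("/proc/self/status");
    return parse_vmrss_kb(status);
}

// Hypothetical stand-in for the real component calculation:
// fills a temporary array inside an OpenMP parallel loop and returns its sum.
double compute_component(std::size_t n) {
    std::vector<double> component(n);
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(n); ++i) {
        component[i] = 0.5 * static_cast<double>(i);
    }
    double sum = 0.0;
    for (double v : component) sum += v;
    return sum;
}

// Repeat the calculation and record the RSS after every repetition.
// A leak would show up as steadily growing samples.
std::vector<long> rss_after_repeats(int repeats, std::size_t n) {
    std::vector<long> samples;
    for (int r = 0; r < repeats; ++r) {
        volatile double sink = compute_component(n);
        (void)sink;
        samples.push_back(current_rss_kb());
    }
    return samples;
}
```

Calling something like `rss_after_repeats(20, 50000000)` and printing the samples shows whether the footprint grows per repetition or stays at a plateau after the first pass. The pragma only takes effect when the code is built with OpenMP enabled (for example `-fopenmp`, or `-qopenmp` with the Intel compiler).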
My current conclusion is that the threads somehow are not returning allocated memory once the threaded code section is left?! Is that possible?
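One way this hypothesis might be checked, assuming the memory in question is managed by glibc's `malloc` (which I have not verified for libiomp5), is to ask the allocator to hand free heap memory back to the OS right after the threaded section and see whether the footprint drops. The sketch below uses glibc's `malloc_trim`; the function names `release_free_heap_memory` and `allocate_free_and_trim` are hypothetical helpers for illustration only.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>
#if defined(__GLIBC__)
#include <malloc.h>  // glibc-specific: declares malloc_trim
#endif

// Ask glibc to return free heap memory to the OS.
// Returns 1 if memory was released, 0 if not, and -1 when not built against glibc.
int release_free_heap_memory() {
#if defined(__GLIBC__)
    return malloc_trim(0);
#else
    return -1;
#endif
}

// Allocate and free a number of blocks (a stand-in for temporary buffers
// used inside a threaded section), then try to trim the heap.
int allocate_free_and_trim(std::size_t n_blocks, std::size_t block_bytes) {
    std::vector<void*> blocks;
    for (std::size_t i = 0; i < n_blocks; ++i) {
        void* p = std::malloc(block_bytes);
        if (p != nullptr) blocks.push_back(p);
    }
    for (void* p : blocks) std::free(p);
    return release_free_heap_memory();
}
```

If the resident memory shrinks noticeably after such a call, the extra memory was freed by the program but retained by the allocator rather than leaked. If nothing changes, the memory is presumably held elsewhere, for example by the OpenMP runtime itself.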
compiler: intel clang++ version 2022.2.1
OS: Linux, kernel version 6.1.6
Thanks for posting in Intel Communities.
Could you please provide us with the complete reproducer code and the steps to reproduce the issue on our end?
Thanks & Regards,