Topic in Intel® oneAPI Math Kernel Library

Performance gets worse over time for the same instructions

zer0nes — Mon, 27 May 2013 20:47:10 GMT

First, I'm not sure this is the right forum for this question. I don't know which one is, because the cause could be hardware, MKL, .NET, or some other hidden factor.

I have neural network code in C# that heavily uses MKL via P/Invoke. I set a fixed number of threads and disabled MKL's dynamic threading. The C# code runs mainly before and after training; during training (i.e., over the iterations), MKL does most of the computation. No memory is allocated and there is no I/O during training.

I have observed unpredictable performance across iterations (example below) and would like to understand why. In some other runs, the number of connections processed per second dropped to ~600M for a few iterations (very strange). The run below took 6 hours to finish training (i.e., each iteration took about 12 minutes on average). The performance consistently degrades towards the end. The per-iteration numbers are more consistent when I run a smaller job (e.g., one that finishes in 20 minutes).

The code is large and can't be shared. If you can't pinpoint the cause, any hint that helps me investigate further would also be appreciated.

[plain]

Iterations:1/30, 1504.65M connections processed per second
Iterations:2/30, 1505.16M connections processed per second
Iterations:3/30, 1505.16M connections processed per second
Iterations:4/30, 1504.96M connections processed per second
Iterations:5/30, 1503.38M connections processed per second
Iterations:6/30, 1504.68M connections processed per second
Iterations:7/30, 1502.40M connections processed per second
Iterations:8/30, 1506.11M connections processed per second
Iterations:9/30, 1503.20M connections processed per second
Iterations:10/30, 1504.95M connections processed per second
Iterations:11/30, 1502.34M connections processed per second
Iterations:12/30, 1498.91M connections processed per second
Iterations:13/30, 1490.70M connections processed per second
Iterations:14/30, 1477.59M connections processed per second
Iterations:15/30, 1459.92M connections processed per second
Iterations:16/30, 1433.61M connections processed per second
Iterations:17/30, 1402.28M connections processed per second
Iterations:18/30, 1356.30M connections processed per second
Iterations:19/30, 1342.68M connections processed per second
Iterations:20/30, 1306.84M connections processed per second
Iterations:21/30, 1263.10M connections processed per second
Iterations:22/30, 1236.72M connections processed per second
Iterations:23/30, 1209.60M connections processed per second
Iterations:24/30, 1183.91M connections processed per second
Iterations:25/30, 1157.60M connections processed per second
Iterations:26/30, 1140.60M connections processed per second
Iterations:27/30, 1112.54M connections processed per second
Iterations:28/30, 1086.06M connections processed per second
Iterations:29/30, 1071.61M connections processed per second
Iterations:30/30, 1055.94M connections processed per second

[/plain]


Chao_Y_Intel — Tue, 28 May 2013 00:57:56 GMT

Hello,

Which MKL functions does your application use? Also, is there enough memory for the computation during the iterations? Some MKL functions may allocate memory internally.

Also, it may help to run a performance profiling tool, for example Intel VTune Amplifier, on your application to find out which parts of the code take most of the time.

Thanks,
Chao


SergeyKostrov — Tue, 28 May 2013 04:33:00 GMT

>>...I have a neural network code in C# which heavily uses MKL via PInvoke. I set a fixed number of threads and disabled dynamic threading of MKL. The C# code is used mainly before and after training. However, during training (i.e. between iterations), MKL carries most of the computational body. No memory is allocated and there's no I/O during training...

Since your description is too generic, I suggest commenting out parts of the code and rerunning your tests after each change to narrow down the cause. Another simple check: watch Windows Task Manager for resource leaks and verify that memory usage is stable (not growing).


Dmitry_B_Intel — Thu, 13 Jun 2013 09:00:35 GMT

Hi zer0nes,

Things I'd try: (1) check for memory leaks (monitor Task Manager, as Sergey K proposed), (2) set thread affinity (e.g., set KMP_AFFINITY=compact), and (3) check whether the MKL memory allocator is the cause (set MKL_DISABLE_FAST_MM).

Thanks
Dima


SergeyKostrov — Fri, 14 Jun 2013 00:15:26 GMT

>>...check if MKL memory allocator is the cause ( set MKL_DISABLE_FAST_MM ).

Hi Dmitry,

Could you explain where MKL_DISABLE_FAST_MM comes from? Do you mean an environment variable, a macro, or a function? I see that a function MKL_Disable_Fast_MM is declared in mkl_service.h as:

[plain]
...
_Mkl_Api( int, MKL_Disable_Fast_MM, ( void ) )
#define mkl_disable_fast_mm MKL_Disable_Fast_MM
...
[/plain]

Thanks in advance.


zer0nes — Mon, 17 Jun 2013 00:20:06 GMT

Thanks to all :).