Intel® oneAPI Data Analytics Library

Memory leak in service_topo.cpp on Windows x64 with core count more than 32

Vasilyev__Alexander
830 Views
  1. On a Windows x64 system, service_topo.h defines the type LNX_PTR2INT as follows:
    #ifdef _x86_64
           #define LNX_PTR2INT __int64
           #define LNX_MY1CON 1LL
    #else
           #define LNX_PTR2INT unsigned int
           #define LNX_MY1CON 1
    #endif
    Under Windows x64 the macro _M_X64 should be tested instead (this allows more than 32 cores, but not more than 64). Machines with more than 64 cores need some other solution.
    Because MSVC does not define _x86_64, LNX_PTR2INT is defined as unsigned int, which is 32 bits on Windows.
    As a result, the function static int __internal_daal_countBits(DWORD_PTR x) returns at most 32.
    Then the function static void __internal_daal_setChkProcessAffinityConsistency( unsigned int lcl_OSProcessorCount )
    detects an inconsistency in the statement on line 328:
    if( sum != lcl_OSProcessorCount ) // check cumulative bit counts matches processor count
    Then the function static int __internal_daal_queryParseSubIDs(void) detects the error in the statement on line 1346:
    if( glbl_obj.error )
    return -1;
    The error is returned to the function static void __internal_daal_buildSystemTopologyTables(),
    which exits without completing initialization.
  2. In the function static void __internal_daal_buildSystemTopologyTables(),
    the tables are allocated on line 1683 and should be deallocated in case of an error
    before the returns on lines 1687, 1689, and 1694.
    The function __internal_daal_buildSystemTopologyTables is called from
    __internal_daal_buildSystemTopologyTables, which is called from
    __internal_daal_initCpuTopology, which is called from
    __internal_daal_GetSysProcessorCoreCount, which is called from
    GetL1CacheSize, which is called in every call to compute.
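To illustrate item 1, here is a minimal sketch of how a bit count is capped at 32 once an affinity-style mask passes through a 32-bit integer. The helper names (countBits, maskForCores, bitsSeenThrough32, bitsSeenThrough64) are hypothetical and are not the DAAL functions; the sketch only assumes that the mask has one bit per logical processor.

```cpp
#include <cassert>
#include <cstdint>

// Count the set bits in a mask of type MaskT.
// Hypothetical helper, not __internal_daal_countBits.
template <typename MaskT>
int countBits(MaskT x)
{
    int n = 0;
    while (x != 0)
    {
        x &= static_cast<MaskT>(x - 1); // clear the lowest set bit
        ++n;
    }
    return n;
}

// Build a mask with the low `cores` bits set (1 <= cores <= 64).
inline std::uint64_t maskForCores(unsigned cores)
{
    assert(cores >= 1 && cores <= 64);
    return cores == 64 ? ~std::uint64_t{0}
                       : ((std::uint64_t{1} << cores) - 1);
}

// Bit count observed when the mask is first narrowed to 32 bits,
// as happens when the pointer-sized integer type is unsigned int.
inline int bitsSeenThrough32(std::uint64_t mask)
{
    return countBits(static_cast<std::uint32_t>(mask));
}

// Bit count observed when the full 64-bit mask is kept.
inline int bitsSeenThrough64(std::uint64_t mask)
{
    return countBits(mask);
}
```

For a 40-core mask, bitsSeenThrough32 reports 32 while bitsSeenThrough64 reports 40, so a check that compares the cumulative bit count against the OS processor count fails only in the 32-bit case.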
5 Replies
Andrey_G_Intel2
Employee

Hello Alexander!
1. We are now considering disabling the ThreadPinning functionality on Windows. Do you use ThreadPinning?
We will fix this if we decide to keep ThreadPinning.
2. glktsn has a destructor (glktsn::FreeArrays() in the same source file). All allocated tables are freed in it.
Andrey

Vasilyev__Alexander

Hello, Andrey!

  1. No, I do not use ThreadPinning. I discovered this effect while wondering why the same program works on my laptop with 8 cores and 16 GB of RAM but runs out of memory on a 40-core workstation with 128 GB of RAM.
  2. I saw this destructor. The problem is that it is never called. After the inconsistency is detected in setChkProcessAffinityConsistency, the init flag is not set, so the next call to compute calls getL1CacheSize and getLLCacheSize again. Both call buildSystemTopologyTables, which allocates additional memory but still does not set the init flag in glktsn, so the call after that allocates more memory, and so on. The destructor should be called after each error check, before any return that happens after the allocation but without the init flag being set.
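The cleanup described in item 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the DAAL code: TopoState, allocTables, freeTables, and buildTables are hypothetical names, and the queryFails parameter merely simulates the consistency check failing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the global topology object; not the DAAL type.
struct TopoState
{
    int*        tables      = nullptr; // stand-in for the topology tables
    bool        initialized = false;   // set only after a successful build
    std::size_t liveAllocs  = 0;       // allocations not yet freed
};

inline void allocTables(TopoState& s, std::size_t count)
{
    assert(s.tables == nullptr); // allocating again here would orphan the old tables
    s.tables = new int[count]();
    ++s.liveAllocs;
}

inline void freeTables(TopoState& s)
{
    if (s.tables != nullptr)
    {
        delete[] s.tables;
        s.tables = nullptr;
        --s.liveAllocs;
    }
}

// Returns 0 on success and -1 on failure.
// `queryFails` simulates the topology query reporting an error.
inline int buildTables(TopoState& s, bool queryFails)
{
    if (s.initialized) return 0; // already built, nothing to do

    allocTables(s, 1024);
    if (queryFails)
    {
        freeTables(s); // release before the early return so a retry starts clean
        return -1;
    }
    s.initialized = true;
    return 0;
}
```

Without the freeTables call on the error path, every failed attempt would leave one more set of tables allocated, because the initialized flag never becomes true and each later call allocates again.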
Andrey_G_Intel2
Employee

Hello!
OK. Now I understand the use case and see where the problem is.
If you use the open source DAAL, as a quick workaround I suggest building the libraries with DAAL_CPU_TOPO_DISABLED defined.
Andrey

Vasilyev__Alexander

Hello, Andrey

Yes, there are two problems:

1. Using the macro _x86_64 instead of _M_X64 in service_topo.h (at least on Windows with the MSVC compiler)

2. Incorrect cleanup after an error.

I have corrected issue #1 and everything is working (at least for fewer than 64 cores).

Thanks,

Alexander

Andrey_G_Intel2
Employee

Hello Alexander!
We will fix item 1 in the upcoming DAAL version.
However, I prefer to use the _WIN64 macro because it is already used in DAAL to detect 64-bit Windows.
Andrey
PS:  I have read about _M_X64/_M_AMD64/_WIN64 already :-)
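A possible shape for such a check is sketched below. It is only an illustration: TOPO_PTR2INT, TOPO_MY1CON, and bitForCpu are hypothetical names (the originals are LNX_PTR2INT and LNX_MY1CON), the sketch uses an unsigned 64-bit type rather than __int64 so that shifting into the top bit is well defined, and the non-Windows branch of the condition is an assumption added so that the sketch compiles elsewhere.

```cpp
#include <cassert>

// _WIN64 is predefined by MSVC when targeting 64-bit Windows.
// __x86_64__ and __LP64__ are predefined by GCC and Clang on 64-bit targets;
// they are included here only so the sketch also builds outside Windows.
#if defined(_WIN64) || defined(__x86_64__) || defined(__LP64__)
    #define TOPO_PTR2INT unsigned long long
    #define TOPO_MY1CON  1ULL
#else
    #define TOPO_PTR2INT unsigned int
    #define TOPO_MY1CON  1U
#endif

// Mask with a single bit set for logical processor `cpu`.
// Valid only while `cpu` fits in the mask width.
inline TOPO_PTR2INT bitForCpu(unsigned cpu)
{
    assert(cpu < sizeof(TOPO_PTR2INT) * 8);
    return TOPO_MY1CON << cpu;
}
```

With the 64-bit branch selected, bitForCpu(39) yields a nonzero mask, which is what a 40-core machine needs. Machines with more than 64 logical processors still do not fit in a single mask of this type.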
