Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Valgrind reports an Invalid Read in __intel_new_memcpy

Sujith_G_1
Beginner

Valgrind, a well-known memory analysis tool, reports an invalid read inside OCIStmtPrepare, an Oracle C API (OCI) function. The same report shows up in several other OCI functions.

Please refer to the stack trace below.

As far as I can tell, the application creates a buffer of 317 bytes. When the buffer is passed to the Oracle library, the library copies it using the __intel_new_memcpy function. However, __intel_new_memcpy reads up to byte 320 (an 8-byte read starting at offset 312), although only 317 bytes were actually allocated.

Could you please confirm whether this behaviour is correct? What is going wrong here?

==22195== Invalid read of size 8
==22195== at 0x68CD2D9: __intel_new_memcpy (in /x02/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)
==22195== by 0x5D84158: kpurclientparse (in /x02/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)
==22195== by 0x5D878DE: kpureq (in /x02/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)
==22195== by 0x5D607FA: OCIStmtPrepare (in /x02/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)
==22195== by 0x4099E0: DBCursor::Parse(char const*) (OCICPP.C:1020)
==22195== by 0x40CE29: DBCon::NewCursor(char const*, int) (OCICPP.C:753)
==22195== by 0x4047A6: main (main.cpp:59)
==22195== Address 0xa2e7e68 is 312 bytes inside a block of size 317 alloc'd
==22195== at 0x4C26E1C: operator new[](unsigned long) (vg_replace_malloc.c:305)
==22195== by 0x4EBD00F: String::Set(char const*, unsigned int) (String.cpp:544)
==22195== by 0x4EBD169: String::Set(char const*) (String.cpp:512)
==22195== by 0x4EBD188: String::operator=(char const*) (String.cpp:590)
==22195== by 0x404784: main (main.cpp:55)
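
To make the arithmetic in the report concrete, here is a minimal sketch in C. The helper name bytes_past_end is made up for illustration and is not part of Valgrind, OCI, or the Intel runtime. It only computes how far a read of a given size, starting at a given offset, runs past the end of a block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes by which a read of `read_size`
 * bytes starting at `offset` runs past the end of a block of
 * `block_size` bytes. Returns 0 when the read stays inside the block. */
static size_t bytes_past_end(size_t block_size, size_t offset, size_t read_size)
{
    assert(offset <= block_size);      /* the read must start inside the block */
    size_t end = offset + read_size;   /* one past the last byte read */
    return end > block_size ? end - block_size : 0;
}
```

With the numbers from the trace above, bytes_past_end(317, 312, 8) gives 3: the 8-byte read covers the last 5 valid bytes plus 3 bytes beyond the 317-byte block.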

Judith_W_Intel
Employee

This looks similar to this report:

https://sft.its.cern.ch/jira/browse/CORALCOOL-1191

I can't tell why that report was closed.

 

Sujith_G_1
Beginner

According to the Jira link in https://software.intel.com/en-us/forums/intel-c-compiler/topic/698479#comment-1886701, this was reported as an Oracle SR and closed with no issue found.

I have also seen that Oracle SR. However, it was closed without a real explanation, saying only that there is no issue because Valgrind does not understand the advanced optimization techniques used in Oracle.

Can someone explain the actual implementation of __intel_new_memcpy? That would help us decide whether this needs to be resolved or can be ignored.

 

Sujith_G_1
Beginner

According to that Oracle SR,

********************************************************************************************************************************************************************

If you call malloc 317 bytes, it should actually allocate same number of bytes.
Since our memory management routines are very complex and are internal Valgrind tool is not able to determine what goes on.

You can safely ignore this warning, if you are not facing any error or memory leak.
If any error or leak, please upload the test code so that i can reproduce the issue and file a new bug.

********************************************************************************************************************************************************************

Actually, this is not a memory leak. The OCIStmtPrepare function tries to read memory beyond the end of the allocation. The 317-byte buffer is allocated by our application and passed to OCIStmtPrepare with that same length. The __intel_new_memcpy function reads 8 bytes starting at byte 312, so it reads up to byte 320. Could you please confirm whether this copying behaviour of __intel_new_memcpy is correct? Does it track the actual size allocated by the OS memory management system? Even if we ask malloc for 317 bytes, does it actually allocate 320, with __intel_new_memcpy then reading that full length (320)? If that is the actual implementation, we can simply ignore the Valgrind "Invalid Read". Please confirm.

********************************************************************************************************************************************************************

If there are no other symptoms these warnings can be safely ignored, and the valgrind documentation shows how this can be done automatically so that only messages pertaining to your own code will be displayed. 
Valgrind is not able to determine what goes on in optimised code, and our memory management routines are complex.

If you believe you have encountered a memory leak or other error, a valgrind report on its own is not sufficient to raise a bug. 
We will need a reproducing testcase that demonstrates the problem that valgrind is claiming will happen.

For a supposed memory leak this is fairly easy. Just identify the statement that valgrind says will leak memory, then call that statement repeatedly in an infinite loop, together with appropriate cleanup code. 
For example if you are testing a connect statement, you must include in the loop a disconnect, or if you are testing createEnvironment then you must also include terminateEnvironment. 
To demonstrate a memory leak this must show unbounded growth of memory usage until the process crashes due to a lack of memory.

Reference:
Note.1300407.1 Valgrind Throws Lots Of Errors For The OCCI Library

********************************************************************************************************************************************************************
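
For reference, the automatic filtering that the SR reply mentions is done with a Valgrind suppression file. The entry below is only a sketch: the entry name is made up, and the frame list is taken from the first stack trace in this thread, so it would need adjusting for other call paths. For the 16-byte variant of the report, the kind would be Memcheck:Addr16 instead.

```
{
   oci_intel_memcpy_tail_read
   Memcheck:Addr8
   fun:__intel_new_memcpy
   fun:kpurclientparse
   ...
   fun:OCIStmtPrepare
}
```

The file is passed to Valgrind with --suppressions=<file>. Running once with --gen-suppressions=all prints ready-made entries for each reported error, which is usually safer than writing them by hand.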

SergeyKostrov
Valued Contributor II
You need to investigate whether that memory block was allocated by _mm_malloc or a similar CRT function. These functions take a second argument for memory alignment.

>>... The actual allocated memory was 317 bytes.

If 317 bytes are requested from the heap and the pointer to that memory is aligned on an 8-byte boundary, then at least 320 bytes could actually be allocated to satisfy the boundary condition, because 320 mod 8 = 0. I suspect that the memory block was not allocated by _mm_malloc or a similar CRT function. I also see this line:

>>...==22195== at 0x4C26E1C: operator new[](unsigned long) (vg_replace_malloc.c:305)

and it is not clear to me whether that memory block was used by Intel's __intel_new_memcpy function.
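
The rounding described above can be sketched as follows. The helper name round_up is hypothetical, and whether a particular allocator really pads requests this way is an assumption. The Valgrind report itself describes the block as 317 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round `size` up to the next multiple of `align`.
 * `align` must be a non-zero power of two. */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

For example, round_up(317, 8) is 320, which matches the end of the 8-byte read at offset 312.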
SergeyKostrov
Valued Contributor II
>>...If you believe you have encountered a memory leak...

I don't think this is a memory leak. You could create a simple test that allocates and deallocates some memory block, for example, 1,000,000 times. While testing, use a resource monitor to check whether the total amount of memory allocated by the application slowly grows. But even if it grows, that does not mean (!) the leak is related to what you've detected, because it could be a leak in another internal function that is part of the OCI API.
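
A minimal sketch of such a test loop in C is shown below. It uses plain malloc and free as a stand-in. In a real test the loop body would call the OCI function under suspicion together with its matching cleanup call. The function name alloc_free_loop is made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical test driver: allocate, touch, and free a block
 * `iterations` times. Returns the number of iterations completed. */
static unsigned long alloc_free_loop(unsigned long iterations, size_t block_size)
{
    assert(block_size > 0);
    unsigned long done = 0;
    for (unsigned long i = 0; i < iterations; ++i) {
        char *p = malloc(block_size);
        if (p == NULL)
            break;                      /* out of memory: stop and report progress */
        memset(p, 0xAB, block_size);    /* touch the block so it is really used */
        free(p);                        /* matching cleanup for the allocation */
        ++done;
    }
    return done;
}
```

Calling alloc_free_loop(1000000UL, 317) while watching the process in a resource monitor should show flat memory usage. Steady growth would point to a leak somewhere in the loop body.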
SergeyKostrov
Valued Contributor II
>>...OCIStmtPrepare in Oracle C API Function...

Also, having worked with the Oracle Call Interface (OCI) API in the past, I can tell you that it is a very reliable API.
Sujith_G_1
Beginner

Thanks, Sergey, for the clarification.

Please note that I do not have any issue with the stability of the OCI API. I'm just trying to understand what is going on when my application, which uses OCI, runs under Valgrind.

My understanding is:

The memory was allocated by the default allocation function (operator new) of GNU C++. When the program runs under Valgrind, Valgrind intercepts the allocation with its own function and records the requested length, which it then checks whenever any function reads the block.

My worry is this: can buffers allocated by a non-Intel function be passed to __intel_new_memcpy, which assumes the memory was allocated by _mm_malloc?

 

SergeyKostrov
Valued Contributor II
>>I'm worried that can these buffers which are allocated by non intel function passed into the function __intel_new_memcpy which
>>considers the memory was allocated by _mm_malloc?

It is possible, but I do not know whether it really happens in your case. There should be an access violation if unaligned memory accesses are performed in a function, like __intel_new_memcpy, that expects aligned memory blocks. So, if there is no access violation, then the processing is correct. I would trust Oracle, but it is always a good idea to verify what is going on inside an API.
TimP
Honored Contributor III

This memcpy function surely would take advantage of aligned memory, but should not require it.  An aligned malloc would take additional bytes as required to get alignment, but a proper matching of malloc and free functions would free those extra unused bytes.

jimdempseyatthecove
Honored Contributor III

Rambling thoughts:

Can you look at the assembly code to see whether the compiler generated a non-masked load followed by a masked store? Note that you may see several loads into different ymm/zmm registers, followed by several stores (the last in the sequence being masked).

Note, an aligned load cannot cross a page boundary, and thus would be safe. The masked aligned store would protect the memory following the allocated node. The use of the aligned load without mask (and past end of data) may be an optimization.

Jim Dempsey
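
The page-boundary point can be checked with a few lines of C. This is a sketch that assumes 4 KiB pages (typical on x86-64 Linux, but an assumption here). The macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u  /* assumption: 4 KiB pages */

/* Hypothetical helper: returns 1 if a read of `len` bytes starting at
 * `addr` touches more than one page, 0 otherwise. */
static int crosses_page(uintptr_t addr, size_t len)
{
    assert(len > 0);
    uintptr_t first_page = addr / PAGE_SIZE_ASSUMED;
    uintptr_t last_page  = (addr + len - 1) / PAGE_SIZE_ASSUMED;
    return first_page != last_page;
}
```

The address in the first report, 0xa2e7e68, is 8-byte aligned, so crosses_page(0xa2e7e68, 8) is 0. More generally, a naturally aligned 8- or 16-byte read starts at a multiple of its own size, and since the page size is a multiple of that size, the read always ends inside the same page.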

SergeyKostrov
Valued Contributor II
>>...An aligned malloc would take additional bytes as required to get alignment...

This is how it looks in C code:

    ...
    {
        ...
        pvUnaligned = ( RTvoid * )malloc( uiSize + uiAlign );
        if( pvUnaligned == RTnull )
            return ( RTvoid * )RTnull;
        pvAligned = ( RTvoid * )( ( ( RTusize_t )pvUnaligned + uiAlign ) & ~( ( RTusize_t )( uiAlign ) - 1 ) );
        ( ( RTvoid ** )pvAligned )[-1] = pvUnaligned;
        return ( RTvoid * )pvAligned;
    }
    ...
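
For readers who want to try the idea, here is a self-contained variant in standard C. It is a sketch, not the code from the post above: the names aligned_malloc_sketch and aligned_free_sketch are hypothetical, and unlike the snippet it reserves explicit room for the saved pointer so that small alignments also work.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return `size` bytes aligned to `align`,
 * where `align` is a non-zero power of two. */
static void *aligned_malloc_sketch(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* Extra room: up to align-1 bytes of padding plus one pointer slot. */
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    uintptr_t start   = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~((uintptr_t)align - 1);
    ((void **)aligned)[-1] = raw;   /* remember what malloc returned */
    return (void *)aligned;
}

/* Matching free: recover the raw pointer stored just before the block. */
static void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

The extra bytes belong to the underlying malloc block, so freeing the saved raw pointer releases them together with the payload. Passing the aligned pointer directly to free would be an error.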
Sujith_G_1
Beginner

Thank you all for the clarifications.

I was able to re-create it even with the Oracle 12c libraries.

==26333== Invalid read of size 16

==26333==    at 0x75FA410: __intel_ssse3_rep_memcpy (in /x02/app/oracle/product/12.1.0.2/client_1/lib/libclntsh.so.12.1)
==26333==    by 0x75F3F25: _intel_fast_memcpy.P (in /x02/app/oracle/product/12.1.0.2/client_1/lib/libclntsh.so.12.1)
==26333==    by 0x69FD17C: kpurclientparse (in /x02/app/oracle/product/12.1.0.2/client_1/lib/libclntsh.so.12.1)
==26333==    by 0x69FE9DE: kpureq (in /x02/app/oracle/product/12.1.0.2/client_1/lib/libclntsh.so.12.1)
==26333==    by 0x69D59CE: OCIStmtPrepare (in /x02/app/oracle/product/12.1.0.2/client_1/lib/libclntsh.so.12.1)
==26333==    by 0x6273F63: soci::oracle_statement_backend::prepare(std::string const&, soci::details::statement_type) (statement.cpp:65)
==26333==    by 0x5D8877: prepare (statement.h:163)
==26333==    by 0x52B8ED: Loadup::Load() (Loadup.cpp:92)
==26333==    by 0xB9027B5: start_thread (in /lib64/libpthread-2.11.3.so)

==26333==  Address 0xfe655b0 is 224 bytes inside a block of size 237 alloc'd
==26333==    at 0x4C2936F: operator new(unsigned long) (vg_replace_malloc.c:324)
==26333==    by 0xAEA90C8: allocate (new_allocator.h:104)
==26333==    by 0xAEA90C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (basic_string.tcc:607)
==26333==    by 0x57DB574: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (in /x04/exreg/libs/libboost_filesystem.so.1.59.0)
==26333==    by 0xAEAAE75: _S_construct_aux<char const*> (basic_string.h:1743)
==26333==    by 0xAEAAE75: _S_construct<char const*> (basic_string.h:1764)
==26333==    by 0xAEAAE75: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (basic_string.tcc:215)
==26333==    by 0x52B8ED: Loadup::Load() (Loadup.cpp:90)
==26333==    by 0xB9027B5: start_thread (in /lib64/libpthread-2.11.3.so)

 

Hi Sergey,

Is this the code in ICC's __intel_new_memcpy, or is it from another compiler?

 

SergeyKostrov
Valued Contributor II
That new output is very different from the first one. It starts with:

==26333== Invalid read of size 16

but before it was:

==22195== Invalid read of size 8

I also see that __intel_ssse3_rep_memcpy and _intel_fast_memcpy are used now. What is your concern now?
jimdempseyatthecove
Honored Contributor III

Sergey,

Can you examine the assembly code to see if the corresponding write of the memcpy extends beyond the allocation?

Note, if the generated code assures that the last vector to be read is short .AND. will not span a page boundary, then it is (CPU-wise) benign to read beyond allocation .PROVIDED. the write in the memcpy does not also go beyond the allocated memory (length of source). IOW it is safe to perform a load pd followed (potentially later) by store sd.

Jim Dempsey
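
The "wide load, narrow store" pattern described above can be illustrated in portable C. This is only a sketch of the idea, not the Intel implementation. The function name is hypothetical, and to stay within defined behaviour it assumes the caller guarantees 8 readable bytes at the source, for example by padding the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the final `remaining` bytes (1..8) of a
 * buffer by loading a full 8-byte word and storing only the bytes that
 * belong to the source.
 * Precondition (assumption of this sketch): at least 8 bytes are
 * readable at `src`. */
static void copy_tail_wide_read(unsigned char *dst, const unsigned char *src,
                                size_t remaining)
{
    assert(remaining >= 1 && remaining <= 8);
    uint64_t word;
    memcpy(&word, src, sizeof word);  /* wide load: may include bytes past the logical end */
    memcpy(dst, &word, remaining);    /* narrow store: writes only `remaining` bytes */
}
```

In the 317-byte case from the first report, the last chunk starts at offset 312 with 5 bytes remaining. The load covers 8 bytes, but only 5 are written to the destination, so nothing beyond the logical length is stored.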

SergeyKostrov
Valued Contributor II
>>Sergey,
>>
>>Can you examine the assembly code to see if the corresponding write of the memcpy extends beyond the allocation?

You need to address that to the author of the thread.
Sujith_G_1
Beginner

Please note that these Intel functions (__intel_ssse3_rep_memcpy or _intel_fast_memcpy) are called by the Oracle client libraries; my code does not interact with ICC directly. That is why I reported this as an Oracle SR, but their response was to ignore it, saying that Oracle's memory management is complex and cannot be examined with Valgrind. I was under the impression that some Intel black-belt engineer might know the actual implementation of _intel_fast_memcpy and be able to answer. That is why I posted this in this forum.

OK. I can examine the assembly code of the Oracle C library, but it will be extremely hard. I'll try and get back to you.

jimdempseyatthecove
Honored Contributor III

>> However even I can examine the assembly code of Oracle C library but will be extremely harder.

This is relatively easy. Start the program under the debugger (step into), but do not run it. Instead (assuming you are running the same program that failed above), examine (disassemble) the instructions around the failing location (at 0x75FA410). It is best to start several instructions earlier (at a lower address) and follow through to later ones. You should see a sequence of instructions moving memory into a group of SSE (xmm) or AVX (ymm) registers, followed by a series of moves from those registers back to memory, followed by an add or sub on a register and a branch back.

Jim Dempsey
