Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Crash at USB transfer in NCSDK2 mvNCProfile

idata
Employee

Hi,

 

I'm getting a crash after a successful graph compilation in mvNCProfile:

 

```
root@myncsdocker:/inout# mvNCProfile -in input_node_2 -on training_2/concat -s 12 -is 512 480 mynetwork.pb

mvNCProfile v02.00, Copyright @ Intel Corporation 2017

shape: [1, 480, 512, 3]
res.shape: (1, 29, 31, 5)
TensorFlow output shape: (29, 31, 5)
/usr/local/bin/ncsdk/Controllers/FileIO.py:65: UserWarning: You are using a large type. Consider reducing your data sizes for best performance
Blob generated
USB: Transferring Data…
*** Error in `python3': malloc(): memory corruption: 0x000000000b0ba4d0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f02a363c7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8213e)[0x7f02a364713e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f02a3649184]
python3(PyObject_Malloc+0x157)[0x5d4597]
python3(PyBytes_FromStringAndSize+0x3f)[0x5c632f]
python3[0x4e9d28]
python3(_PyObject_GenericGetAttrWithDict+0x11d)[0x5941dd]
python3(PyEval_EvalFrameEx+0x44d)[0x536bcd]
python3(PyEval_EvalFrameEx+0x4b14)[0x53b294]
python3(PyEval_EvalFrameEx+0x4b14)[0x53b294]
python3[0x53fc97]
python3(PyEval_EvalFrameEx+0x50bf)[0x53b83f]
python3[0x53fc97]
python3(PyEval_EvalCode+0x1f)[0x5409bf]
python3[0x60cb42]
python3(PyRun_FileExFlags+0x9a)[0x60efea]
python3(PyRun_SimpleFileExFlags+0x1bc)[0x60f7dc]
python3(Py_Main+0x456)[0x640256]
python3(main+0xe1)[0x4d0001]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f02a35e5830]
python3(_start+0x29)[0x5d6999]
```

 

Full log at https://pastebin.com/0PKpSMZ6

idata
Employee

Additional info: the Docker image was built from the GitHub repo (HEAD -> ncsdk2, tag: v2.08.01.02, origin/ncsdk2).

idata
Employee

Even more info…

 

It turns out that the error varies seemingly randomly. It switches between:

 

*** Error in `python3': malloc(): memory corruption: 0x000000000b0ba4d0 ***
*** Error in `python3': double free or corruption (!prev): 0x000000000287e410 ***
*** Error in `python3': free(): invalid next size (normal): 0x0000000008dc9880 ***

 

So, a general memory overwrite problem.

 

Any hints and tips for how I can solve this problem? Can I enable more verbose debug output?

 

// Karl-Anders

idata
Employee

I tracked it down to the line

 

myriad_output, userobj = fifoOut.read_elem()

 

in Controllers/MiscIO.py, if this helps shed any light on what is happening.
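
 

To take mvNCProfile out of the equation, I'm going to drive the same FIFO path from a small standalone script. Here's a rough sketch, assuming the network has also been compiled to a graph file with mvNCCompile; the file name mynetwork.graph and the random input are placeholders:

```python
import numpy as np
from mvnc import mvncapi

# Open the first NCS device found.
device = mvncapi.Device(mvncapi.enumerate_devices()[0])
device.open()

# Load a graph compiled with mvNCCompile (file name is a placeholder).
with open('mynetwork.graph', 'rb') as f:
    graph_buffer = f.read()

graph = mvncapi.Graph('mynetwork')
fifo_in, fifo_out = graph.allocate_with_fifos(device, graph_buffer)

# Same input geometry that mvNCProfile reports: [1, 480, 512, 3].
input_tensor = np.random.rand(1, 480, 512, 3).astype(np.float32)
graph.queue_inference_with_fifo_elem(fifo_in, fifo_out, input_tensor, None)

# This is the call that misbehaves inside Controllers/MiscIO.py.
output, user_obj = fifo_out.read_elem()
print('output elements:', output.size, 'bytes:', output.nbytes)  # expecting 29*31*5 elements

# Clean up.
fifo_in.destroy()
fifo_out.destroy()
graph.destroy()
device.close()
device.destroy()
```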

idata
Employee

Sorry to keep spamming, but I now have a "final" observation about this:

 

Tracing further into read_elem, the first thing read is the "elementsize"; its value (elementsize.value) is 1798, and that is the size the output tensor "string" gets allocated with.

 

Now, 1798 does _not_ match the expected size of the output tensor:

 

...
res.shape: (1, 29, 31, 5)
TensorFlow output shape: (29, 31, 5)
...

 

1798 happens to be 29 * 31 * sizeof(FP16), so the dimension of size 5 has been lost somewhere.
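
 

For reference, the arithmetic behind that claim (plain Python, nothing NCSDK-specific):

```python
import numpy as np

fp16 = np.dtype(np.float16).itemsize   # 2 bytes per FP16 value
expected = 29 * 31 * 5 * fp16          # 8990 bytes for the full (29, 31, 5) output
observed = 29 * 31 * fp16              # 1798 bytes, what read_elem() actually gets
print(expected, observed)              # 8990 1798
```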

 

@Tome_at_Intel, perhaps you can point me in the right direction?

 

Thanks!

 

// Karl-Anders
idata
Employee

Hi!

 

@Tome_at_Intel, sorry to reference you directly, but perhaps you can point me in the right direction?

 

mvNCProfile says:

 

...
shape: [1, 480, 512, 3]
res.shape: (1, 29, 31, 5)
TensorFlow output shape: (29, 31, 5)
...

 

…which is the expected output size. However, I added some debug prints in the code that talks to the hardware, and after the "USB: Transferring Data..." printout the line myriad_output, userobj = fifoOut.read_elem() returns the wrong number of bytes: 1798. That corresponds to 29 x 31 x sizeof(FP16), so the output has gone from 5 values per "pixel" down to 1.
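
 

To rule out the frozen model itself, I'm also checking the static shape of the output node directly in TensorFlow. A rough sketch (training_2/concat:0 is the output tensor named on the mvNCProfile command line above):

```python
import tensorflow as tf

# Load the frozen graph handed to mvNCProfile and inspect the output node's
# static shape, to confirm the depth-5 dimension really is in the model.
graph_def = tf.GraphDef()
with open('mynetwork.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')
    out = graph.get_tensor_by_name('training_2/concat:0')
    print('static output shape:', out.shape)  # expecting something like (?, 29, 31, 5)
```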

 

Is there any way I can reverse-engineer the graph file itself to figure out if it's _that_ one that misbehaves?

 

Cheers, and thanks in advance!

 

// Karl-Anders