Hi,
I'm getting a crash after a successful graph compilation in mvNCProfile:
```
root@myncsdocker:/inout# mvNCProfile -in input_node_2 -on training_2/concat -s 12 -is 512 480 mynetwork.pb
mvNCProfile v02.00, Copyright @ Intel Corporation 2017
shape: [1, 480, 512, 3]
res.shape: (1, 29, 31, 5)
TensorFlow output shape: (29, 31, 5)
/usr/local/bin/ncsdk/Controllers/FileIO.py:65: UserWarning: You are using a large type. Consider reducing your data sizes for best performance
Blob generated
USB: Transferring Data…
*** Error in `python3': malloc(): memory corruption: 0x000000000b0ba4d0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f02a363c7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8213e)[0x7f02a364713e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f02a3649184]
python3(PyObject_Malloc+0x157)[0x5d4597]
python3(PyBytes_FromStringAndSize+0x3f)[0x5c632f]
python3[0x4e9d28]
python3(_PyObject_GenericGetAttrWithDict+0x11d)[0x5941dd]
python3(PyEval_EvalFrameEx+0x44d)[0x536bcd]
python3(PyEval_EvalFrameEx+0x4b14)[0x53b294]
python3(PyEval_EvalFrameEx+0x4b14)[0x53b294]
python3[0x53fc97]
python3(PyEval_EvalFrameEx+0x50bf)[0x53b83f]
python3[0x53fc97]
python3(PyEval_EvalCode+0x1f)[0x5409bf]
python3[0x60cb42]
python3(PyRun_FileExFlags+0x9a)[0x60efea]
python3(PyRun_SimpleFileExFlags+0x1bc)[0x60f7dc]
python3(Py_Main+0x456)[0x640256]
python3(main+0xe1)[0x4d0001]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f02a35e5830]
python3(_start+0x29)[0x5d6999]
```
Full log at https://pastebin.com/0PKpSMZ6
Additional info: the Docker image was built from the GitHub repo (HEAD -> ncsdk2, tag: v2.08.01.02, origin/ncsdk2).
Even more info…
It turns out that the error varies seemingly randomly. It switches between:
*** Error in `python3': malloc(): memory corruption: 0x000000000b0ba4d0 ***
*** Error in `python3': double free or corruption (!prev): 0x000000000287e410 ***
*** Error in `python3': free(): invalid next size (normal): 0x0000000008dc9880 ***
So, a general memory overwrite problem.
Any hints and tips for how I can solve this problem? Can I enable more verbose debug output?
// Karl-Anders
I tracked it down to the line `myriad_output, userobj = fifoOut.read_elem()` in Controllers/MiscIO.py, if this helps shed any light on what is happening.
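For completeness, a minimal harness along the following lines (a sketch based on my reading of the NCSDK 2 mvncapi; the graph file name `mynetwork.graph` and the zero-filled input are placeholders) should show whether read_elem() also returns a truncated buffer when the profiler is taken out of the loop:

```python
# Sketch: load the compiled graph directly and check how many elements
# Fifo.read_elem() hands back. Assumes the NCSDK 2 mvncapi; the graph
# file name is a placeholder for a blob produced by mvNCCompile.
import numpy as np
from mvnc import mvncapi

device = mvncapi.Device(mvncapi.enumerate_devices()[0])
device.open()

with open('mynetwork.graph', 'rb') as f:
    graph_buffer = f.read()

graph = mvncapi.Graph('mynetwork')
fifo_in, fifo_out = graph.allocate_with_fifos(device, graph_buffer)

# Dummy input matching the reported input shape [1, 480, 512, 3].
input_tensor = np.zeros((1, 480, 512, 3), dtype=np.float32)
graph.queue_inference_with_fifo_elem(fifo_in, fifo_out, input_tensor, None)

output, user_obj = fifo_out.read_elem()
# Expected: 29 * 31 * 5 = 4495 elements for a (29, 31, 5) output.
print('elements returned:', output.size, 'dtype:', output.dtype)

fifo_in.destroy()
fifo_out.destroy()
graph.destroy()
device.close()
device.destroy()
```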
Sorry to keep spamming, but I now have a "final" observation about this:
Tracing further into read_elem, the first thing read is the "elementsize"; elementsize.value equals 1798, and that is the size the tensor "string" gets allocated with.
Now, 1798 does _not_ match the expected size of the output tensor:
...
res.shape: (1, 29, 31, 5)
TensorFlow output shape: (29, 31, 5)
...
1798 happens to be 29 x 31 x sizeof(FP16), so the dimension of size 5 has been lost somewhere.
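Spelling out the arithmetic (FP16 is 2 bytes per value):

```python
import numpy as np

fp16_bytes = np.dtype(np.float16).itemsize   # 2 bytes per FP16 value
expected = 29 * 31 * 5 * fp16_bytes          # 8990 bytes for a (29, 31, 5) output
observed = 29 * 31 * 1 * fp16_bytes          # 1798 bytes, what read_elem reports
print(expected, observed)                    # 8990 1798
```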
@Tome_at_Intel, perhaps you can point me in the right direction?
Thanks!
// Karl-Anders
Hi!
@Tome_at_Intel, sorry to reference you directly, but perhaps you can point me in the right direction?
mvNCProfile says:
...
shape: [1, 480, 512, 3]
res.shape: (1, 29, 31, 5)
TensorFlow output shape: (29, 31, 5)
...
…which is the expected output size. However, after the "USB: Transferring Data..." printout, I added some debug prints in the code talking to the hardware, and the line `myriad_output, userobj = fifoOut.read_elem()` returns the wrong amount of data: 1798 bytes. That corresponds to 29 x 31 x sizeof(FP16), so the output has gone from 5 values per "pixel" to 1 value.
Is there any way I can reverse engineer the graph file itself to figure out if it's _that_ one that misbehaves?
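In case it is useful to anyone looking at this: the NCSDK 2 API appears to expose the output tensor descriptors of a loaded graph, which should show what shape the blob itself thinks it produces. A sketch, untested; GraphOption.RO_OUTPUT_TENSOR_DESCRIPTORS and the generic attribute dump are assumptions on my part, and `mynetwork.graph` is again a placeholder:

```python
# Sketch: ask the loaded graph what it believes its output tensor looks
# like. If the channel count reported here is 1 instead of 5, the blob
# itself is at fault. Assumes the NCSDK 2 mvncapi and that
# GraphOption.RO_OUTPUT_TENSOR_DESCRIPTORS exists; attribute names on
# the descriptor objects may differ, so everything is dumped generically.
from mvnc import mvncapi

device = mvncapi.Device(mvncapi.enumerate_devices()[0])
device.open()

with open('mynetwork.graph', 'rb') as f:
    graph = mvncapi.Graph('mynetwork')
    fifo_in, fifo_out = graph.allocate_with_fifos(device, f.read())

for desc in graph.get_option(mvncapi.GraphOption.RO_OUTPUT_TENSOR_DESCRIPTORS):
    print({name: getattr(desc, name) for name in dir(desc)
           if not name.startswith('_')})

fifo_in.destroy()
fifo_out.destroy()
graph.destroy()
device.close()
device.destroy()
```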
Cheers, and thanks in advance!
// Karl-Anders