Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

OpenVINO 2021.4 libMKLDNNPlugin.so segfault on 6th gen Core i7

TendieMonster
Novice

After upgrading to OpenVINO 2021.4, an inference test that used to pass when running on the CPU now fails with a segfault (backtrace below). The test passes if I run it using the GPU plugin. Did Intel drop support for 6th gen Core i7s?

CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

 

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007f49c233207e in ?? ()
(gdb) bt
#0  0x00007f49c233207e in ?? ()
#1  0x00007f49d0295500 in ?? ()
#2  0x00007f49d02792e0 in ?? ()
#3  0x00007fff801d8c70 in ?? ()
#4  0x00007f49d0295500 in ?? ()
#5  0x0000000000000004 in ?? ()
#6  0x0000000000000001 in ?? ()
#7  0x00007f4a0e980be3 in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#8  0x00007f4a0e9814ab in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#9  0x00007f4a51bc8119 in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/external/tbb/lib/libtbb.so.2
#10 0x00007f4a51bc573c in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/external/tbb/lib/libtbb.so.2
#11 0x00007f4a0e9830c7 in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#12 0x00007f4a0eb4a02d in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#13 0x00007f4a0f01a19b in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#14 0x00007f4a0f01d9e2 in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#15 0x00007f4a0f42fa23 in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#16 0x00007f4a0f42fbd3 in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#17 0x00007f4a0f4571b4 in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#18 0x00007f4a0f435920 in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#19 0x00007f4a0f492882 in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#20 0x00007f4a51bc2ac2 in tbb::interface7::internal::task_arena_base::internal_execute(tbb::interface7::internal::delegate_base&) const ()
   from /opt/intel/openvino_2021/deployment_tools/inference_engine/external/tbb/lib/libtbb.so.2
#21 0x00007f4a573bdf60 in InferenceEngine::CPUStreamsExecutor::Execute(std::function<void ()>) ()
   from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libinference_engine.so
#22 0x00007f4a0f469dc1 in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#23 0x00007f4a0f492d27 in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#24 0x00007f4a0f492fca in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#25 0x00007f4a0f4933bb in ?? () from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so
#26 0x00007f4a57404452 in InferenceEngine::InferRequest::Infer() ()
   from /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libinference_engine.so

5 Replies
Peh_Intel
Moderator

Hi TendieMonster,

 

Thanks for reaching out to us.

 

For your information, Intel® Distribution of OpenVINO™ toolkit 2021.4 supports 6th to 12th generation Intel® Core™ processors.

 

You can run the Hello Query Device Python Sample to check whether your CPU is one of the available Inference Engine devices.
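 

If it is more convenient, the same check can be done from C++ with InferenceEngine::Core::GetAvailableDevices(). This is only a minimal sketch, assuming the 2021.x Inference Engine headers and library are already set up in your build:

        #include <inference_engine.hpp>
        #include <iostream>
        #include <string>

        int main() {
            InferenceEngine::Core core;
            // Each entry is a device the Inference Engine can use, e.g. "CPU" or "GPU".
            // "CPU" should be listed if the CPU (MKLDNN) plugin loads on this machine.
            for (const std::string& device : core.GetAvailableDevices()) {
                std::cout << device << std::endl;
            }
            return 0;
        }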

Regards,

Peh


TendieMonster
Novice

Thanks for responding! Since the CPU is supported, I would expect both CPU and GPU to return the same results; however, given that inference on the CPU crashes while inference on the GPU does not, I suspected that maybe the GPU was able to run OpenVINO models by chance (rather than by design).

 

I also have an Intel(R) Atom(TM) Processor E3950 @ 1.60GHz device where I'm able to run inference on CPU and GPU without crashing, but the inference results are different.

 

I'm at a loss as to why any release newer than 2021.1 breaks all the products I work on. Should I just wait for an OpenVINO release that doesn't break? I tried using newer versions of the Model Optimizer but still end up with crashes and incorrect inference results.

 

I'm stuck at 2021.1 for now, but I would really like to understand why there's breakage. If you could point me at some documentation or troubleshooting tips regarding breaking changes, I would really appreciate it.

Peh_Intel
Moderator

Hi TendieMonster,

 

You can check out the Release Notes for Intel® Distribution of OpenVINO™ toolkit:

New and Changed in the Release 2

New and Changed in the Release 3

New and Changed in 2021.4.1 LTS

New and Changed in 2021.4.2 LTS

 

Besides, you can also try out the latest OpenVINO™ version, 2022.1. Starting from the 2022.1 release, the OpenVINO™ Development Tools can only be installed via PyPI.

Regards,

Peh


Peh_Intel
Moderator

Hi TendieMonster,


This thread will no longer be monitored since we have provided answers. If you need any additional information from Intel, please submit a new question. 



Regards,

Peh


TendieMonster
Novice

Posting the solution in case others stumble upon this.

This may qualify as user error: our code misused the InferenceEngine::Core::ReadNetwork call by essentially supplying a dangling Blob::CPtr to the weights data.

 

The root cause becomes clearer when the test is run under gdb and one looks at the stack traces for all threads:

(gdb) thread apply all bt

One can see that the segfault happens while performing inference and accessing the weights data. It seems that in 2021.4 the Blob::CPtr (std::shared_ptr<const Blob>) returned by make_shared_blob wraps the raw pointer it is given rather than copying the data it points to.

 

We had something similar to the snippet below, which was fine in the 2021.1 release but now crashes:

        // Local vector: its backing storage is freed once it goes out of scope.
        std::vector<uint8_t> weights; // <-- RESULTS IN DANGLING POINTER FOR 2021.4
        weights = loadWeights();
        InferenceEngine::Core core;

        // The blob wraps weights.data() directly; it does not copy the data.
        auto weightsBlob = InferenceEngine::make_shared_blob<uint8_t>(
                     {InferenceEngine::Precision::U8,
                      {weights.size()},
                      InferenceEngine::Layout::C},
                     weights.data());

        InferenceEngine::CNNNetwork network =
            core.ReadNetwork(model, weightsBlob);

Making the weights vector static (shown below) is a quick fix; this change in behavior is documented for Core::ReadNetwork:

"Created InferenceEngine::CNNNetwork object shares the weights with weights object. So, do not create weights on temporary data which can be later freed, since the network constant data becomes to point to invalid memory."

        // Static storage duration: the weights stay alive for the lifetime of
        // the program, so the data shared with the network remains valid.
        static std::vector<uint8_t> weights; // <-- WEIGHTS DATA REMAINS VALID
        weights = loadWeights();
        InferenceEngine::Core core;

        auto weightsBlob = InferenceEngine::make_shared_blob<uint8_t>(
                     {InferenceEngine::Precision::U8,
                      {weights.size()},
                      InferenceEngine::Layout::C},
                     weights.data());

        InferenceEngine::CNNNetwork network =
            core.ReadNetwork(model, weightsBlob);

It also explains why the GPU plugin was fine: presumably the weights data has to be copied into GPU memory, so having the vector go out of scope and its data invalidated causes no problems.
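
If making the vector static is awkward (for example, when several models are loaded), another way to satisfy the lifetime requirement is to tie the weights to whatever object owns the network. The sketch below is only an illustration, with a hypothetical ModelRunner wrapper and loadWeights() kept as a placeholder, as in the snippets above:

        #include <inference_engine.hpp>
        #include <cstdint>
        #include <string>
        #include <vector>

        std::vector<uint8_t> loadWeights();  // placeholder, as above

        class ModelRunner {
        public:
            explicit ModelRunner(const std::string& model) {
                weights_ = loadWeights();
                auto weightsBlob = InferenceEngine::make_shared_blob<uint8_t>(
                             {InferenceEngine::Precision::U8,
                              {weights_.size()},
                              InferenceEngine::Layout::C},
                             weights_.data());
                network_ = core_.ReadNetwork(model, weightsBlob);
            }

        private:
            // Declared first so it is destroyed last: the weights outlive
            // the network that shares them.
            std::vector<uint8_t> weights_;
            InferenceEngine::Core core_;
            InferenceEngine::CNNNetwork network_;
        };

Any ExecutableNetwork or InferRequest created from the network would then live in the same object, so the weights stay valid for as long as they are used.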

Hope this saves somebody some time!
