r_0_h_1_t
Beginner
175 Views

NVIDIA driver issue while training a Keras model on the compute server

After submitting the job to the compute server, I get the errors below. However, the code still executes. Here is the error log:

2020-04-18 03:46:41.565872: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/:/glob/development-tools/versions/oneapi/beta05/inteloneapi/vpl/latest/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/tbb/2021.1-beta05/env/../lib/intel64/gcc4.8:/glob/development-tools/versions/oneapi/beta05/inteloneapi/oneDNN/2021.1-beta05/cpu_dpcpp_gpu_dpcpp/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/mpi/2021.1-beta05//libfabric/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/mpi/2021.1-beta05//lib/release:/glob/development-tools/versions/oneapi/beta05/inteloneapi/mpi/2021.1-beta05//lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/mkl/2021.1-beta05/lib/intel64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/itac/2021.1-beta05/slib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/ipp/latest/lib/intel64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/ipp/latest/../../compiler/latest/linux/compiler/lib/intel64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/debugger/8.3-beta05/dep/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/debugger/8.3-beta05/libipt/intel64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/debugger/8.3-beta05/gdb/intel64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/daal/latest/lib/intel64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/latest/linux/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/latest/linux/lib/x64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/2021.1-beta05/linux/lib/oclfpga/board/intel_a10gx_pac/linux64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/2021.1-beta05/linux/lib/oclfpga/host/linux64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/2021.1-beta05/linux/lib/oclfpga/linux64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/latest/linux/compiler/lib/intel64_lin:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/latest/linux/compiler/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/ccl/2021.1-beta05/lib/cpu_gpu_dpcpp:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/compiler/lib/intel64_lin:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mpi/intel64/libfabric/lib:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mpi/intel64/lib/release:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mpi/intel64/lib:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/ipp/lib/intel64:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/compiler/lib/intel64_lin:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64_lin:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/tbb/lib/intel64/gcc4.7:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/tbb/lib/intel64/gcc4.7:/glob/development-tools/versions/intel-parallel-studio/debugger_2019/libipt/intel64/lib:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/daal/lib/intel64_lin:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/daal/../tbb/lib/intel64_lin/gcc4.4
2020-04-18 03:46:41.571669: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/:/glob/development-tools/versions/oneapi/beta05/inteloneapi/vpl/latest/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/tbb/2021.1-beta05/env/../lib/intel64/gcc4.8:/glob/development-tools/versions/oneapi/beta05/inteloneapi/oneDNN/2021.1-beta05/cpu_dpcpp_gpu_dpcpp/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/mpi/2021.1-beta05//libfabric/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/mpi/2021.1-beta05//lib/release:/glob/development-tools/versions/oneapi/beta05/inteloneapi/mpi/2021.1-beta05//lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/mkl/2021.1-beta05/lib/intel64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/itac/2021.1-beta05/slib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/ipp/latest/lib/intel64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/ipp/latest/../../compiler/latest/linux/compiler/lib/intel64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/debugger/8.3-beta05/dep/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/debugger/8.3-beta05/libipt/intel64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/debugger/8.3-beta05/gdb/intel64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/daal/latest/lib/intel64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/latest/linux/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/latest/linux/lib/x64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/2021.1-beta05/linux/lib/oclfpga/board/intel_a10gx_pac/linux64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/2021.1-beta05/linux/lib/oclfpga/host/linux64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/2021.1-beta05/linux/lib/oclfpga/linux64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/latest/linux/compiler/lib/intel64_lin:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/latest/linux/compiler/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/ccl/2021.1-beta05/lib/cpu_gpu_dpcpp:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/compiler/lib/intel64_lin:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mpi/intel64/libfabric/lib:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mpi/intel64/lib/release:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mpi/intel64/lib:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/ipp/lib/intel64:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/compiler/lib/intel64_lin:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64_lin:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/tbb/lib/intel64/gcc4.7:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/tbb/lib/intel64/gcc4.7:/glob/development-tools/versions/intel-parallel-studio/debugger_2019/libipt/intel64/lib:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/daal/lib/intel64_lin:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/daal/../tbb/lib/intel64_lin/gcc4.4
2020-04-18 03:46:41.571716: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2020-04-18 03:46:47.323803: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-04-18 03:46:47.356684: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3400000000 Hz
2020-04-18 03:46:47.359060: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55e7f77ea130 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-18 03:46:47.359120: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-04-18 03:46:47.369479: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/:/glob/development-tools/versions/oneapi/beta05/inteloneapi/vpl/latest/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/tbb/2021.1-beta05/env/../lib/intel64/gcc4.8:/glob/development-tools/versions/oneapi/beta05/inteloneapi/oneDNN/2021.1-beta05/cpu_dpcpp_gpu_dpcpp/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/mpi/2021.1-beta05//libfabric/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/mpi/2021.1-beta05//lib/release:/glob/development-tools/versions/oneapi/beta05/inteloneapi/mpi/2021.1-beta05//lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/mkl/2021.1-beta05/lib/intel64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/itac/2021.1-beta05/slib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/ipp/latest/lib/intel64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/ipp/latest/../../compiler/latest/linux/compiler/lib/intel64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/debugger/8.3-beta05/dep/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/debugger/8.3-beta05/libipt/intel64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/debugger/8.3-beta05/gdb/intel64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/daal/latest/lib/intel64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/latest/linux/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/latest/linux/lib/x64:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/2021.1-beta05/linux/lib/oclfpga/board/intel_a10gx_pac/linux64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/2021.1-beta05/linux/lib/oclfpga/host/linux64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/2021.1-beta05/linux/lib/oclfpga/linux64/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/latest/linux/compiler/lib/intel64_lin:/glob/development-tools/versions/oneapi/beta05/inteloneapi/compiler/latest/linux/compiler/lib:/glob/development-tools/versions/oneapi/beta05/inteloneapi/ccl/2021.1-beta05/lib/cpu_gpu_dpcpp:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/compiler/lib/intel64_lin:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mpi/intel64/libfabric/lib:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mpi/intel64/lib/release:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mpi/intel64/lib:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/ipp/lib/intel64:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/compiler/lib/intel64_lin:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64_lin:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/tbb/lib/intel64/gcc4.7:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/tbb/lib/intel64/gcc4.7:/glob/development-tools/versions/intel-parallel-studio/debugger_2019/libipt/intel64/lib:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/daal/lib/intel64_lin:/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/daal/../tbb/lib/intel64_lin/gcc4.4
2020-04-18 03:46:47.369519: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
2020-04-18 03:46:47.369564: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (s001-n048): /proc/driver/nvidia/version does not exist

From what I understand, the NVIDIA driver is missing. However, I don't know whether the compute server has an NVIDIA GPU at all, and I can't find any information about NVIDIA compute nodes. If NVIDIA GPUs are available, please tell me how to use them.
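
One way to check whether a node has a usable NVIDIA driver is to repeat the two probes that show up in the log: the kernel driver file /proc/driver/nvidia/version and the user-space library libcuda.so.1. The sketch below does that from Python and, if TensorFlow is installed, also asks which GPUs it can see (tf.config.list_physical_devices, assuming TensorFlow 2.1 or later). The helper names nvidia_driver_present and tensorflow_gpus are hypothetical, not part of any library.

```python
import ctypes
import os


def nvidia_driver_present() -> bool:
    """Return True if this node appears to have a working NVIDIA driver.

    Hypothetical helper. Repeats the two probes seen in the TensorFlow log:
    the kernel module exposes /proc/driver/nvidia/version, and user space
    needs libcuda.so.1.
    """
    if not os.path.exists("/proc/driver/nvidia/version"):
        return False  # kernel driver is not loaded on this host
    try:
        ctypes.CDLL("libcuda.so.1")  # same library TensorFlow tried to dlopen
    except OSError:
        return False  # driver library cannot be found or loaded
    return True


def tensorflow_gpus() -> list:
    """Hypothetical helper: GPUs TensorFlow can see, or [] if TF is absent."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return tf.config.list_physical_devices("GPU")  # TensorFlow 2.1+


if __name__ == "__main__":
    print("NVIDIA driver present:", nvidia_driver_present())
    print("GPUs visible to TensorFlow:", tensorflow_gpus())
```

Run it inside a job on the compute node itself, since that is where the training actually runs.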

ArunJ_Intel
Moderator

Hey Rohit,

DevCloud compute nodes do not have NVIDIA drivers. You could try changing your TensorFlow code to use the CPU instead.
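
TensorFlow already falls back to the CPU when cuInit fails, which is why the job still ran despite the errors. To make the script target the CPU explicitly, a minimal sketch (assuming TensorFlow 2.1 or later; force_cpu_only is a hypothetical helper name) hides all CUDA devices before TensorFlow is imported and then also hides GPUs through the tf.config API:

```python
import os


def force_cpu_only() -> None:
    """Hypothetical helper: make TensorFlow ignore any CUDA GPUs.

    Must run before TensorFlow is imported anywhere, so that device
    discovery never sees a GPU.
    """
    # "-1" is not a valid device index, so CUDA exposes no devices at all.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


force_cpu_only()

try:
    import tensorflow as tf  # imported only after the environment is set
except ImportError:  # lets the sketch run on machines without TensorFlow
    tf = None

if tf is not None:
    # Also hide GPUs through the tf.config API (TensorFlow 2.1+). This must
    # run before any op executes, otherwise TensorFlow raises RuntimeError.
    tf.config.set_visible_devices([], "GPU")
    print("GPUs visible to TensorFlow:", tf.config.get_visible_devices("GPU"))
```

Put these lines at the very top of the training script; Keras models built afterwards will then place their ops on the CPU.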

Thanks

Arun Jose

ArunJ_Intel
Moderator

Hey Rohit,

I hope that answers your query. Could you please confirm whether we can go ahead and close this case?

Arun Jose

r_0_h_1_t
Beginner

Yeah. You can close the issue. Thanks.

ArunJ_Intel
Moderator

Thanks, Rohit, for the confirmation. We are closing this case; please feel free to raise a new thread if you run into further issues.
