When training TensorFlow models I am now getting the error below, which I was not getting earlier. Because of it I am unable to train models.
Environment: tensorflow-2.3.0, devcloud=latest, /opt/intel/openvino_2020.3.194/, Python 3.6.10
2020-10-12 22:54:10.305177: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x556b4a3c7120 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-12 22:54:10.305233: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-10-12 22:54:10.317680: F tensorflow/core/platform/default/env.cc:72] Check failed: ret == 0 (11 vs. 0)Thread creation via pthread_create() failed.
Aborted
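The "11" in the failed check is the return value of pthread_create(). As a small sketch (the helper name describe_errno is made up for illustration), the Python standard library can translate that number into its symbolic name. On Linux, 11 is normally EAGAIN, which pthread_create() returns when there are insufficient resources or a system-imposed limit on the number of threads has been reached.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno value.

    describe_errno is a hypothetical helper, not part of TensorFlow.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 11 is the value reported in the "Check failed: ret == 0 (11 vs. 0)" line.
    print(describe_errno(11))
```

The numeric value of EAGAIN is platform-specific, so this should be run on the same kind of machine that produced the log.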
There is a similar bug on the TensorFlow issue tracker, https://github.com/tensorflow/tensorflow/issues/41532, but the root cause seems to be the server on which TensorFlow is running.
Based on my debugging, the error also seems to be related to the current state of this server.
I believe there might be some per-user restriction set on Intel DevCloud which is causing this error.
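If a per-user thread limit really is the cause, one thing that might be worth trying is asking TensorFlow to create fewer threads. The sketch below is only an assumption about a possible workaround, not a confirmed fix for this error. The helper names pick_thread_cap and cap_tensorflow_threads and the reserve value of 64 are made up for illustration; the tf.config.threading calls are the TensorFlow 2.x API for sizing the intra-op and inter-op thread pools.

```python
def pick_thread_cap(cpu_count, soft_limit, in_use, reserve=64):
    """Pick a thread-pool size that stays under a per-user limit.

    Hypothetical helper: 'reserve' is an arbitrary safety margin for
    threads created by the shell, Python itself, and other libraries.
    """
    headroom = soft_limit - in_use - reserve
    return max(1, min(cpu_count, headroom))


def cap_tensorflow_threads(n):
    """Limit TensorFlow's intra-op and inter-op thread pools to n threads.

    Must be called before TensorFlow has initialized its runtime,
    otherwise TensorFlow raises a RuntimeError.
    """
    import tensorflow as tf  # imported lazily so the helper above has no TF dependency

    tf.config.threading.set_intra_op_parallelism_threads(n)
    tf.config.threading.set_inter_op_parallelism_threads(n)
```

For example, calling cap_tensorflow_threads(pick_thread_cap(os.cpu_count() or 1, 1024, in_use)) at the very top of the training script, with in_use being the number of threads the user already has running, would keep the two pools within the remaining headroom. Other libraries may still create threads of their own.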
There are some limits which I can see by running standard commands such as:
Check max number of threads:
$ ulimit -u
1024
#Check limits for all resources:
$ ulimit -a
(cenv) u47404@s099-n010:~/intelmac$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) 6291456
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1026157
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) 6291456
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 12288
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
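The same "max user processes" limit can also be read from inside Python, and compared against how many threads the current user already has running. This is a rough, Linux-only sketch that assumes /proc is mounted; the function names are made up for illustration. On Linux this limit (RLIMIT_NPROC) counts threads as well as processes for the user.

```python
import os
import resource


def nproc_limits():
    """Return the (soft, hard) RLIMIT_NPROC values for this process.

    A value equal to resource.RLIM_INFINITY means "unlimited".
    """
    return resource.getrlimit(resource.RLIMIT_NPROC)


def count_user_threads():
    """Count threads owned by the current user by walking /proc (Linux only)."""
    uid = os.getuid()
    total = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        path = os.path.join("/proc", entry)
        try:
            if os.stat(path).st_uid != uid:
                continue
            # Each subdirectory of /proc/<pid>/task is one thread.
            total += len(os.listdir(os.path.join(path, "task")))
        except (FileNotFoundError, PermissionError, ProcessLookupError):
            continue  # the process exited or is not readable; ignore it
    return total


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print("RLIMIT_NPROC soft/hard:", soft, hard)
    print("threads currently owned by this user:", count_user_threads())
```

If the thread count is already close to the soft limit before training starts, that would be consistent with pthread_create() failing as in the log above.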
Could you please confirm the following things:
As mentioned earlier, this forum is intended to handle only oneAPI DevCloud issues. If you have any issues related to DevCloud for Edge, please post your query in the DevCloud for Edge forum: https://community.intel.com/t5/Intel-DevCloud-for-Edge/bd-p/devcloud-edge . Also, DevCloud for Edge is mainly designed for inference, not for training workloads such as the one mentioned above.
We haven't heard back from you, so we will no longer be monitoring this thread. Please raise a new thread if you have any further issues.