When training TensorFlow models I am now getting the error below, which I was not getting earlier. Because of it I am unable to train models.
tensorflow-2.3.0
devcloud=latest -- /opt/intel/openvino_2020.3.194/
Python 3.6.10
2020-10-12 22:54:10.305177: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x556b4a3c7120 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-12 22:54:10.305233: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-10-12 22:54:10.317680: F tensorflow/core/platform/default/env.cc:72] Check failed: ret == 0 (11 vs. 0)Thread creation via pthread_create() failed.
Aborted
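For readers decoding the log: the `11` in `Check failed: ret == 0 (11 vs. 0)` is the value returned by `pthread_create()`, which is an errno code. On Linux, 11 is `EAGAIN`, which `pthread_create()` returns when there are insufficient resources or a system-imposed limit on the number of threads has been reached. A minimal sketch for looking up such a code is shown here; the helper name `describe_errno` is hypothetical and not part of TensorFlow or the original post.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, human-readable message) for an errno value."""
    # errno.errorcode maps numeric codes to names for the current platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 11 is the value reported in the TensorFlow log above (Linux numbering).
    name, message = describe_errno(11)
    print(f"errno 11 -> {name}: {message}")
```

The numeric value of a given errno name is platform specific, so the lookup is only meaningful when run on the same kind of system that produced the log.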
There is a similar bug on the TensorFlow issue tracker, https://github.com/tensorflow/tensorflow/issues/41532, but the root cause seems to be the server on which TensorFlow is running.
Also, based on my debugging, the error seems to be related to the current state of this server:
- The same code was running fine earlier.
- When this server was rebooted earlier, the error went away, and after some time it surfaced again.
I believe there might be some per-user restriction set on Intel DevCloud that is causing this error.
There are some limits that I can see by running standard commands such as:
# Check max number of threads:
$ ulimit -u
1024
# Check limits for all resources:
$ ulimit -a
(cenv) u47404@s099-n010:~/intelmac$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) 6291456
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1026157
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) 6291456
open files (-n) 32768
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 12288
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
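To test the per-user limit theory, the `max user processes` value above can be compared with the number of threads the account currently has running. On Linux, `RLIMIT_NPROC` (what `ulimit -u` reports) counts threads, not just processes, for the real user ID. The following is a rough sketch, assuming a Linux system with `/proc` mounted; the function names are hypothetical, and the count is only a snapshot.

```python
import os
import resource


def nproc_limit():
    """Return the (soft, hard) RLIMIT_NPROC pair; the soft value matches `ulimit -u`."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def count_user_threads(uid=None):
    """Count threads of all processes owned by uid by reading /proc (Linux only)."""
    uid = os.getuid() if uid is None else uid
    total = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            if os.stat(f"/proc/{entry}").st_uid != uid:
                continue
            # Each subdirectory of /proc/<pid>/task is one thread.
            total += len(os.listdir(f"/proc/{entry}/task"))
        except OSError:
            continue  # the process exited or is not readable
    return total


if __name__ == "__main__":
    soft, hard = nproc_limit()
    used = count_user_threads()
    print(f"threads in use: {used}, soft limit: {soft}, hard limit: {hard}")
```

If the thread count is close to the soft limit just before training starts, that would be consistent with `pthread_create()` failing, and with the error disappearing after a reboot clears leftover processes.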
- Tags:
- Tensorflow
Hi,
Thanks for posting in Intel forums.
Could you let us know which DevCloud (oneAPI DevCloud or DevCloud for the Edge) you are using?
I have included the version above:
devcloud= /opt/intel/openvino_2020.3.194/
Is there any other way to tell the DevCloud version?
The url is
Hi,
Thanks for the update.
From the attached link, we found that you are using DevCloud for the Edge. Hence, we are forwarding the case to the DevCloud for the Edge forum.
Thanks for sharing the error log. Intel DevCloud for the Edge is not designed for training. Its compute nodes are intended for edge inference.
Regards,
Alaa
Hi Pankaj,
Could you please confirm the following things:
- Are you still facing this issue?
- Are you using oneAPI devcloud or Devcloud for edge as working environment?
As mentioned earlier, this forum is intended to handle only oneAPI DevCloud issues. If you have any issues related to DevCloud for the Edge, please post your query in the DevCloud for the Edge forum: https://community.intel.com/t5/Intel-DevCloud-for-Edge/bd-p/devcloud-edge . Also, as mentioned above, DevCloud for the Edge is mainly designed for inference, not for training workloads.
Regards,
Chithra J
Hi Pankaj,
Could you please give us an update on this?
Regards,
Chithra
Hi Pankaj,
We haven't heard back from you, so we won't be monitoring this thread anymore. Please raise a new thread if you have any further issues.
Regards,
Chithra