Intel® oneAPI DL Framework Developer Toolkit
Gain insights from peers and Intel experts to develop new deep learning frameworks or to customize a framework utilizing common APIs.

TensorFlow & Horovod for distributed training

dbrayford
Beginner

How do I get horovod to install correctly in the container for distributed training?

 

I installed horovod with the command:

$ pip install horovod

I then ran the following in Python:

import pyarrow.tensorflow as tf

import horovod.tensorflow as hvd

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/intel/oneapi/intelpython/python3.7/lib/python3.7/site-packages/horovod/tensorflow/__init__.py", line 24, in <module>
    check_extension('horovod.tensorflow', 'HOROVOD_WITH_TENSORFLOW', __file__, 'mpi_lib')
  File "/opt/intel/oneapi/intelpython/python3.7/lib/python3.7/site-packages/horovod/common/util.py", line 56, in check_extension
    ext_name, full_path, ext_env_var
ImportError: Extension horovod.tensorflow has not been built: /opt/intel/oneapi/intelpython/python3.7/lib/python3.7/site-packages/horovod/tensorflow/mpi_lib.cpython-37m-x86_64-linux-gnu.so not found
If this is not expected, reinstall Horovod with HOROVOD_WITH_TENSORFLOW=1 to debug the build error.

I reinstalled horovod with HOROVOD_WITH_TENSORFLOW=1 but still got the same error message.
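
One way to confirm what the error reports is to look for the compiled extension on disk without importing horovod.tensorflow itself. The following is a minimal sketch and not part of the original post: the function name find_built_extensions is hypothetical, and it assumes the extension file name starts with mpi_lib and ends in .so, as in the path shown in the traceback above.

```python
import glob
import importlib.util
import os


def find_built_extensions(package="horovod.tensorflow", stem="mpi_lib"):
    """List compiled extension files named `stem*.so` inside `package`.

    Returns None when the package cannot be located at all, and an empty
    list when the package exists but no matching compiled file is present.
    """
    try:
        # For a dotted name, find_spec imports the parent package only;
        # it does not execute horovod/tensorflow/__init__.py.
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        spec = None
    if spec is None or not spec.submodule_search_locations:
        return None
    found = []
    for location in spec.submodule_search_locations:
        found.extend(glob.glob(os.path.join(location, stem + "*.so")))
    return sorted(found)


result = find_built_extensions()
if result is None:
    print("horovod.tensorflow is not installed in this interpreter")
elif not result:
    print("horovod.tensorflow is installed, but no mpi_lib*.so was built")
else:
    print("built extension(s):", result)
```

An empty list after a reinstall would suggest that the build step itself was skipped or failed, so the pip output of that reinstall is the next place to look.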

It would be great if you could provide Horovod built with Intel MPI pre-installed in the container.

 

David

8 Replies
ArunJ_Intel
Moderator

Hi dbrayford,


Could you let us know the following details?


1) Which container are you using? Are you building a custom Docker image?

2) Which version of TensorFlow are you using?

3) Are you using the Intel distributions of TensorFlow and Python?


Thanks

Arun


dbrayford
Beginner

1) I am using the Dockerfile from https://github.com/intel/oneapi-containers/tree/master/images/docker/dlfdkit-devel-ubuntu18.04

 

2) I was using the version in the container (pyarrow.tensorflow). I would like to use a 1.x version, but I am also interested in TensorFlow 2.x.

 

3) I am using the version of Python in /opt/intel/oneapi/intelpython/python3.7 in the container and TensorFlow from /opt/intel/oneapi/intelpython/python3.7/pkgs/ (I assume this is a version of Intel TensorFlow?)
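
To check which installation each package would actually be loaded from, a small sketch like the following can help. It is an illustration rather than something from the original post: the helper name module_origin is hypothetical, and it only reports file locations, not whether a build is Intel-optimized.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if absent."""
    try:
        # find_spec locates a top-level module without importing it.
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return None if spec is None else spec.origin


for name in ("tensorflow", "horovod", "pyarrow"):
    print(name, "->", module_origin(name))
```

A path under /opt/intel/oneapi/intelpython/ would indicate that the package comes from the Intel Python distribution in the container.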

Would it be possible to update the oneAPI DL Dockerfile in the github repository to include Intel TensorFlow?

 

David

I used the python command import pyarrow.tensorflow

ArunJ_Intel
Moderator

Hi dbrayford,


Regarding your request to include Intel TensorFlow in the oneAPI DL Dockerfile: Intel oneAPI has a different Docker image, the Intel AI Analytics Toolkit image (intel/oneapi-aikit), which has Intel-optimized TensorFlow pre-installed. Please find below the link to the steps for running this container.


https://github.com/intel/oneapi-containers#intel-ai-analytics-toolkit


However, using Horovod with Intel MPI does have issues as of now. We will check with a subject matter expert about resolving these and will get back to you shortly with a response.


Thanks

Arun Jose


ArunJ_Intel
Moderator

Hi dbrayford,


We were able to find the root cause of the error. It occurred because Horovod tried to build with CCL under the hood, and the oneAPI version of CCL is not supported in Horovod.

We were able to build Horovod with Intel MPI outside the Docker container. We are looking into making it work with the oneAPI Docker images and will keep you posted with further updates.
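
As a rough illustration of steering a Horovod source build toward MPI, the sketch below assembles a build environment and a verbose reinstall command. It is not the procedure the moderators used. HOROVOD_WITH_TENSORFLOW, HOROVOD_WITH_MPI, and HOROVOD_CPU_OPERATIONS are build-time variables described in the Horovod installation guide for recent releases; whether these settings avoid the CCL path inside the oneAPI container is an assumption that this thread does not confirm. The helper names are hypothetical.

```python
import os
import sys


def horovod_build_env(base_env=None):
    """Return a copy of the environment with Horovod build switches set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HOROVOD_WITH_TENSORFLOW"] = "1"   # fail the install if the TF extension cannot build
    env["HOROVOD_WITH_MPI"] = "1"          # fail the install if no MPI is found
    env["HOROVOD_CPU_OPERATIONS"] = "MPI"  # request MPI for CPU collectives (assumed to bypass CCL)
    return env


def horovod_reinstall_command(python=sys.executable):
    """Return a pip command that rebuilds Horovod from source with verbose output."""
    return [
        python, "-m", "pip", "install",
        "--no-cache-dir",     # do not reuse a wheel built by an earlier attempt
        "--force-reinstall",  # replace the existing installation
        "-v",                 # show the build log so errors are visible
        "horovod",
    ]


# Show what would be run; nothing is installed by this script.
print(" ".join(horovod_reinstall_command()))
```

To actually run the build, the two results could be passed to subprocess.run(horovod_reinstall_command(), env=horovod_build_env(), check=True). The --no-cache-dir flag matters because pip may otherwise reuse a wheel built without the extension, which is one possible reason a reinstall appears to change nothing.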


Arun Jose


ArunJ_Intel
Moderator

Hi David,


We are forwarding your case to the subject matter experts. They will get back to you regarding your query.



Thanks

Arun


ArunJ_Intel
Moderator

Hi dbrayford,


Please find instructions for using Intel® Optimizations for TensorFlow* with Open MPI* and Horovod in a prebuilt container from Intel at the link below.


https://software.intel.com/content/www/us/en/develop/articles/containers/dl-optimizations-for-tensorflow-with-open-mpi-and-horovod.html


With this, you can use Horovod with Intel TensorFlow without going through the hassle of fixing the installation issues.

You can also search for optimized containers and solutions from Intel on the Intel oneContainer Portal. Get production-quality Docker* containers designed to meet your specific needs for HPC, AI, machine learning, IoT, media, rendering, and more.


https://software.intel.com/content/www/us/en/develop/tools/containers/full-catalog.html



Thanks

Arun


ArunJ_Intel
Moderator

Hi dbrayford,


We hope you have gone through the available TensorFlow + Horovod container options. Please let us know if you need any additional help with this issue.


Thanks

Arun


ArunJ_Intel
Moderator

Hi dbrayford,


We are assuming that the solution provided helped, so we will no longer be monitoring this thread. Please start a new thread if you have further issues.


Thanks

Arun Jose

