Intel® Optimized AI Frameworks
Get community support for questions related to PyTorch* and TensorFlow* frameworks.

Run multiple Docker containers with Intel-optimized TensorFlow on one CPU with 8 physical cores (16 logical cores)

OosakiKaNa
Beginner
2,030 Views
Hello, I find that Intel-optimized TensorFlow gives a large speedup in the training phase. I want to run 3 Docker containers on a CPU with 8 physical cores (16 logical cores), with 4 logical cores per container. How should I set intra_/inter_op_parallelism_threads and OMP_NUM_THREADS? When one container runs, training takes 17 s per epoch, but when I run 3 containers, training takes 50 s per epoch in every container. For reference, I set intra_/inter_op_parallelism_threads = 2, OMP_NUM_THREADS = 2, and KMP_BLOCKTIME = 1 in the container. Why does this happen?

18 Replies
AdrianM_Intel
Moderator
2,012 Views

Hello OosakiKaNa,

 

Thank you for posting on the Intel® communities.

 

To better assist you, we have moved your question to another forum.

 

Regards,

 

Adrian M.

Intel Customer Support Technician

AthiraM_Intel
Moderator
1,992 Views

Hi,


Could you please share the following details:


1) Docker images you used?

2) Complete steps to reproduce the issue including the commands you used

3) Intel tensorflow version used

4) OS details



Thanks




OosakiKaNa
Beginner
1,989 Views

Docker image: intel/intel-optimized-tensorflow:2.2.0-centos-8-mpich-horovod

My OS: CentOS 8

docker run -itd --cpuset-cpus=1,2,3,4 -v /home/liangliang/nfscontent/:/tf/tft/output tft:v1

tft:v1 is my program image.

 

thanks

AthiraM_Intel
Moderator
1,950 Views

Hi,

 

Thanks for sharing the details.

Could you please share the log file with KMP_AFFINITY verbose output enabled?

i.e., KMP_AFFINITY=verbose

 

Please find the below link for more information:

 

https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-...

 

You can also try increasing OMP_NUM_THREADS: set OMP_NUM_THREADS=8 and check whether there is any improvement.

 

 

Thanks.

 

 

 

OosakiKaNa
Beginner
1,938 Views

Hi!

Thanks for your advice.

I should share more details.

The environment variables in my Intel-optimized TensorFlow container are:
ENV OMP_NUM_THREADS='4'
ENV KMP_BLOCKTIME='1'
ENV KMP_AFFINITY=granularity=fine,verbose,compact,1,0

I run the command docker run -itd --cpuset-cpus=7,8,9,10

I also set the tf.config intra_/inter_op_parallelism_threads to 4 and 2, respectively.

This is the verbose output when I run one container:

image.png

The training phase takes 23 s, which is very fast!

When I set OMP_NUM_THREADS='8' and keep the other parameters unchanged, training is very slow. When it is set to 4, training is fast.

 

But when I run two containers (the other one runs on CPUs 1,2,3,4):

OosakiKaNa_0-1628232167130.png

You can see that the training time increases, and I don't know why.

This is the htop status on my host:

OosakiKaNa_1-1628232241797.png

 

thanks.

Louie_T_Intel
Moderator
1,772 Views

Hi

 

From the KMP verbose log, you can see 8 threads bound to CPUs 7-10 when you set OMP_NUM_THREADS='4'.

If you have Hyper-Threading on, each thread can use one hyper-thread, because the number of hyper-threads is 8 in this case.

 

However, when you set OMP_NUM_THREADS='8', you will have 16 threads competing for 8 hyper-threads, and performance will suffer.
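
The thread counts in this reply (8 and 16) are consistent with multiplying OMP_NUM_THREADS by the inter-op thread count of 2 mentioned earlier in the thread. Below is a minimal sketch of that arithmetic. It assumes each inter-op thread can start its own OpenMP team, which is an assumption and is not confirmed in this thread. The helper names are hypothetical, and the figure of 8 hyper-threads is taken from the reply above.

```python
def omp_thread_demand(omp_num_threads, inter_op_threads):
    """Rough upper bound on OpenMP threads that may run at once.

    Assumption: every inter-op thread may launch its own OpenMP team
    of `omp_num_threads` threads.
    """
    return omp_num_threads * inter_op_threads


def is_oversubscribed(omp_num_threads, inter_op_threads, hyper_threads):
    """True when the estimated thread demand exceeds the available hyper-threads."""
    return omp_thread_demand(omp_num_threads, inter_op_threads) > hyper_threads


if __name__ == "__main__":
    HYPER_THREADS = 8   # figure quoted in the reply above
    INTER_OP = 2        # inter_op_parallelism_threads reported by the poster
    for omp in (4, 8):
        demand = omp_thread_demand(omp, INTER_OP)
        status = "oversubscribed" if is_oversubscribed(omp, INTER_OP, HYPER_THREADS) else "fits"
        print(f"OMP_NUM_THREADS={omp}: {demand} threads on {HYPER_THREADS} hyper-threads ({status})")
```

Under these assumptions, OMP_NUM_THREADS=4 fits the available hyper-threads, while OMP_NUM_THREADS=8 asks for twice as many threads as there are hyper-threads.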

 

 

For the two-container case, do you run your workloads on a system with 2 sockets?

If yes, you might need to use numactl to make all threads within a container run on one socket instead of two, to reduce remote NUMA accesses.

 

regards

 

 

OosakiKaNa
Beginner
1,762 Views

Hi~

Thanks for your reply

I don't run my workloads on a system with 2 sockets.

This is my computer's CPU information:

OosakiKaNa_0-1629939364467.png

But tomorrow my company will buy 10 computers with Gold 6248R CPUs (2 sockets, 24C/48T).

Actually, I use k8s to manage my model on 29 computers, so do you know how I can make all threads within a container run on one socket instead of two with k8s settings?

My English is poor, sorry.

Regards

 

AthiraM_Intel
Moderator
1,881 Views

Hi,


We are checking on your issue. Could you please share a sample reproducer and the complete steps so that we can try it on our end?


Thanks


OosakiKaNa
Beginner
1,870 Views

Hi!

What should I do? Send you my program and dataset?
I don't know how to do that, please tell me.
Thanks

AthiraM_Intel
Moderator
1,837 Views

Hi,


Yes, you can share your sample reproducer and commands used. Regarding this we will contact you through private message shortly.


Thanks


OosakiKaNa
Beginner
1,790 Views
Hi
I am sorry for taking so long to reply.
My company doesn't let me share the program and data.
 
Actually, I have given up on this issue. I think it may be a hardware limitation, so it can't be solved with software settings.
The model is not very complex: it has just 220K parameters. The data is just an Excel file with 10K rows and 13 columns.
But this code was not written to run in a Docker container.
I run the model with Intel-optimized TensorFlow 2.2.0, but I don't use the TensorFlow 2 features.
I use import tensorflow.compat.v1 as tf, so I think using TF 2.0 might bring some improvement.
But I can't run that experiment right now. If I have time I will try it and contact you.
So the issue may be closed.
My English is poor, sorry. 
Thanks for your help ! 
Jianyu_Z_Intel
Employee
1,704 Views

Hi,

  To simplify the description, we refer to physical cores in this topic.

  I think that in your case you assigned the same number of cores to each container, but the containers share some cores at the same time. So the performance of each is reduced to 1/3 of a single container.

  To resolve this issue, please assign different cores to different containers. For example:

docker run -it --cpuset-cpus="1,2" ubuntu /bin/bash
docker run -it --cpuset-cpus="3,4" ubuntu /bin/bash
docker run -it --cpuset-cpus="5,6" ubuntu /bin/bash

Refer to: https://docs.docker.com/config/containers/resource_constraints/
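
The idea of giving each container its own CPUs can be sketched in a few lines of Python. This is an illustrative helper only: the function names are hypothetical, and it relies only on Docker's `--cpuset-cpus` option, which accepts a comma-separated list of CPU IDs. Note that disjoint logical CPU IDs can still land on the same physical core when Hyper-Threading is enabled, so the IDs passed in should be chosen with the host's core layout in mind.

```python
def disjoint_cpusets(cpu_ids, per_container):
    """Split cpu_ids into non-overlapping groups of `per_container` CPUs.

    Leftover CPUs that do not fill a whole group are not assigned.
    """
    last_start = len(cpu_ids) - per_container
    groups = [cpu_ids[i:i + per_container]
              for i in range(0, last_start + 1, per_container)]
    return [",".join(str(cpu) for cpu in group) for group in groups]


def docker_commands(cpusets, image="ubuntu"):
    """Build one `docker run` command line per cpuset string."""
    return [f'docker run -it --cpuset-cpus="{cpus}" {image} /bin/bash'
            for cpus in cpusets]


if __name__ == "__main__":
    # CPUs 1..6 split into three containers with two CPUs each.
    for command in docker_commands(disjoint_cpusets([1, 2, 3, 4, 5, 6], 2)):
        print(command)
```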

 

Thank you!

OosakiKaNa
Beginner
1,696 Views

Hi!

Thanks for your reply

Please take a look at my reply from 08-05-2021 11:46 PM.

I run the two Docker containers on CPUs 7,8,9,10 and 1,2,3,4.

My computer has 32 GB of RAM. I set the containers to run on different CPUs, but the issue still exists.

My English is poor, sorry

Regards

Jianyu_Z_Intel
Employee
1,693 Views

Hi,

  Don't worry! I fully understand you.

  Your CPU has 8 cores, indexed 0-7.

  Index 8 and index 0 are in fact the same core.

  In your case: CPUs 7,8,9,10 and 1,2,3,4

      1 and 9 are in fact the same core, and so are 2 and 10.

  That means the two containers share 2 cores (1(9) and 2(10)). That will impact the performance.

  If you want to use 4 cores per container, please use 0-3 and 4-7.

  Avoid assigning one core to more than one container.
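
The overlap described above can be checked with a short sketch. It assumes the numbering stated in this reply (8 physical cores, with logical CPU n and logical CPU n + 8 on the same core); the function names are hypothetical. On a real Linux host the actual layout can be read from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, since the numbering differs between machines.

```python
PHYSICAL_CORES = 8  # assumption taken from the reply above


def physical_core(logical_cpu, physical_cores=PHYSICAL_CORES):
    """Map a logical CPU index to its physical core under the n / n+8 layout."""
    return logical_cpu % physical_cores


def shared_cores(cpuset_a, cpuset_b, physical_cores=PHYSICAL_CORES):
    """Return the sorted physical cores used by both cpusets."""
    cores_a = {physical_core(cpu, physical_cores) for cpu in cpuset_a}
    cores_b = {physical_core(cpu, physical_cores) for cpu in cpuset_b}
    return sorted(cores_a & cores_b)


if __name__ == "__main__":
    # The two cpusets from this thread overlap on physical cores 1 and 2.
    print(shared_cores([7, 8, 9, 10], [1, 2, 3, 4]))
    # The suggested split 0-3 and 4-7 shares no physical core.
    print(shared_cores(range(0, 4), range(4, 8)))
```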

 

  Thank you!

  In my example:

  

  

  

OosakiKaNa
Beginner
1,685 Views

Hi!

Thanks for your fast reply.

I will run the experiment with this setting.

But actually I manage my model on 34 computers with k8s. k8s controls the Docker containers with cgroups, and it can't assign physical cores (maybe it can't; right now I don't know).


So if this issue is about CPU sharing (meaning a hardware issue), it can't be solved by software settings (I guess).

I just want to know what causes this problem.

Thanks

Jianyu_Z_Intel
Employee
1,675 Views

Hi,

  For the K8s case, Intel provides a solution for CPU pinning: CPU Manager for Kubernetes* (also called CMK).

 

  Here is the guide for it:

  https://builders.intel.com/docs/networkbuilders/cpu-pin-and-isolation-in-kubernetes-app-note.pdf

 

  If you have more questions about CMK, please create a new issue for CMK in the Intel Community.

 

  Good luck!

 

  Thank you! 

   

 

OosakiKaNa
Beginner
1,668 Views

Hi!

From your reply I now know the cause of the issue and the tools to solve it.

Thus, the issue is resolved!

Thank you, and thanks to everyone in the community!

Thanks, Intel!

Jianyu_Z_Intel
Employee
1,660 Views

Hi,

  It's our pleasure! 

 

  Thank you for your support!

  

  
