Hello OosakiKaNa,
Thank you for posting on the Intel® communities.
To better assist you, we have moved your question to another forum.
Regards,
Adrian M.
Intel Customer Support Technician
Hi,
Could you please share the following details:
1) The Docker image you used
2) The complete steps to reproduce the issue, including the commands you used
3) The Intel TensorFlow version used
4) OS details
Thanks
Docker image: intel/intel-optimized-tensorflow:2.2.0-centos-8-mpich-horovod
My OS: CentOS 8
docker run -itd --cpuset-cpus=1,2,3,4 -v /home/liangliang/nfscontent/:/tf/tft/output tft:v1
tft:v1 is my program image.
Thanks
Hi,
Thanks for sharing the details.
Could you please share the log file after enabling KMP_AFFINITY verbose mode, i.e., KMP_AFFINITY=verbose.
Please find the link below for more information:
Also, you can try increasing OMP_NUM_THREADS: set OMP_NUM_THREADS=8 and check whether there is any improvement.
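For reference, a minimal illustrative sketch of passing these variables to the container you described earlier (the image tag and volume path are taken from your command; the container ID is a placeholder):
# Re-run the same image with the OpenMP affinity report enabled and more threads.
docker run -itd --cpuset-cpus=1,2,3,4 \
  -e KMP_AFFINITY=granularity=fine,verbose,compact,1,0 \
  -e OMP_NUM_THREADS=8 \
  -v /home/liangliang/nfscontent/:/tf/tft/output \
  tft:v1
# The verbose binding report is printed to the program's stderr and can be
# collected with: docker logs <container_id>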
Thanks.
Hi!
Thanks for your advice.
I should share more details.
My Intel-optimized-TensorFlow container environment variables are:
ENV OMP_NUM_THREADS='4'
ENV KMP_BLOCKTIME='1'
ENV KMP_AFFINITY=granularity=fine,verbose,compact,1,0
I run the command: docker run -itd --cpuset-cpus=7,8,9,10
I also set tf.config intra_op_parallelism_threads = 4 and inter_op_parallelism_threads = 2.
This is the verbose output when I run one container:
The train phase cost time is 23 s, which is very fast!
When I set OMP_NUM_THREADS='8' and keep the other parameters fixed, the training speed becomes very slow; when it is set to 4, the training speed is fast.
But when I run two containers (the other one runs on CPUs 1,2,3,4):
You can see the train phase cost time increases, and I don't know why.
And this is my host's htop status.
Thanks.
Hi
From the KMP verbose log, you can see 8 threads bound to CPUs 7-10 when you set OMP_NUM_THREADS='4'.
If you have hyperthreading on, each thread can use one hyperthread, because the number of hyperthreads is 8 in this case.
However, when you set OMP_NUM_THREADS='8', you will have 16 threads competing for 8 hyperthreads, and the performance will be impacted.
For the two-container case, do you run your workloads on a system with 2 sockets?
If yes, you might need to use numactl to make all threads within a container run on one socket instead of two, to reduce NUMA remote-access issues.
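If it is a two-socket system, here is a minimal illustrative sketch of checking the topology and keeping a container on a single NUMA node (the CPU list 0-23 and node number are placeholders, and "python train.py" stands in for your training command; use the values lscpu reports on your machine):
# Show how logical CPUs map to sockets/cores/NUMA nodes on the host.
lscpu | grep -i numa
lscpu -e=CPU,NODE,SOCKET,CORE
# Constrain a container to the CPUs and memory of NUMA node 0 only.
docker run -itd --cpuset-cpus=0-23 --cpuset-mems=0 tft:v1
# Or, when running directly on the host, bind the process with numactl.
numactl --cpunodebind=0 --membind=0 python train.py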
regards
Hi~
Thanks for your reply.
I don't run my workloads on a system with 2 sockets.
This is my computer's CPU information.
But tomorrow my company will buy 10 computers with Gold 6248R (2 sockets, 24C/48T).
Actually I use k8s to manage my model on 29 computers, so do you know how I can make all threads within a container run on one socket instead of two with a k8s setting?
My English is poor, sorry.
Regards
Hi,
We are looking into your issue. Could you please share a sample reproducer and the complete steps so we can try it out on our end?
Thanks
Hi!
What should I do? Send you my program and dataset?
I don't know how to do this; please tell me.
Thanks
Hi,
Yes, you can share your sample reproducer and the commands used. We will contact you through a private message shortly.
Thanks
Hi,
To simplify the description, we will talk about physical cores in this topic.
I think in your case the same number of cores is assigned to each container, but the containers share some cores at the same time, so the performance is reduced to about 1/3 of the single-container case.
To resolve this issue, please assign different cores to different containers, for example:
docker run -it --cpuset-cpus="1,2" ubuntu /bin/bash
docker run -it --cpuset-cpus="3,4" ubuntu /bin/bash
docker run -it --cpuset-cpus="5,6" ubuntu /bin/bash
Refer to: https://docs.docker.com/config/containers/resource_constraints/
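A quick illustrative way to confirm which CPUs each container actually got (container names are placeholders):
docker inspect -f '{{.HostConfig.CpusetCpus}}' <container_name>
# Check the effective affinity of the container's main process (requires taskset inside the image).
docker exec <container_name> taskset -cp 1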
Thank you!
Hi!
Thanks for your reply.
Please take a look at my reply from 08-05-2021 11:46 PM.
I run the two Docker containers on CPUs 7,8,9,10 and 1,2,3,4.
My computer's RAM is 32 GB. I set them to run on different CPUs, but the issue still exists.
My English is poor, sorry.
Regards
Hi,
Don't worry! I fully understand your words.
Your CPU has 8 physical cores, and logical CPUs 0-7 are the first index on each of them.
Logical CPU 8 and logical CPU 0 are in fact the same physical core.
In your case (CPUs 7,8,9,10 and 1,2,3,4):
1 & 9 and 2 & 10 are in fact the same physical cores.
That means the two containers share 2 physical cores (1(9) and 2(10)), which will impact the performance.
If you want to use 4 cores per container, please use 0-3 and 4-7.
Avoid assigning one core to more than one container.
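A minimal illustrative way to see which logical CPU indices are hyperthread siblings of the same physical core on your host:
lscpu -e=CPU,CORE,SOCKET        # logical CPUs that share a CORE id are siblings
cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list   # e.g. prints 1,9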
Thank you!
Hi!
Thanks for your very fast reply.
I will do the experiment with this setting.
But actually I manage my model on 34 computers with k8s; k8s controls the Docker containers with cgroups, and it can't assign physical cores (maybe it can't; right now I don't know).
So if this issue is about CPU sharing (a hardware issue), I guess it can't be solved by a software setting.
I just want to know what causes this problem.
Thanks
Hi,
For the K8s case, Intel provides a solution for CPU pinning: CPU Manager for Kubernetes* (also called CMK).
Here is the guide for it:
https://builders.intel.com/docs/networkbuilders/cpu-pin-and-isolation-in-kubernetes-app-note.pdf
If you have more questions about CMK, please create a new issue for CMK in the Intel Community.
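Whichever tool you use, here is a minimal illustrative check from inside a running container or pod to confirm which logical CPUs it is actually allowed to use (the path shown is for cgroup v1 and may differ on cgroup v2; taskset must be available in the image):
cat /sys/fs/cgroup/cpuset/cpuset.cpus    # cpuset assigned to this container
taskset -cp 1                            # CPU affinity of the container's main process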
Good luck!
Thank you!
Hi!
From your reply I now know the cause of the issue and the tools to solve it.
So the issue is resolved!
Thank you and everyone in the community!
Thanks, Intel!
Hi,
It's our pleasure!
Thank you for your support!