Hi,
I ran a large linear algebra workflow that calls various BLAS and LAPACK functions, including potrf, trsm, potri, syrk, symm and gemm. The arrays involved can be up to 150,000 x 150,000.
I have observed that the workflow's processing time increases substantially when it is executed on an AMD CPU, to the point that the workflow had to be interrupted. Investigating the issue, I found that potri is the culprit: the MKL call does not return. While potri runs, core usage is multi-threaded but fluctuates substantially between all cores and only one.
Here is the setup:
- compiled with Intel Clang++ with arguments:
- -march=x86-64-v4
- -std=c++20
- -fPIE
- -std=gnu++20
- -ferror-limit=4 -O2
- -qopenmp
- -fp-model=precise
- all libraries are statically linked (including MKL, pthread and libiomp5)
System libraries are linked dynamically:
linux-vdso.so.1 (0x00000673cc8ff000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00000673cc807000)
libmvec.so.1 => /lib/x86_64-linux-gnu/libmvec.so.1 (0x00000673c0707000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00000673c06d9000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00000673c0400000)
/lib64/ld-linux-x86-64.so.2 (0x00000673cc901000)
The software is executed on an Azure instance running Ubuntu 24.04 with the following environment settings:
ulimit -s unlimited
export OMP_NUM_THREADS=48
export MKL_NUM_THREADS=48
export OMP_DYNAMIC=FALSE
export OMP_MAX_ACTIVE_LEVELS=2147483647
export OMP_PLACES=cores
export OMP_PROC_BIND=true
On an E96s v6 instance with an INTEL(R) XEON(R) PLATINUM 8573C processor the software behaves normally.
On an E96ads_v6 instance with an AMD EPYC™ 9004 processor the software hangs in the MKL potri routine.
Note that all other MKL routines (potrf, trsm etc) have not shown the above problems.
Any idea?
The processing time for dpotri on a 1146878 x 1146878 matrix:
| INTEL(R) XEON(R) PLATINUM 8573C, 48 cores | 3198 seconds |
| AMD EPYC™ 9004, 48 cores | 21141 seconds |
This is an increase by a factor of 6.6.
The above numbers also apply for compiling using clang++ and linking against llvm omp.
The MKL version used is 2026.0.0.198.
A quick calculation shows the size of this matrix is almost 10 TB:
>>> 1146878**2 * 8 / 1024**3
9799.965820342302
According to this page, the Intel Xeon can address 4 TB of memory. And according to this sheet, the AMD EPYC could have up to 6 TB.
Could swapping to disk be the issue?
I mistakenly linked the spec for the Intel® Xeon® Platinum 8570 Processor, not the 8573C.
But the Azure pages (links given below) show the following:
| Size Name | vCPUs (Qty.) | Memory (GB) |
| Standard_E96ads_v6 | 96 | 672 |
| Standard_E96s_v6 | 96 | 768 |
The machines differ in the local and remote storage they have available.
Sources:
- https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/memory-optimized/esv6-series?tabs=sizebasic
- https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/memory-optimized/eadsv6-series?tabs=sizebasic
Hi @ivanp
The actual number is 146878. Sorry for the confusion.
I was able to solve the problem. A large range in the diagonal elements, with resulting borderline floating point numbers, causes the issue. For some reason Intel handles the problem without overhead, unlike AMD.
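A check along these lines can be sketched in a few lines of Python. It assumes that "borderline" means values at or below the smallest normal double, which the post does not state explicitly. The function name `diagonal_report` and the sample values are hypothetical and only illustrate the idea; this is not the diagnostic that was actually used.

```python
import math
import sys


def diagonal_report(diag):
    """Summarise the magnitude range of a matrix diagonal (illustrative helper)."""
    tiny = sys.float_info.min  # smallest positive normal double, about 2.2e-308
    mags = [abs(x) for x in diag]
    nonzero = [m for m in mags if m > 0.0]
    # Values strictly between 0 and `tiny` are subnormal (denormalized).
    subnormal_count = sum(1 for m in nonzero if m < tiny)
    smallest = min(nonzero) if nonzero else 0.0
    largest = max(mags) if mags else 0.0
    # Difference of logarithms instead of a ratio, so the result cannot overflow.
    spread = math.log10(largest) - math.log10(smallest) if nonzero else 0.0
    return {
        "smallest_nonzero": smallest,
        "largest": largest,
        "log10_spread": spread,
        "subnormal_count": subnormal_count,
    }


# Made-up diagonal containing one subnormal entry (1e-310).
report = diagonal_report([4.0, 1e-3, 1e-310, 2.5e3])
print(report)
```

A large `log10_spread` or a non-zero `subnormal_count` would be a hint that the input deserves rescaling or closer inspection before the inversion is run.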
No worries. At 161 GB that should fit.
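The size estimate can be reproduced with a small helper. This is a sketch that assumes a dense matrix of 8-byte doubles and reports the size in units of 1024**3 bytes, as the earlier calculation in this thread does; `dense_matrix_gib` is a hypothetical name.

```python
def dense_matrix_gib(n, bytes_per_element=8):
    """Storage for a dense n x n matrix, in units of 1024**3 bytes."""
    return n * n * bytes_per_element / 1024**3


corrected = dense_matrix_gib(146878)    # dimension from the correction
mistyped = dense_matrix_gib(1146878)    # dimension as first posted
print(f"{corrected:.1f} vs {mistyped:.1f}")
```

The corrected dimension stays well below the 672 and 768 listed for the two instances, while the mistyped one exceeds both by more than an order of magnitude.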
I have read elsewhere that denormalized numbers and gradual underflows can cause processing slowdowns.
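For readers unfamiliar with gradual underflow, the following sketch shows what it looks like for IEEE 754 doubles. It only demonstrates that such values exist and how they arise. It does not measure any slowdown, and `halvings_until_zero` is a hypothetical helper.

```python
import sys


def halvings_until_zero(x):
    """Count how many times x can be halved before it becomes exactly 0.0."""
    count = 0
    while x > 0.0:
        x /= 2.0
        count += 1
    return count


tiny = sys.float_info.min   # smallest positive normal double
half_tiny = tiny / 2.0      # already subnormal, but still non-zero
steps = halvings_until_zero(tiny)
print(half_tiny, steps)
```

Below the smallest normal number the result does not drop straight to zero; it passes through a range of subnormal values with progressively fewer significant bits. Whether arithmetic on those values is slower depends on the hardware and on whether flush-to-zero modes are enabled.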