I have an Intel Xeon W3-2423 CPU and have tried to compile llama.cpp with AMX support on a Windows 11 workstation. However, none of the methods I tried (Intel oneAPI, the latest MSVC, and MinGW-w64 GCC) were successful: the system info from llama.cpp always shows only AMX_INT8 = 1 (AMX_TILE and the other AMX flags do not appear).
I have read posts on the web where users reported faster or better performance with AMX, but their systems were running Linux.
What should I do? Thank you!
Hello ac27037,
Greetings!
We understand that you are using an Intel Xeon W3-2423 CPU and have attempted to compile llama.cpp with AMX support on a Windows 11 workstation. You’ve mentioned trying several methods, including Intel oneAPI, MSVC (most up-to-date), and MinGW64 (GCC), but have encountered issues.
To assist you more effectively, could you please provide additional details on the following:
- What specific errors or issues are you encountering during the compilation?
- Which version of llama.cpp are you using?
- Are you targeting a particular use case?
Regards,
Pujeeth
Intel Customer Support Technician
Hello,
1. What specific errors or issues are you encountering during the compilation?
There is no compilation error; it is just that only AMX_INT8 gets enabled on Windows 11. I have checked my CPU's flags, and it should support AMX_TILE, AMX_INT8, and AMX_BF16, which correspond to this script in ggml-cpu/CMakeLists.txt:
if (GGML_AMX_TILE)
    list(APPEND ARCH_DEFINITIONS __AMX_TILE__ GGML_AMX_TILE)
endif()
if (GGML_AMX_INT8)
    list(APPEND ARCH_DEFINITIONS __AMX_INT8__ GGML_AMX_INT8)
endif()
if (GGML_AMX_BF16)
    list(APPEND ARCH_DEFINITIONS __AMX_BF16__ GGML_AMX_BF16)
endif()
However, the result with MinGW is:
system_info: n_threads = 6 (n_threads_batch = 6) / 12 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | AMX_INT8 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
During inference, it always says "cannot be used with preferred buffer type AMX, using CPU instead".
2. The version of llama.cpp is b5392.
3. I expect to use AMX to speed up inference with GGUF models.
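For reference, the configure step I would expect to enable the AMX paths is along these lines (a sketch: GGML_NATIVE is a real ggml option that probes the host CPU, but whether this combination actually lights up all three AMX flags on Windows is exactly what is in question here):

```shell
# configure with native optimizations so ggml detects the host CPU features
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=ON
cmake --build build --config Release -j
# then inspect the system_info line printed by llama-cli at startup
```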
Thanks!
Hi ac27037,
Thank you for your response.
Could you please confirm whether the processor was purchased along with the system? Additionally, I would appreciate it if you could provide the details of the server board.
Regards,
Simon
Intel Customer Support Technician
Hello Simon,
The workstation is a Lenovo P5. Thank you!
Hi ac27037,
Thank you for contacting the Intel Community.
We appreciate your response. Since your system is an OEM (Original Equipment Manufacturer) product, we kindly recommend reaching out to Lenovo for further assistance, as they are the best point of contact for support related to your system.
Thank you for your understanding.
Regards,
Azeem
Intel Customer Support Technician