murphl15@W10F84D6M3:~/inference_results_v4.0/closed/Intel/code/automation$ python3 run.py -n bert-99 -d ${DATA_DIR}/bert/dataset -m ${DATA_DIR}/bert/model -t ${OUTPUT_DIR} -x ${SUFFIX}
[2024-09-24 19:19:03,388][INFO] run.py:23   - Loading configurations from /home/murphl15/inference_results_v4.0/closed/Intel/code/automation/config.yaml.
[2024-09-24 19:19:03,425][INFO] run.py:141  - Parsing args:
[2024-09-24 19:19:03,426][INFO] run.py:144  - args: Namespace(accuracy_only=False, ci_run=False, compliance_only=False, container_name_suffix='cpu', dataset_dir='/data//bert/dataset', dtype='int8', implementation='pytorch-cpu', model='bert-99', model_dir='/data//bert/model', offline_only=False, output='/data/Intel/', performance_only=False, server_only=False, skip_create_container=False, skip_data_preprocess=False, skip_docker_build=False)
Path mapping:
╒════════════════════╤══════════════════════════════════════════╤═══════════════════════════════════════╕
│                    │ Local                                    │ Container                             │
╞════════════════════╪══════════════════════════════════════════╪═══════════════════════════════════════╡
│ code dir           │ /home/murphl15/inference_results_v4.0/cl │ /opt/workdir/code/bert-99/pytorch-cpu │
│                    │ osed/Intel/code/bert-99/pytorch-cpu      │                                       │
├────────────────────┼──────────────────────────────────────────┼───────────────────────────────────────┤
│ automation kit dir │ /home/murphl15/inference_results_v4.0/cl │ /opt/workdir/code/bert-99/pytorch-    │
│                    │ osed/Intel/code/automation               │ cpu/automation                        │
├────────────────────┼──────────────────────────────────────────┼───────────────────────────────────────┤
│ data dir           │ /data//bert/dataset                      │ /data/mlperf_data/bert/dataset        │
├────────────────────┼──────────────────────────────────────────┼───────────────────────────────────────┤
│ model dir          │ /data//bert/model                        │ /data/mlperf_data/bert/model          │
├────────────────────┼──────────────────────────────────────────┼───────────────────────────────────────┤
│ output dir         │ /data/Intel/                             │ /output                               │
╘════════════════════╧══════════════════════════════════════════╧═══════════════════════════════════════╛
Runtime options:
╒═════════════╤═════════╕
│ Option      │ Value   │
╞═════════════╪═════════╡
│ prepare     │ True    │
├─────────────┼─────────┤
│ performance │ True    │
├─────────────┼─────────┤
│ accuracy    │ True    │
├─────────────┼─────────┤
│ compliance  │ True    │
├─────────────┼─────────┤
│ offline     │ True    │
├─────────────┼─────────┤
│ server      │ True    │
├─────────────┼─────────┤
│ sensors     │ True    │
╘═════════════╧═════════╛
[2024-09-24 19:19:03,427][INFO] run.py:281  - Collecting environment information on baremetal.
[2024-09-24 19:19:03,441][INFO] run.py:212  - Building docker image for bert-99/pytorch-cpu/int8.
[+] Building 6103.2s (28/31)                                                                      docker:default
 => [internal] load build definition from Dockerfile                                                        0.0s
 => => transferring dockerfile: 5.18kB                                                                      0.0s
 => resolve image config for docker-image://docker.io/docker/dockerfile:experimental                        0.6s
 => CACHED docker-image://docker.io/docker/dockerfile:experimental@sha256:600e5c62eedff338b3f7a0850beb7c05866e0ef27b2d2e8c02aa468e7  0.0s
 => [internal] load build definition from Dockerfile                                                        0.0s
 => [internal] load .dockerignore                                                                           0.0s
 => => transferring context: 2B                                                                             0.0s
 => [internal] load metadata for docker.io/library/rockylinux:8.7                                           0.5s
 => CACHED [dev-base 1/3] FROM docker.io/library/rockylinux:8.7@sha256:68bef3459bbb8c33841575a7f71c4de94718b7bbd103fd0417a537395d40  0.0s
 => [internal] load build context                                                                           0.0s
 => => transferring context: 7.94kB                                                                         0.0s
 => [dev-base 2/3] RUN --mount=type=cache,id=yum-dev,target=/var/cache/yum     DEBIAN_FRONTEND=noninteractive dnf install -y     c  84.0s
 => [dev-base 3/3] RUN echo "alias ll='ls -l'" >> /root/.bashrc                                             0.4s
 => [conda 1/8] RUN wget -O ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh  &&     chmod +x  61.0s
 => [conda 2/8] RUN /opt/conda/bin/conda config --add channels https://software.repos.intel.com/python/conda  0.6s
 => [conda 3/8] RUN /opt/conda/bin/conda install -y -c conda-forge ninja==1.10.2 cmake==3.22.2             38.1s
 => [conda 4/8] RUN /opt/conda/bin/conda install -y -c conda-forge llvm-openmp==12.0.1                      8.3s
 => [conda 5/8] RUN /opt/conda/bin/conda install -y -c https://software.repos.intel.com/python/conda mkl==2023.1.0  90.7s
 => [conda 6/8] RUN /opt/conda/bin/conda clean -ya                                                          1.7s
 => [conda 7/8] RUN cd /opt && git clone https://github.com/llvm/llvm-project.git &&     cd llvm-project && git checkout llvmorg-  545.2s
 => [conda 8/8] RUN cd /opt/llvm-project && mkdir build && cd build &&     conda list | grep ninja &&     cmake ../llvm -GNinja  2017.4s
 => [build 1/4] COPY --from=conda /opt/conda /opt/conda                                                    17.8s
 => [build 2/4] WORKDIR /opt/workdir                                                                        0.0s
 => [build 3/4] COPY ./code/bert-99/pytorch-cpu/patches patches                                             0.1s
 => [build 4/4] RUN git clone https://github.com/pytorch/pytorch.git &&     cd pytorch && git checkout v1.12.0 &&     git submod  2936.2s
 => [setup 1/6] COPY --from=build /opt/conda /opt/conda                                                    16.1s
 => [setup 2/6] WORKDIR /opt/workdir                                                                        0.0s
 => [setup 3/6] COPY ./code/bert-99 code/bert-99                                                            0.1s
 => [setup 4/6] COPY ./code/run_clean.sh code/bert-99/pytorch-cpu/run_clean.sh                              0.0s
 => [setup 5/6] COPY ./code/user_config.py code/user_config.py                                              0.0s
 => ERROR [setup 6/6] RUN cd code/bert-99/pytorch-cpu/ &&     if [ -d "inference" ];then rm -rf inference ;fi &&     git clone --  265.3s
------
 > [setup 6/6] RUN cd code/bert-99/pytorch-cpu/ &&     if [ -d "inference" ];then rm -rf inference ;fi &&     git clone --recursive https://github.com/mlcommons/inference.git  &&     cp inference/mlperf.conf . &&     cd mlperf_plugins && if [ -d "onednn" ];then rm -rf onednn ; fi && git clone https://github.com/oneapi-src/oneDNN.git onednn&&     cd onednn && git checkout v2.6 && git apply ../../patches/onednnv2_6.patch &&     cd ../../ && rm -rf /opt/conda/lib/cmake/mkl/* && mkdir build && cd build &&     cmake -DCMAKE_CXX_FLAGS="-march=native" -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DBUILD_TPPS_INTREE=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="$(dirname $(python3 -c 'import torch; print(torch.__file__)'));../cmake/Modules" -GNinja -DUSERCP=ON .. &&     ninja && pip install boto3==1.34.35 tokenization==1.0.7:
0.381 Cloning into 'inference'...
132.8 Submodule 'language/bert/DeepLearningExamples' (https://github.com/NVIDIA/DeepLearningExamples.git) registered for path 'language/bert/DeepLearningExamples'
132.8 Cloning into '/opt/workdir/code/bert-99/pytorch-cpu/inference/language/bert/DeepLearningExamples'...
162.7 Submodule path 'language/bert/DeepLearningExamples': checked out 'b03375bd6c2c5233130e61a3be49e26d1a20ac7c'
162.7 Submodule 'PyTorch/SpeechRecognition/Jasper/external/tensorrt-inference-server' (https://github.com/NVIDIA/tensorrt-inference-server.git) registered for path 'language/bert/DeepLearningExamples/PyTorch/SpeechRecognition/Jasper/external/tensorrt-inference-server'
162.7 Submodule 'PyTorch/Translation/Transformer/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'language/bert/DeepLearningExamples/PyTorch/Translation/Transformer/cutlass'
162.7 Cloning into '/opt/workdir/code/bert-99/pytorch-cpu/inference/language/bert/DeepLearningExamples/PyTorch/SpeechRecognition/Jasper/external/tensorrt-inference-server'...
173.4 Cloning into '/opt/workdir/code/bert-99/pytorch-cpu/inference/language/bert/DeepLearningExamples/PyTorch/Translation/Transformer/cutlass'...
185.5 Submodule path 'language/bert/DeepLearningExamples/PyTorch/SpeechRecognition/Jasper/external/tensorrt-inference-server': checked out '71f0771cb8cb2a2eb1c6a9433f9a56dd1f206c96'
185.7 Submodule path 'language/bert/DeepLearningExamples/PyTorch/Translation/Transformer/cutlass': checked out 'ed2ed4d667ce95e1371bd62db32b6a114e774336'
185.7 Submodule 'tools/external/googletest' (https://github.com/google/googletest.git) registered for path 'language/bert/DeepLearningExamples/PyTorch/Translation/Transformer/cutlass/tools/external/googletest'
185.7 Cloning into '/opt/workdir/code/bert-99/pytorch-cpu/inference/language/bert/DeepLearningExamples/PyTorch/Translation/Transformer/cutlass/tools/external/googletest'...
189.7 Submodule path 'language/bert/DeepLearningExamples/PyTorch/Translation/Transformer/cutlass/tools/external/googletest': checked out '9077ec7efe5b652468ab051e93c67589d5cb8f85'
189.8 Cloning into 'onednn'...
247.3 Note: switching to 'v2.6'.
247.3
247.3 You are in 'detached HEAD' state. You can look around, make experimental
247.3 changes and commit them, and you can discard any commits you make in this
247.3 state without impacting any branches by switching back to a branch.
247.3
247.3 If you want to create a new branch to retain commits you create, you may
247.3 do so (now or later) by using -c with the switch command. Example:
247.3
247.3   git switch -c <new-branch-name>
247.3
247.3 Or undo this operation with:
247.3
247.3   git switch -
247.3
247.3 Turn off this advice by setting config variable advice.detachedHead to false
247.3
247.3 HEAD is now at 52b5f107dd src: cpu: x64: reorder: improve performance for shapes with large spatial
248.1 -- The C compiler identification is Clang 15.0.7
248.1 -- The CXX compiler identification is Clang 15.0.7
248.1 -- Detecting C compiler ABI info
248.2 -- Detecting C compiler ABI info - done
248.2 -- Check for working C compiler: /opt/conda/bin/clang - skipped
248.2 -- Detecting C compile features
248.2 -- Detecting C compile features - done
248.2 -- Detecting CXX compiler ABI info
248.2 -- Detecting CXX compiler ABI info - done
248.2 -- Check for working CXX compiler: /opt/conda/bin/clang++ - skipped
248.2 -- Detecting CXX compile features
248.2 -- Detecting CXX compile features - done
248.2 -- Looking for pthread.h
248.3 -- Looking for pthread.h - found
248.3 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
248.3 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
248.3 -- Looking for pthread_create in pthreads
248.4 -- Looking for pthread_create in pthreads - not found
248.4 -- Looking for pthread_create in pthread
248.4 -- Looking for pthread_create in pthread - found
248.4 -- Found Threads: TRUE
248.4 CMake Warning at /opt/conda/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
248.4   static library kineto_LIBRARY-NOTFOUND not found.
248.4 Call Stack (most recent call first):
248.4   /opt/conda/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
248.4   CMakeLists.txt:9 (find_package)
248.4
248.4
248.4 -- Found Torch: /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch.so
248.7 -- Found OpenMP_C: -fopenmp=libomp (found version "5.0")
248.7 -- Found OpenMP_CXX: -fopenmp=libomp (found version "5.0")
248.7 -- Found OpenMP: TRUE (found version "5.0")
248.7 -- Using C++ compiler flags: -march=native -O3 -W -Wall
248.7 -- Using C++ standard: 14
248.7 -- Using static linker flags:
248.7 -- Using shared linker flags:
248.7 -- Using output path: /opt/workdir/code/bert-99/pytorch-cpu/build
248.7 mlperf_loadgen v4.1
248.7 -- Found PythonInterp: /opt/conda/bin/python (found version "3.8.19")
248.7 -- Using Python interpreter: /opt/conda/bin/python
249.1 CMake Warning at /opt/conda/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
249.1   static library kineto_LIBRARY-NOTFOUND not found.
249.1 Call Stack (most recent call first):
249.1   /opt/conda/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
249.1   mlperf_plugins/CMakeLists.txt:9 (find_package)
249.1
249.1
249.1 -- Found OpenMP_C: -fopenmp=libomp (found version "5.0")
249.1 -- Found OpenMP_CXX: -fopenmp=libomp (found version "5.0")
249.1 -- DNNL_TARGET_ARCH: X64
249.1 -- DNNL_LIBRARY_NAME: dnnl
249.1 -- Found OpenMP_C: -fopenmp=libomp (found version "5.0")
249.1 -- Found OpenMP_CXX: -fopenmp=libomp (found version "5.0")
249.1 -- Could NOT find Doxyrest (missing: DOXYREST_EXECUTABLE)
249.1 -- Found PythonInterp: /opt/conda/bin/python (found suitable version "3.8.19", minimum required is "2.7")
249.1 -- Could NOT find Sphinx (missing: SPHINX_EXECUTABLE)
249.1 -- Found Git: /usr/bin/git (found version "2.43.5")
249.1 -- Enabled workload: TRAINING
249.1 -- Enabled primitives: ALL
249.1 -- Enabled primitive CPU ISA: ALL
249.1 -- Enabled primitive GPU ISA: ALL
249.1 -- Primitive cache is disabled
249.2 -- The ASM compiler identification is Clang with GNU-like command-line
249.2 -- Found assembler: /opt/conda/bin/clang
249.2 Extra Torch C++ flags: -D_GLIBCXX_USE_CXX11_ABI=1
249.2 Extra OpenMP C++ flags: -fopenmp=libomp
249.2 Extra OpenMP C++ includes: /opt/conda/include
249.2 Extra OpenMP C++ libraries: /opt/conda/lib/libiomp5.so
249.2 Torch inlucde directories: /opt/conda/lib/python3.8/site-packages/torch/include;/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include
249.2 Torch linking libraries: torch;torch_library;/opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so
249.2 Extra Torch C++ flags: -D_GLIBCXX_USE_CXX11_ABI=1
249.2 Extra OpenMP C++ flags: -fopenmp=libomp
249.2 Extra OpenMP C++ includes: /opt/conda/include
249.2 Extra OpenMP C++ Libraries: /opt/conda/lib/libiomp5.so
249.2 Torch Inlucde directories: /opt/conda/lib/python3.8/site-packages/torch/include;/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include
249.2 Torch linking libraries: torch;torch_library;/opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so
249.2 -- Configuring done
249.4 -- Generating done
249.4 -- Build files have been written to: /opt/workdir/code/bert-99/pytorch-cpu/build
249.8 [1/800] Building CXX object CMakeFiles/bert_inference.dir/csrc/amx_init.cpp.o
249.8 [2/800] Building CXX object CMakeFiles/bert_inference.dir/csrc/hw_topology.cpp.o
249.8 [3/800] Building CXX object inference/loadgen/CMakeFiles/mlperf_loadgen.dir/__/__/version_generated.cc.o
250.0 [4/800] Building CXX object CMakeFiles/bert_inference.dir/csrc/kmp_launcher.cpp.o
250.0 [5/800] Building CXX object inference/loadgen/CMakeFiles/mlperf_loadgen.dir/bindings/c_api.cc.o
250.4 [6/800] Building CXX object inference/loadgen/CMakeFiles/mlperf_loadgen.dir/early_stopping.cc.o
250.8 [7/800] Building CXX object mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/cpu.cpp.o
251.0 [8/800] Building CXX object inference/loadgen/CMakeFiles/mlperf_loadgen.dir/utils.cc.o
251.2 [9/800] Building CXX object inference/loadgen/CMakeFiles/benchmark.dir/benchmark/repro.cpp.o
251.2 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/benchmark/repro.cpp:37:52: warning: unused parameter 'samples' [-Wunused-parameter]
251.2       const std::vector& samples) override {}
251.2                                                    ^
251.2 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/benchmark/repro.cpp:39:52: warning: unused parameter 'samples' [-Wunused-parameter]
251.2       const std::vector& samples) override {}
251.2                                                    ^
251.2 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/benchmark/repro.cpp:55:11: warning: comparison of integers of different signs: 'int' and 'std::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
251.2     if (n > mResponses.size()) {
251.2         ~ ^ ~~~~~~~~~~~~~~~~~
251.2 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/benchmark/repro.cpp:125:27: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
251.2         for (int i = 0; i < actualSize; i++) {
251.2                         ~ ^ ~~~~~~~~~~
251.2 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/benchmark/repro.cpp:171:11: warning: comparison of integers of different signs: 'int' and 'std::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
251.2     if (n > reponses.size()) {
251.2         ~ ^ ~~~~~~~~~~~~~~~
251.2 5 warnings generated.
251.5 [10/800] Building CXX object inference/loadgen/CMakeFiles/mlperf_loadgen.dir/version.cc.o
253.0 [11/800] Building CXX object mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/tpps/i_softmax_tpp.cpp.o
253.0 FAILED: mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/tpps/i_softmax_tpp.cpp.o
253.0 /opt/conda/bin/clang++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dmlperf_plugins_EXPORTS -Dusercp -I/opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/onednn/include -I/opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/libxsmm/include -I/opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps -I/opt/workdir/code/bert-99/pytorch-cpu/build/mlperf_plugins/onednn/include -I/opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/onednn/src/../include -isystem /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -march=native -O3 -DNDEBUG -fPIC -Wall -isystem /opt/conda/include -Wno-unused-function -march=native -mfma -D_GLIBCXX_USE_CXX11_ABI=1 -fopenmp=libomp -std=gnu++14 -MD -MT mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/tpps/i_softmax_tpp.cpp.o -MF mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/tpps/i_softmax_tpp.cpp.o.d -o mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/tpps/i_softmax_tpp.cpp.o -c /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/i_softmax_tpp.cpp
253.0 In file included from /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/i_softmax_tpp.cpp:1:
253.0 In file included from /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/i_softmax_tpp.hpp:5:
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:37:31: error: unknown type name '__m256h'
253.0   static void _mm256_print_ph(__m256h a) {
253.0                               ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:44:31: error: unknown type name '__m512h'
253.0   static void _mm512_print_ph(__m512h a) {
253.0                               ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:48:35: error: use of undeclared identifier '_mm256_loadu_ph'
253.0     auto f_half = _mm512_cvtph_ps(_mm256_loadu_ph((void*)mem));
253.0                                   ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:49:35: error: use of undeclared identifier '_mm256_loadu_ph'
253.0     auto s_half = _mm512_cvtph_ps(_mm256_loadu_ph((void*)&mem[16]));
253.0                                   ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:78:15: error: unknown type name '__m512h'
253.0 static inline __m512h _mm512_mlperf_erf_ph(__m512h x) {
253.0               ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:78:44: error: unknown type name '__m512h'
253.0 static inline __m512h _mm512_mlperf_erf_ph(__m512h x) {
253.0                                            ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:79:12: error: use of undeclared identifier '_mm512_set1_ph'
253.0   auto a = _mm512_set1_ph(-0.2888f);
253.0            ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:80:12: error: use of undeclared identifier '_mm512_set1_ph'
253.0   auto b = _mm512_set1_ph(1.0217744f);
253.0            ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:81:12: error: use of undeclared identifier '_mm512_set1_ph'
253.0   auto c = _mm512_set1_ph(0.0962405432f);
253.0            ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:83:13: error: use of undeclared identifier '_mm512_set1_ph'
253.0   auto nb = _mm512_set1_ph(1.769f);
253.0             ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:104:15: error: unknown type name '__m512h'
253.0 static inline __m512h _mm512_gelu_ph(__m512h x) {
253.0               ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:104:38: error: unknown type name '__m512h'
253.0 static inline __m512h _mm512_gelu_ph(__m512h x) {
253.0                                      ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:105:18: error: use of undeclared identifier '_mm512_set1_ph'
253.0   auto rsqrt_2 = _mm512_set1_ph(0.70710678);
253.0                  ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:106:48: error: use of undeclared identifier '_mm512_set1_ph'
253.0   auto y = _mm512_mlperf_erf_ph(x * rsqrt_2) + _mm512_set1_ph(1);
253.0                                                ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:108:14: error: use of undeclared identifier '_mm512_set1_ph'
253.0   return x * _mm512_set1_ph(0.5f) * y;
253.0              ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:124:54: error: unknown type name '__m512h'
253.0 static inline __m512i _mm512_scale_minmax_gelu_i8_ph(__m512h x, __m512h vS, __m512h vS2) {
253.0                                                      ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:124:65: error: unknown type name '__m512h'
253.0 static inline __m512i _mm512_scale_minmax_gelu_i8_ph(__m512h x, __m512h vS, __m512h vS2) {
253.0                                                                 ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:124:77: error: unknown type name '__m512h'
253.0 static inline __m512i _mm512_scale_minmax_gelu_i8_ph(__m512h x, __m512h vS, __m512h vS2) {
253.0                                                                             ^
253.0 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:125:14: error: use of undeclared identifier '_mm512_set1_ph'
253.0   auto max = _mm512_set1_ph(127.f);
253.0              ^
253.0 fatal error: too many errors emitted, stopping now [-ferror-limit=]
253.0 20 errors generated.
253.9 [12/800] Building CXX object inference/loadgen/CMakeFiles/mlperf_loadgen.dir/results.cc.o
253.9 [13/800] Building CXX object inference/loadgen/CMakeFiles/mlperf_loadgen.dir/issue_query_controller.cc.o
253.9 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/issue_query_controller.cc:480:20: warning: lambda capture 'thread_idx' is not used [-Wunused-lambda-capture]
253.9         LogDetail([thread_idx](AsyncDetail& detail) {
253.9                    ^
253.9 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/issue_query_controller.cc:247:7: note: in instantiation of function template specialization 'mlperf::loadgen::IssueQueryController::IssueQueriesInternal' requested here
253.9       IssueQueriesInternal(num_threads, thread_idx);
253.9       ^
253.9 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/issue_query_controller.cc:480:20: warning: lambda capture 'thread_idx' is not used [-Wunused-lambda-capture]
253.9         LogDetail([thread_idx](AsyncDetail& detail) {
253.9                    ^
253.9 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/issue_query_controller.cc:300:5: note: in instantiation of function template specialization 'mlperf::loadgen::IssueQueryController::IssueQueriesInternal' requested here
253.9     IssueQueriesInternal(1, 0);
253.9     ^
253.9 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/issue_query_controller.cc:480:20: warning: lambda capture 'thread_idx' is not used [-Wunused-lambda-capture]
253.9         LogDetail([thread_idx](AsyncDetail& detail) {
253.9                    ^
253.9 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/issue_query_controller.cc:300:5: note: in instantiation of function template specialization 'mlperf::loadgen::IssueQueryController::IssueQueriesInternal' requested here
253.9     IssueQueriesInternal(1, 0);
253.9     ^
253.9 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/issue_query_controller.cc:480:20: warning: lambda capture 'thread_idx' is not used [-Wunused-lambda-capture]
253.9         LogDetail([thread_idx](AsyncDetail& detail) {
253.9                    ^
253.9 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/issue_query_controller.cc:300:5: note: in instantiation of function template specialization 'mlperf::loadgen::IssueQueryController::IssueQueriesInternal' requested here
253.9     IssueQueriesInternal(1, 0);
253.9     ^
253.9 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/issue_query_controller.cc:480:20: warning: lambda capture 'thread_idx' is not used [-Wunused-lambda-capture]
253.9         LogDetail([thread_idx](AsyncDetail& detail) {
253.9                    ^
253.9 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/issue_query_controller.cc:300:5: note: in instantiation of function template specialization 'mlperf::loadgen::IssueQueryController::IssueQueriesInternal' requested here
253.9     IssueQueriesInternal(1, 0);
253.9     ^
253.9 5 warnings generated.
254.0 [14/800] Building CXX object inference/loadgen/CMakeFiles/mlperf_loadgen.dir/test_settings_internal.cc.o
254.7 [15/800] Building CXX object inference/loadgen/CMakeFiles/mlperf_loadgen.dir/logging.cc.o
254.7 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/logging.cc:483:61: warning: unused parameter 'completion_time' [-Wunused-parameter]
254.7                                       PerfClock::time_point completion_time,
254.7                                                             ^
254.7 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/logging.cc:601:68: warning: unused parameter 'expected_count' [-Wunused-parameter]
254.7 std::vector AsyncLog::GetTokenLatencies(size_t expected_count) {
254.7                                                                    ^
254.7 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/logging.cc:607:72: warning: unused parameter 'expected_count' [-Wunused-parameter]
254.7 std::vector AsyncLog::GetTimePerOutputToken(size_t expected_count){
254.7                                                                        ^
254.7 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/logging.cc:613:58: warning: unused parameter 'expected_count' [-Wunused-parameter]
254.7 std::vector AsyncLog::GetTokensPerSample(size_t expected_count) {
254.7                                                          ^
254.7 4 warnings generated.
255.4 [16/800] Building CXX object mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/softmax.cpp.o
255.4 FAILED: mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/softmax.cpp.o
255.4 /opt/conda/bin/clang++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dmlperf_plugins_EXPORTS -Dusercp -I/opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/onednn/include -I/opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/libxsmm/include -I/opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps -I/opt/workdir/code/bert-99/pytorch-cpu/build/mlperf_plugins/onednn/include -I/opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/onednn/src/../include -isystem /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -march=native -O3 -DNDEBUG -fPIC -Wall -isystem /opt/conda/include -Wno-unused-function -march=native -mfma -D_GLIBCXX_USE_CXX11_ABI=1 -fopenmp=libomp -std=gnu++14 -MD -MT mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/softmax.cpp.o -MF mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/softmax.cpp.o.d -o mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/softmax.cpp.o -c /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/softmax.cpp
255.4 In file included from /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/softmax.cpp:5:
255.4 In file included from /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/i_softmax_tpp.hpp:5:
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:37:31: error: unknown type name '__m256h'
255.4   static void _mm256_print_ph(__m256h a) {
255.4                               ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:44:31: error: unknown type name '__m512h'
255.4   static void _mm512_print_ph(__m512h a) {
255.4                               ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:48:35: error: use of undeclared identifier '_mm256_loadu_ph'
255.4     auto f_half = _mm512_cvtph_ps(_mm256_loadu_ph((void*)mem));
255.4                                   ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:49:35: error: use of undeclared identifier '_mm256_loadu_ph'
255.4     auto s_half = _mm512_cvtph_ps(_mm256_loadu_ph((void*)&mem[16]));
255.4                                   ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:78:15: error: unknown type name '__m512h'
255.4 static inline __m512h _mm512_mlperf_erf_ph(__m512h x) {
255.4               ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:78:44: error: unknown type name '__m512h'
255.4 static inline __m512h _mm512_mlperf_erf_ph(__m512h x) {
255.4                                            ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:79:12: error: use of undeclared identifier '_mm512_set1_ph'
255.4   auto a = _mm512_set1_ph(-0.2888f);
255.4            ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:80:12: error: use of undeclared identifier '_mm512_set1_ph'
255.4   auto b = _mm512_set1_ph(1.0217744f);
255.4            ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:81:12: error: use of undeclared identifier '_mm512_set1_ph'
255.4   auto c = _mm512_set1_ph(0.0962405432f);
255.4            ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:83:13: error: use of undeclared identifier '_mm512_set1_ph'
255.4   auto nb = _mm512_set1_ph(1.769f);
255.4             ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:104:15: error: unknown type name '__m512h'
255.4 static inline __m512h _mm512_gelu_ph(__m512h x) {
255.4               ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:104:38: error: unknown type name '__m512h'
255.4 static inline __m512h _mm512_gelu_ph(__m512h x) {
255.4                                      ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:105:18: error: use of undeclared identifier '_mm512_set1_ph'
255.4   auto rsqrt_2 = _mm512_set1_ph(0.70710678);
255.4                  ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:106:48: error: use of undeclared identifier '_mm512_set1_ph'
255.4   auto y = _mm512_mlperf_erf_ph(x * rsqrt_2) + _mm512_set1_ph(1);
255.4                                                ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:108:14: error: use of undeclared identifier '_mm512_set1_ph'
255.4   return x * _mm512_set1_ph(0.5f) * y;
255.4              ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:124:54: error: unknown type name '__m512h'
255.4 static inline __m512i _mm512_scale_minmax_gelu_i8_ph(__m512h x, __m512h vS, __m512h vS2) {
255.4                                                      ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:124:65: error: unknown type name '__m512h'
255.4 static inline __m512i _mm512_scale_minmax_gelu_i8_ph(__m512h x, __m512h vS, __m512h vS2) {
255.4                                                                 ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:124:77: error: unknown type name '__m512h'
255.4 static inline __m512i _mm512_scale_minmax_gelu_i8_ph(__m512h x, __m512h vS, __m512h vS2) {
255.4                                                                             ^
255.4 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:125:14: error: use of undeclared identifier '_mm512_set1_ph'
255.4   auto max = _mm512_set1_ph(127.f);
255.4              ^
255.4 fatal error: too many errors emitted, stopping now [-ferror-limit=]
255.4 20 errors generated.
256.3 [17/800] Building CXX object mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/activation.cpp.o
259.6 [18/800] Building CXX object inference/loadgen/CMakeFiles/mlperf_loadgen.dir/loadgen.cc.o
259.6 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/loadgen.cc:137:47: warning: unused parameter 'response_cb' [-Wunused-parameter]
259.6                       const ResponseCallback& response_cb) override {
259.6                                               ^
259.6 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/loadgen.cc:768:77: warning: missing field 'first_token_latency_min' initializer [-Wmissing-field-initializers]
259.6   PerformanceSummary u_perf_summary{sut->Name(), u_settings, std::move(u_pr)};
259.6                                                                             ^
259.6 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/loadgen.cc:820:77: warning: missing field 'first_token_latency_min' initializer [-Wmissing-field-initializers]
259.6   PerformanceSummary m_perf_summary{sut->Name(), m_settings, std::move(m_pr)};
259.6                                                                             ^
259.6 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/loadgen.cc:918:71: warning: missing field 'first_token_latency_min' initializer [-Wmissing-field-initializers]
259.6   PerformanceSummary
perf_summary{sut->Name(), settings, std::move(pr)};
259.6                                                                       ^
259.6 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/loadgen.cc:989:58: warning: missing field 'first_token_latency_min' initializer [-Wmissing-field-initializers]
259.6                                        std::move(base_pr)};
259.6                                                          ^
259.6 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/loadgen.cc:1011:68: warning: missing field 'first_token_latency_min' initializer [-Wmissing-field-initializers]
259.6                                     std::move(base_perf_summary.pr)};
259.6                                                                    ^
259.6 /opt/workdir/code/bert-99/pytorch-cpu/inference/loadgen/loadgen.cc:1195:14: warning: lambda capture 'sut' is not used [-Wunused-lambda-capture]
259.6   LogDetail([sut, qsl, test_date_time, &sut_name,
259.6              ^~~~
259.6 27 warnings generated.
259.7 [19/800] Building CXX object CMakeFiles/bert_inference.dir/csrc/bert_model.cpp.o
260.8 [20/800] Building CXX object CMakeFiles/bert_inference.dir/csrc/bert_qsl.cpp.o
261.8 [21/800] Building CXX object CMakeFiles/bert_inference.dir/csrc/torch_sut.cpp.o
261.8 /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.cpp:42:74: warning: field 'nInstances_' will be initialized after field 'warmUp_' [-Wreorder-ctor]
261.8   upper_watermark_(upper_watermark), nProcsPerInstance_(intra_parallel), nInstances_(inter_parallel),
261.8   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
261.8   watermark_(watermark)              upper_watermark_(upper_watermark)   nProcsPerInstance_(intra_parallel)
261.8 /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.cpp:46:8: warning: unused variable 'amx_status' [-Wunused-variable]
261.8   auto amx_status = amx_init::amx_init();
261.8        ^
261.8 /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.cpp:126:15: warning: variable 'sample_count' set but not used [-Wunused-but-set-variable]
261.8           int sample_count = 0;
261.8               ^
261.8 /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.cpp:127:15: warning: unused variable 'qos_count' [-Wunused-variable]
261.8           int qos_count = 0;
261.8               ^
261.8 /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.cpp:256:39: warning: field 'nInstances_'
will be initialized after field 'warmUp_' [-Wreorder-ctor]
261.8   nProcsPerInstance_(intra_parallel), nInstances_(inter_parallel), warmUp_(warmup),
261.8   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ^~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~
261.8   watermark_(watermark)               nProcsPerInstance_(intra_parallel) nInstances_(inter_parallel)
261.8 /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.cpp:259:8: warning: unused variable 'amx_status' [-Wunused-variable]
261.8   auto amx_status = amx_init::amx_init();
261.8        ^
261.8 In file included from /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.cpp:18:
261.8 /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.hpp:100:10: warning: private field 'mThreshold_' is not used [-Wunused-private-field]
261.8   size_t mThreshold_;
261.8          ^
261.8 /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.hpp:103:10: warning: private field 'slength_' is not used [-Wunused-private-field]
261.8   size_t slength_ {384};
261.8          ^
261.8 /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.hpp:105:10: warning: private field 'qos_pointer' is not used [-Wunused-private-field]
261.8   size_t qos_pointer {0};
261.8          ^
261.8 /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.hpp:110:8: warning: private field 'mHt_' is not used [-Wunused-private-field]
261.8   bool mHt_;
261.8        ^
261.8 /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.hpp:183:10: warning: private field 'mThreshold_' is not used [-Wunused-private-field]
261.8   size_t mThreshold_;
261.8          ^
261.8 /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.hpp:185:10: warning: private field 'slength_' is not used [-Wunused-private-field]
261.8   size_t slength_ {385};
261.8          ^
261.8 /opt/workdir/code/bert-99/pytorch-cpu/csrc/torch_sut.hpp:191:8: warning: private field 'mHt_' is not used [-Wunused-private-field]
261.8   bool mHt_;
261.8        ^
261.8 13 warnings generated.
262.1 [22/800] Building CXX object mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/amx_linear.cpp.o
262.1 FAILED: mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/amx_linear.cpp.o
262.1 /opt/conda/bin/clang++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dmlperf_plugins_EXPORTS -Dusercp -I/opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/onednn/include -I/opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/libxsmm/include -I/opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps -I/opt/workdir/code/bert-99/pytorch-cpu/build/mlperf_plugins/onednn/include -I/opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/onednn/src/../include -isystem /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -march=native -O3 -DNDEBUG -fPIC -Wall -isystem /opt/conda/include -Wno-unused-function -march=native -mfma -D_GLIBCXX_USE_CXX11_ABI=1 -fopenmp=libomp -std=gnu++14 -MD -MT mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/amx_linear.cpp.o -MF mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/amx_linear.cpp.o.d -o mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/amx_linear.cpp.o -c /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/amx_linear.cpp
262.1 In file included from /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/amx_linear.cpp:10:
262.1 In file included from /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/i_linear_tpp.hpp:10:
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:37:31: error: unknown type name '__m256h'
262.1   static void _mm256_print_ph(__m256h a) {
262.1                               ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:44:31: error: unknown type name '__m512h'
262.1   static void _mm512_print_ph(__m512h a) {
262.1                               ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:48:35: error: use of undeclared identifier '_mm256_loadu_ph'
262.1     auto f_half = _mm512_cvtph_ps(_mm256_loadu_ph((void*)mem));
262.1                                   ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:49:35: error: use of undeclared identifier '_mm256_loadu_ph'
262.1     auto s_half = _mm512_cvtph_ps(_mm256_loadu_ph((void*)&mem[16]));
262.1                                   ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:78:15: error: unknown type name '__m512h'
262.1 static inline __m512h _mm512_mlperf_erf_ph(__m512h x) {
262.1               ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:78:44: error: unknown type name '__m512h'
262.1 static inline __m512h _mm512_mlperf_erf_ph(__m512h x) {
262.1                                            ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:79:12: error: use of undeclared identifier '_mm512_set1_ph'
262.1   auto a = _mm512_set1_ph(-0.2888f);
262.1            ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:80:12: error: use of undeclared identifier '_mm512_set1_ph'
262.1   auto b = _mm512_set1_ph(1.0217744f);
262.1            ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:81:12: error: use of undeclared identifier '_mm512_set1_ph'
262.1   auto c = _mm512_set1_ph(0.0962405432f);
262.1            ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:83:13: error: use of undeclared identifier '_mm512_set1_ph'
262.1   auto nb = _mm512_set1_ph(1.769f);
262.1             ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:104:15: error: unknown type name '__m512h'
262.1 static inline __m512h _mm512_gelu_ph(__m512h x) {
262.1               ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:104:38: error: unknown type name '__m512h'
262.1 static inline __m512h _mm512_gelu_ph(__m512h x) {
262.1                                      ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:105:18: error: use of undeclared identifier '_mm512_set1_ph'
262.1   auto rsqrt_2 = _mm512_set1_ph(0.70710678);
262.1                  ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:106:48: error: use of undeclared identifier '_mm512_set1_ph'
262.1   auto y = _mm512_mlperf_erf_ph(x * rsqrt_2) + _mm512_set1_ph(1);
262.1                                                ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:108:14: error: use of undeclared identifier '_mm512_set1_ph'
262.1   return x * _mm512_set1_ph(0.5f) * y;
262.1              ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:124:54: error: unknown type name '__m512h'
262.1 static inline __m512i _mm512_scale_minmax_gelu_i8_ph(__m512h x, __m512h vS, __m512h vS2) {
262.1                                                      ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:124:65: error: unknown type name '__m512h'
262.1 static inline __m512i _mm512_scale_minmax_gelu_i8_ph(__m512h x, __m512h vS, __m512h vS2) {
262.1                                                                 ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:124:77: error: unknown type name '__m512h'
262.1 static inline __m512i _mm512_scale_minmax_gelu_i8_ph(__m512h x, __m512h vS, __m512h vS2) {
262.1                                                                             ^
262.1 /opt/workdir/code/bert-99/pytorch-cpu/mlperf_plugins/csrc/tpps/el_common_intrin.hpp:125:14: error: use of undeclared identifier '_mm512_set1_ph'
262.1   auto max = _mm512_set1_ph(127.f);
262.1              ^
262.1 fatal error: too many errors emitted, stopping now [-ferror-limit=]
262.1 20 errors generated.
263.5 [23/800] Building CXX object mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/amx_mha.cpp.o
264.0 [24/800] Building CXX object mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/amx_mha_concat.cpp.o
264.0 [25/800] Building CXX object mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/init.cpp.o
264.4 [26/800] Building CXX object mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/normalization.cpp.o
265.0 [27/800] Building CXX object mlperf_plugins/CMakeFiles/mlperf_plugins.dir/csrc/linear.cpp.o
265.2 [28/800] Building CXX object CMakeFiles/bert_inference.dir/csrc/main.cpp.o
265.2 ninja: build stopped: subcommand failed.
------
Dockerfile:98
--------------------
  97 |     ENV CONDA_PREFIX "/opt/conda"
  98 | >>> RUN cd code/${BENCHMARK}/${IMPL}/ && \
  99 | >>>     if [ -d "inference" ];then rm -rf inference ;fi && \
 100 | >>>     git clone --recursive https://github.com/mlcommons/inference.git && \
 101 | >>>     cp inference/mlperf.conf . && \
 102 | >>>     cd mlperf_plugins && if [ -d "onednn" ];then rm -rf onednn ; fi && git clone https://github.com/oneapi-src/oneDNN.git onednn&& \
 103 | >>>     cd onednn && git checkout ${ONEDNN_VERSION} && git apply ../../patches/onednnv2_6.patch && \
 104 | >>>     cd ../../ && rm -rf /opt/conda/lib/cmake/mkl/* && mkdir build && cd build && \
 105 | >>>     cmake -DCMAKE_CXX_FLAGS="-march=native" -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DBUILD_TPPS_INTREE=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="$(dirname $(python3 -c 'import torch; print(torch.__file__)'));../cmake/Modules" -GNinja -DUSERCP=ON .. && \
 106 | >>>     ninja && pip install boto3==1.34.35 tokenization==1.0.7
 107 |
--------------------
ERROR: failed to solve: process "/bin/sh -c cd code/${BENCHMARK}/${IMPL}/ &&     if [ -d \"inference\" ];then rm -rf inference ;fi &&     git clone --recursive https://github.com/mlcommons/inference.git &&     cp inference/mlperf.conf . &&     cd mlperf_plugins && if [ -d \"onednn\" ];then rm -rf onednn ; fi && git clone https://github.com/oneapi-src/oneDNN.git onednn&&     cd onednn && git checkout ${ONEDNN_VERSION} && git apply ../../patches/onednnv2_6.patch &&     cd ../../ && rm -rf /opt/conda/lib/cmake/mkl/* && mkdir build && cd build &&     cmake -DCMAKE_CXX_FLAGS=\"-march=native\" -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DBUILD_TPPS_INTREE=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=\"$(dirname $(python3 -c 'import torch; print(torch.__file__)'));../cmake/Modules\" -GNinja -DUSERCP=ON .. &&     ninja && pip install boto3==1.34.35 tokenization==1.0.7" did not complete successfully: exit code: 1