I am running the code below to use stabilityai/stable-diffusion-xl-base-1.0 from Hugging Face, optimized with Optimum Intel for OpenVINO. While the inference runs, I watch it in htop and see that it uses only half of the logical cores on the Xeon server. I am on an m7i.8xlarge EC2 instance in AWS, which has 16 physical cores (32 logical cores) on a single-socket Sapphire Rapids Xeon server, running Ubuntu 24.04.
To launch JupyterLab so that it can use all 32 logical cores, I used the command below:
taskset -c 0-31 jupyter-lab --ip 0.0.0.0 --no-browser --allow-root
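To confirm inside the notebook that the pinning took effect, a quick check with Python's os.sched_getaffinity (Linux only) can be run first:
import os
# The set of logical CPUs this process is allowed to run on; with the
# taskset above it should contain all 32 cores (0-31).
print(sorted(os.sched_getaffinity(0)))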
This is the code I am running in the jupyter notebook:
!pip install --upgrade-strategy eager "optimum[openvino]"
!pip install diffusers
!pip install huggingface_hub
from huggingface_hub import notebook_login
notebook_login()
from optimum.intel import OVStableDiffusionXLPipeline
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipeline = OVStableDiffusionXLPipeline.from_pretrained(model_id, export=True)
# Don't forget to save the exported model
pipeline.save_pretrained("openvino-sd-xl-base-1.0")
# Inference code: load the OpenVINO-converted model and run text-to-image generation.
import os
# Set the threading environment variables before importing the inference
# libraries so they are picked up when the runtimes initialize.
os.environ["OMP_NUM_THREADS"] = "32"
os.environ["MKL_NUM_THREADS"] = "32"
os.environ["OPENBLAS_NUM_THREADS"] = "32"
os.environ["NUMEXPR_NUM_THREADS"] = "32"
from optimum.intel import OVStableDiffusionXLPipeline
from openvino.runtime import Core
# Initialize OpenVINO's Core object
core = Core()
# Set the number of threads to the total number of logical processors (vCPUs)
core.set_property("CPU", {
    "INFERENCE_NUM_THREADS": "32",
    "NUM_STREAMS": "1",
    "CPU_BIND_THREAD": "YES"  # Bind threads to specific cores
})
# Load the OpenVINO IR format model using the custom Core object
pipeline = OVStableDiffusionXLPipeline.from_pretrained("openvino-sd-xl-base-1.0", ov_core=core)
# Run inference for text-to-image generation
prompt = "boat in an ocean"
image = pipeline(prompt, num_inference_steps=50).images[0]
# Display the generated image
from IPython.display import display
display(image)
Hi Rajiv Mandal,
Thanks for reaching out.
Could you enable all of the CPU cores through multi-threading optimization? The properties below control which CPU resources are made available for model inference; the OpenVINO Runtime performs its multi-threading scheduling based on the CPUs made available, provided the platform and operating system support this behavior.
- ov::inference_num_threads
- ov::hint::scheduling_core_type
- ov::hint::enable_hyper_threading
Please take a look at Multi Threading Optimization for more information; a sketch of how these hints can be set from Python follows below.
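For example (a minimal sketch, assuming OpenVINO 2023.0 or newer for these property names, and reusing the ov_core argument from your snippet), the hints can be set on the Core object before the pipeline is loaded:
from openvino.runtime import Core
from optimum.intel import OVStableDiffusionXLPipeline

core = Core()
core.set_property("CPU", {
    "INFERENCE_NUM_THREADS": 32,         # ov::inference_num_threads
    "SCHEDULING_CORE_TYPE": "ANY_CORE",  # ov::hint::scheduling_core_type
    "ENABLE_HYPER_THREADING": True,      # ov::hint::enable_hyper_threading
})
pipeline = OVStableDiffusionXLPipeline.from_pretrained(
    "openvino-sd-xl-base-1.0", ov_core=core
)
With ENABLE_HYPER_THREADING left at its default, the CPU plugin typically schedules one thread per physical core for latency-oriented inference, which would match the roughly 50% logical-core utilization you are seeing in htop.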
Regards,
Aznie
Hi Rajiv Mandal,
This thread will no longer be monitored since we have provided the information. If you need any additional information from Intel, please submit a new question.
Regards,
Aznie
