
Building High-Performance Image Search with OpenCLIP, Chroma, and Intel® Max GPUs


Authors: Sri Raj Aryan Karumuri, Sr. Solutions Engineer, Intel® Liftoff, and Rahul Unnikrishnan Nair, Head of Engineering, Intel® Liftoff

Forget theoretical specs. Intel® Liftoff mentors and AI engineers put the Intel® Data Center GPU Max 1100 and Intel® Tiber™ AI Cloud through their paces and turned the findings into a field guide for startups chasing lean, high-throughput LLM pipelines.

Before we dive in, it's important to set the stage: all the benchmarking, testing, and development work described in this post was performed on Intel® Tiber™ AI Cloud.

Intel® Tiber™ AI Cloud is a managed cloud platform specifically designed to provide developers and AI startups with scalable and cost-effective access to Intel's advanced AI hardware portfolio. This includes Intel® Gaudi® 2 (and Gaudi® 3) accelerators, Intel® Data Center GPU Max Series, and the latest Intel® Xeon® Scalable processors. For startups focused on building and deploying compute-intensive AI models, Intel® Tiber™ AI Cloud removes the significant barrier of upfront hardware investment while providing an environment optimized for performance.

If you represent an AI startup interested in exploring the capabilities of Intel® Data Center GPU Max Series GPUs and Intel® Gaudi® accelerators, and in leveraging the optimized environment of Intel® Tiber™ AI Cloud for your own projects, we encourage you to connect with the Intel® Liftoff for AI Startups program.

This program is designed to support startups like yours with resources, technical expertise, and access to platforms like Intel® Tiber™ AI Cloud.


 

AI-driven applications increasingly rely on multimodal data - images, text, and audio. This article demonstrates how to create and query a multimodal database that stores both images and text using Chroma and OpenCLIP embeddings.
These embeddings enable efficient comparison and retrieval of data across different modalities. The goal of the project is to build a system capable of handling image data and querying it with text-based searches, all while leveraging GPU or XPU acceleration for enhanced performance.
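To make the embedding idea concrete, here is a minimal, self-contained sketch of how OpenCLIP scores a text description against an image. The backbone, checkpoint, and image path are illustrative assumptions, not values taken from the project code below:

```

import open_clip
import torch
from PIL import Image

# Illustrative backbone/checkpoint pairing; others work the same way
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("Images/example_car.jpg")).unsqueeze(0)  # hypothetical path
text = tokenizer(["a black Mercedes-Benz"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# Normalize, then take the dot product: cosine similarity in embedding space
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
print(f"Similarity: {(image_features @ text_features.T).item():.4f}")

```

Chroma's OpenCLIPEmbeddingFunction, used throughout this article, performs this same encode step for everything you add or query, so similarity search reduces to nearest-neighbor lookups over these vectors.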

 

The Intel® Data Center GPU Max 1100: Powering Advanced AI

 

The performance described in this article, particularly the acceleration achieved using Intel® Extension for PyTorch (IPEX), is enabled by powerful hardware like the Intel® Data Center GPU Max Series. The Max 1100 GPU is available in the free Intel® Tiber™ AI Cloud JupyterLab environment and as dedicated instances:

  • Compute Architecture (Xe-HPC):
    • Xe-cores: 56 dedicated cores forming the foundation for GPU compute tasks.
    • Intel® Xe Matrix Extensions (XMX) Engines: 448 engines providing deep systolic arrays optimized for accelerating the dense matrix and vector operations prevalent in AI and deep learning models.
    • Vector Engines: 448 engines complementing the XMX units for broader parallel processing tasks.
    • Ray Tracing Units: 56 units for hardware-accelerated ray tracing, enhancing visualization capabilities.

  • Memory Hierarchy:
    • High Bandwidth Memory (HBM2e): 48 GB of HBM2e memory delivers 1.23 TB/s of bandwidth, crucial for large datasets and complex models like those used in multimodal embeddings.
    • Cache: Features 28 MB L1 and 108 MB L2 cache to keep data close to the compute units, minimizing latency.

  • Connectivity:
    • PCIe Gen 5: Utilizes a fast PCIe Gen 5 x16 host interface for high-speed data transfer between the CPU and GPU.
  • Software Ecosystem (oneAPI): The Intel® Data Center GPU Max Series is designed to work seamlessly with the Intel® oneAPI open, standards-based programming model. This allows developers to use frameworks like Hugging Face Transformers, PyTorch, and Intel® Extension for PyTorch, along with other libraries optimized for Intel architectures (CPUs and GPUs), to accelerate AI pipelines without proprietary lock-in. A quick environment check follows this list.
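Before running anything substantial, it is worth confirming that PyTorch actually sees the XPU. A quick sanity check, assuming torch and intel_extension_for_pytorch are installed:

```

import torch
import intel_extension_for_pytorch  # registers the XPU backend with PyTorch

# On a Max Series instance this should print True and the device name
print(torch.xpu.is_available())
if torch.xpu.is_available():
    print(torch.xpu.get_device_name())

```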

 

What is this Code About?

 

This code walks through the steps of setting up a multimodal database using Chroma as the vector database to store image and text embeddings. It then allows querying the database with a text query to find similar images or metadata. The code also demonstrates how to leverage Intel's hardware acceleration for PyTorch using Intel® Extension for PyTorch (IPEX) to speed up computations on Intel devices, such as CPUs or XPUs (Intel's designation for its GPUs and other accelerators).

The main components of this code are:

  1. Embedding Images and Texts: It utilizes OpenCLIP (a CLIP-based model) to generate embeddings for images and text, which are then stored in a database for easy retrieval. We chose OpenCLIP for its strong performance on various benchmarks and readily available pre-trained models.
  2. Chroma Database: Chroma is used to create a persistent database where the embeddings are stored, allowing efficient retrieval of the most similar results based on a text query. ChromaDB was selected for its Python-native API, ease of setting up persistent multimodal collections, and focus on developer experience.
  3. Hardware Acceleration: The code checks whether an XPU is available for hardware acceleration. Intel's hardware acceleration via IPEX optimizes tasks like embedding generation, ensuring faster data processing and making this setup well suited to high-performance applications (see the sketch after this list).
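Merely importing intel_extension_for_pytorch (as the script below does) registers the XPU backend with PyTorch. If you manage the embedding model yourself rather than through Chroma's helper, IPEX can additionally optimize the module for inference. A minimal sketch, using a stand-in module rather than the actual CLIP model:

```

import torch
import intel_extension_for_pytorch as ipex

device = "xpu" if torch.xpu.is_available() else "cpu"

# Stand-in module; in practice this would be your embedding model
model = torch.nn.Linear(512, 512).to(device).eval()

# Apply IPEX inference optimizations (operator fusion, layout tuning);
# a dtype=torch.bfloat16 argument can be added where the hardware supports it
model = ipex.optimize(model)

with torch.no_grad():
    embeddings = model(torch.randn(8, 512, device=device))
print(embeddings.shape)  # torch.Size([8, 512])

```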

Use Cases and Applications
This code is useful in any scenario where you need to:

  • Store multimodal data: You may have images, text, or both, and you need a fast, scalable way to store and retrieve them.
  • Image Search: The ability to query images based on textual descriptions (e.g., searching for “Black colour Benz” to retrieve similar car images) can be used in e-commerce platforms, image search engines, and recommendation systems.
  • Cross-modal Retrieval: When you need to retrieve one modality (images) based on another modality (text), such as using text to find similar images or vice versa. This is commonly seen in systems like visual question answering or caption-based image search.
  • Recommendation Systems: Similarity-based queries can be used to recommend products, movies, or other content that is semantically similar to a user’s query.
  • AI-based Applications: This is ideal for scenarios in machine learning pipelines, such as training data creation, feature extraction, or data preprocessing for multimodal models.

Requirements:

  • torch for deep learning operations.
  • intel_extension_for_pytorch (IPEX) for optimized PyTorch performance.
  • chromadb for creating and querying a persistent multimodal vector database.
  • matplotlib for displaying images.
  • OpenCLIPEmbeddingFunction and ImageLoader from chromadb.utils for embedding extraction and image loading.

 

Code Breakdown

 

Step 1: Import necessary libraries, Set Up Directories and Device Selection
The code begins by importing the necessary Python packages and defining directories for storing images and a persistent vector database.

`get_device()`: This function checks whether an XPU (representing Intel discrete GPUs like the Intel® Data Center GPU Max Series available on Intel® Tiber™ AI Cloud) is available and selects it as the device; otherwise it defaults to the CPU. Leveraging the XPU enables faster computations when available, which is especially beneficial for intensive workloads.

 

```

# Core Libs
import os
import shutil
from pathlib import Path

# Torch with optimizations
import torch
import intel_extension_for_pytorch as ipex  # For IPEX acceleration

# Vector DB and Embeddings
import chromadb
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
from chromadb.utils.data_loaders import ImageLoader

import matplotlib.pyplot as plt

root_directory = Path.cwd()
image_directory = root_directory / "Images"
persistent_directory = root_directory / "my_vectordb"

def get_device() -> torch.device:
    """Check and return the appropriate device (XPU or CPU)."""
    device_type = "xpu" if torch.xpu.is_available() else "cpu"
    device = torch.device(device_type)

    if device.type == "xpu":
        torch.xpu.empty_cache()
        print(f"Using device: {torch.xpu.get_device_name()}")
    else:
        print("Using CPU")
    return device

```

 

Step 2: Initialize the Chroma Database
The function initialize_chroma() initializes a Chroma client to interact with the vector database. If a directory for the database exists, it is removed and recreated.

 

```

def initialize_chroma() -> chromadb.PersistentClient:
    """Initialize Chroma client and return the client object."""
    if persistent_directory.exists():
        print(f"Removing existing database directory: {persistent_directory}")
        shutil.rmtree(persistent_directory)
    persistent_directory.mkdir(parents=True, exist_ok=True)
    print(f"Created persistent directory: {persistent_directory}")

    return chromadb.PersistentClient(path=str(persistent_directory))

```

 

Step 3: Setting Up the Database Collection
Next, the code sets up a multimodal database in Chroma, where both image and text embeddings can be stored and queried.

 

```

def initialize_db(chroma_client: chromadb.PersistentClient) -> chromadb.Collection:
    """Create and return a multimodal database collection."""
    image_loader = ImageLoader()
    multimodal_embedding_function = OpenCLIPEmbeddingFunction()

    return chroma_client.get_or_create_collection(
        name="multimodal_database",
        embedding_function=multimodal_embedding_function,
        data_loader=image_loader
    )

```

 

  • ImageLoader() is responsible for loading image data from the file system.
  • OpenCLIPEmbeddingFunction() generates embeddings for both images and text, allowing for similarity-based queries.
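By default, OpenCLIPEmbeddingFunction loads a general-purpose pre-trained checkpoint. If you want to pin a specific OpenCLIP backbone, recent chromadb releases accept model_name and checkpoint arguments; a minimal sketch (verify the argument names against your installed chromadb version):

```

from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction

# Pin the OpenCLIP backbone and pre-training checkpoint explicitly
multimodal_embedding_function = OpenCLIPEmbeddingFunction(
    model_name="ViT-B-32",
    checkpoint="laion2b_s34b_b79k",
)

```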

 

Step 4: Add Images to the Database
The add_images_to_db() function processes images from a predefined directory and adds them to the Chroma database. The images are indexed by unique IDs.

 

```

def add_images_to_db(multimodal_db: chromadb.Collection):
    """Add sample images to the multimodal database."""
    image_extensions = ('.jpg', '.jpeg', '.png')

    image_uris = [
        str(Path(image_directory) / image)
        for image in os.listdir(image_directory)
        if image.lower().endswith(image_extensions)
    ]
    print(f"Found image URIs: {image_uris}")

    if not image_uris:
        print("No valid images found.")
        return

    image_ids = [str(i) for i in range(1, len(image_uris) + 1)]

    print(f"Adding {len(image_uris)} images to the database.")
    result = multimodal_db.add(ids=image_ids, uris=image_uris)
    print(f"Add result: {result}")

```

 

Step 5: Query the Database
The function query_db() allows you to query the database using text. It retrieves the top N results based on similarity to the input text query.

 

```

def query_db(multimodal_db: chromadb.Collection, query_texts: list, category_filter: str = None) -> dict:
    """Query the multimodal database and return the results."""
    filters = {'img_category': category_filter} if category_filter else None

    query_results = multimodal_db.query(
        query_texts=query_texts,
        n_results=2,
        include=['distances', 'data', 'uris'],
        where=filters
    )
    return query_results

```
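One caveat: the where=filters clause can only match images that were stored with an img_category metadata entry, and add_images_to_db() above does not attach any. A hedged sketch of how metadata could be attached at add time (the category label is illustrative, not part of the original script):

```

def add_images_to_db_with_metadata(multimodal_db, image_ids, image_uris):
    """Variant of add_images_to_db() that attaches an 'img_category' label."""
    # Illustrative, hard-coded label; derive real categories from your own data
    metadatas = [{"img_category": "car"} for _ in image_uris]
    multimodal_db.add(ids=image_ids, uris=image_uris, metadatas=metadatas)

```

With metadata in place, query_db(multimodal_db, ['Black colour Benz'], category_filter='car') would restrict results to images labeled accordingly.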

 

Step 6: Displaying Query Results
After querying the database, the print_query_results() function displays the results, including the image corresponding to the closest match.

 

```

def print_query_results(query_list: list, query_results: dict) -> None:
    """Print the results of the query."""
    for i, query in enumerate(query_list):
        print(f"Results for query: {query}")
        # Use the match count for this specific query, not just the first one
        result_count = len(query_results['ids'][i]) if query_results['ids'] else 0

        for j in range(result_count):
            result = query_results['ids'][i][j] if query_results['ids'] else None
            distance = query_results['distances'][i][j] if query_results['distances'] else None
            uri = query_results['uris'][i][j] if query_results['uris'] else None
            data = query_results['data'][i][j] if query_results['data'] else None

            print(f"id: {result}, distance: {distance}")
            print(f"URI: {uri}")

            if data is not None and len(data) > 0:
                plt.imshow(data)
                plt.axis("off")
                plt.show()
            else:
                print(f"No image data for result: {result}")

```

 

Step 7: Main Workflow
Finally, the main() function coordinates the execution of the program. It initializes the device, Chroma client, and multimodal database, adds images, and performs a sample query.

 

```

def main():
    """Main function to manage the workflow."""
    device = get_device()
    chroma_client = initialize_chroma()
    multimodal_db = initialize_db(chroma_client)
    add_images_to_db(multimodal_db)

    query_texts = ['Black colour Benz']
    query_results = query_db(multimodal_db, query_texts)
    print_query_results(query_texts, query_results)

    print("END")

if __name__ == "__main__":
    main()

```

 

Output:

[Output screenshot]

 

 

Query Asked: Black colour Benz

Here is the output we got:

 

[Output screenshot: retrieved image 1]

 

[Output screenshot: retrieved image 2]

 

Complete Runnable Code


Below is the complete Python script discussed in this article. It demonstrates how to set up a persistent multimodal database using ChromaDB, generate embeddings for local images using OpenCLIP, and perform text-based image searches, leveraging IPEX for acceleration on Intel hardware like the Intel® Data Center GPU Max 1100 available on Intel® Tiber™ AI Cloud.

Prerequisites:

Ensure you have the required libraries installed (torch, intel-extension-for-pytorch, chromadb, matplotlib, open_clip_torch, Pillow). You can check what is already present and install anything missing with pip:

```

pip show torch intel-extension-for-pytorch chromadb matplotlib open_clip_torch Pillow

# if anything is missing, install it (ensure the XPU build of IPEX is used)
# pip install torch intel-extension-for-pytorch chromadb matplotlib open_clip_torch Pillow

```

(Note: Adjust the intel-extension-for-pytorch installation based on your specific hardware/environment if not using Intel® Tiber™ AI Cloud.)

Then:

  • Create a subdirectory named Images in the same directory where you save this script.
  • Place your image files (.jpg, .jpeg, or .png) into the Images subdirectory.
  • Copy the entire code block below, save it as a Python file (e.g., multimodal_search.py), and run it to see the multimodal search in action.
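With the Images directory populated, run the script from the same directory (using the example file name above):

```

python multimodal_search.py

```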

 

```

# Core Libs
import os
import shutil
from pathlib import Path

# Torch with optimizations
import torch
import intel_extension_for_pytorch as ipex  # For IPEX acceleration

# Vector DB and Embeddings
import chromadb
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
from chromadb.utils.data_loaders import ImageLoader

import matplotlib.pyplot as plt

root_directory = Path.cwd()
image_directory = root_directory / "Images"  # Assumes images are in an "Images" subdirectory
persistent_directory = root_directory / "my_vectordb"


def get_device() -> torch.device:
    """Check and return the appropriate device (XPU or CPU)."""
    # Check for Intel discrete GPU (XPU) support, otherwise default to CPU
    device_type = "xpu" if torch.xpu.is_available() else "cpu"
    device = torch.device(device_type)

    if device.type == "xpu":
        # Ensure GPU memory is clear if using XPU
        torch.xpu.empty_cache()
        print(f"Using device: {torch.xpu.get_device_name()}")
    else:
        print("Using CPU")
    return device


def initialize_chroma() -> chromadb.PersistentClient:
    """Initialize Chroma client and return the client object."""
    if persistent_directory.exists():
        print(f"Removing existing database directory: {persistent_directory}")
        shutil.rmtree(persistent_directory)
    persistent_directory.mkdir(parents=True, exist_ok=True)
    print(f"Created persistent directory: {persistent_directory}")

    return chromadb.PersistentClient(path=str(persistent_directory))


def initialize_db(chroma_client: chromadb.PersistentClient) -> chromadb.Collection:
    """Create and return a multimodal database collection."""
    image_loader = ImageLoader()
    multimodal_embedding_function = OpenCLIPEmbeddingFunction()
    return chroma_client.get_or_create_collection(
        name="multimodal_database",
        embedding_function=multimodal_embedding_function,
        data_loader=image_loader
    )


def add_images_to_db(multimodal_db: chromadb.Collection):
    """Add sample images from the specified directory to the multimodal database."""
    image_extensions = ('.jpg', '.jpeg', '.png')

    image_uris = [
        str(p) for p in image_directory.iterdir()
        if p.is_file() and p.suffix.lower() in image_extensions
    ]
    print(f"Found image URIs: {image_uris}")
    if not image_uris:
        print(f"No valid images found in directory: {image_directory}")
        print("Please ensure the 'Images' subdirectory exists and contains .jpg, .jpeg, or .png files.")
        return

    # Generate simple numeric IDs for the images (starting from 0)
    image_ids = [str(i) for i in range(len(image_uris))]
    print(f"Adding {len(image_uris)} images to the database.")
    # Add images using their URIs; embeddings and data are handled by Chroma helpers
    result = multimodal_db.add(ids=image_ids, uris=image_uris)
    # Chroma returns None on a successful add
    print(f"Add result: {result}")


def query_db(multimodal_db: chromadb.Collection, query_texts: list, n_results: int = 2, category_filter: str = None) -> dict:
    """Query the multimodal database with text and return the top N results."""
    filters = {'img_category': category_filter} if category_filter else None
    query_results = multimodal_db.query(
        query_texts=query_texts,
        n_results=n_results,
        include=['distances', 'data', 'uris'],
        where=filters
    )
    return query_results


def print_query_results(query_list: list, query_results: dict) -> None:
    """Print the results of the query and display images."""
    # Check if results were returned and are structured as expected
    if not query_results or not query_results.get('ids') or not query_results['ids'] or not query_results['ids'][0]:
        print("No results found for the query.")
        return

    for i, query in enumerate(query_list):
        print(f"\n--- Results for query: '{query}' ---")
        # Ensure results exist for this specific query text index
        if i >= len(query_results['ids']) or not query_results['ids'][i]:
            print("  No results found for this specific query text.")
            continue
        num_matches = len(query_results['ids'][i])
        for j in range(num_matches):
            # Safely access results for this match index
            result_id = query_results['ids'][i][j]
            distance = query_results['distances'][i][j] if query_results.get('distances') and query_results['distances'][i] else 'N/A'
            uri = query_results['uris'][i][j] if query_results.get('uris') and query_results['uris'][i] else 'N/A'
            data = query_results['data'][i][j] if query_results.get('data') and query_results['data'][i] else None

            print(f"\n  Match {j+1}:")
            print(f"  ID: {result_id}")
            # Format distance if it's a number
            if isinstance(distance, (int, float)):
                print(f"  Distance: {distance:.4f}")
            else:
                print(f"  Distance: {distance}")
            print(f"  URI: {uri}")

            # Display the image using matplotlib
            if data is not None:
                try:
                    plt.imshow(data)
                    plt.title(f"Query: '{query}'\nMatch {j+1} (ID: {result_id})")
                    plt.axis("off")
                    plt.show()
                except Exception as e:
                    print(f"  (Error displaying image for result ID: {result_id}: {e})")
            else:
                print(f"  (No image data loaded for result ID: {result_id})")


def main():
    """Main function to manage the workflow."""
    # Get the appropriate compute device (XPU or CPU)
    device = get_device()
    # Initialize ChromaDB client and the multimodal collection
    chroma_client = initialize_chroma()
    multimodal_db = initialize_db(chroma_client)
    # Add images from the specified directory
    add_images_to_db(multimodal_db)
    # Define text queries
    query_texts = ['Black colour Benz']  # Example query
    # Query the database
    # Check if multimodal_db was created successfully before querying
    if multimodal_db:
        print(f"\nQuerying database with texts: {query_texts}")
        query_results = query_db(multimodal_db, query_texts, n_results=2)

        # Print and display results
        print_query_results(query_texts, query_results)
    else:
        print("Database collection not initialized. Skipping query.")
    print("\nEND")


if __name__ == "__main__":
    # Ensure the 'Images' directory exists before running main
    if not image_directory.is_dir():
        print(f"Error: Image directory not found at '{image_directory}'")
        print("Please create the 'Images' subdirectory and add image files.")
    else:
        main()

```

 

Note on Production Code:
While this example prioritizes clarity for demonstration, production deployments involving filesystem operations (like removing or creating directories in initialize_chroma) should always include robust error handling using try...except blocks. This ensures your application can gracefully handle issues such as permission errors or other filesystem exceptions, preventing unexpected crashes and improving reliability. A minimal sketch follows.
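For example, a hardened version of the directory reset in initialize_chroma might look like this (the exit-on-failure policy is our choice for the sketch; adapt it to your application):

```

import shutil
import sys
from pathlib import Path

persistent_directory = Path.cwd() / "my_vectordb"

def reset_database_directory() -> None:
    """Remove and recreate the persistent DB directory, failing gracefully."""
    try:
        if persistent_directory.exists():
            shutil.rmtree(persistent_directory)
        persistent_directory.mkdir(parents=True, exist_ok=True)
    except OSError as error:  # covers PermissionError and other filesystem issues
        # Surface a clear message instead of an unhandled traceback
        sys.exit(f"Could not reset {persistent_directory}: {error}")

```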

 

Conclusion

 

This script demonstrates the creation of a multimodal database with Chroma and OpenCLIP, allowing for efficient storage and querying of images and text. By combining the power of PyTorch, Intel® Extension for PyTorch, and Chroma, this workflow can easily be adapted to a variety of use cases such as image search, recommendation systems, and cross-modal data retrieval. Intel's hardware acceleration ensures faster data processing, making this setup ideal for high-performance applications.

Try it Yourself on Intel® Tiber™ AI Cloud: 
You can run this code and explore the performance of the Intel® Data Center GPU Max 1100 directly. Intel® Tiber™ AI Cloud offers:

  • Free JupyterLab Environment: Get hands-on access to a Max 1100 GPU for training and experimentation by creating an account at cloud.intel.com and launching a GPU-accelerated notebook from the "Training" section.
  • Virtual Machines & Bare Metal: Access single Max 1100 GPU VMs (starting at $0.39/hr/card) or powerful multi-GPU systems connected via high-speed bridges. PoC credits are available for qualifying AI startups via the Intel® Liftoff program. Find more details on the Intel® Tiber™ AI Cloud Pricing Page.

 

Related resources

 

Intel® Tiber™ AI Cloud - Cloud platform for AI development and deployment

Intel® Data Center GPU Max Series - High-performance GPUs tailored for intense data center applications, designed to accelerate AI and HPC workloads

 

About the Author
As a Software Tools Ecosystem Specialist at Intel, I’ve had the privilege of working on the dynamic GenAI initiative. My focus is on driving engagement with the software developer audience. I'm a proud team member of the Intel® Liftoff for Startups and the Developer Engagement, Relations, and Studio team.