
Accelerate AI Workloads with a PyTorch* 2.4 Workshop on the Intel® Tiber™ Developer Cloud

Sonya_Wach
Employee

What is the Intel Tiber Developer Cloud?

The Intel® Tiber™ Developer Cloud is a cloud-based platform that empowers developers, AI/ML researchers, ecosystem partners, AI startups, and enterprise customers by providing access to cutting-edge Intel hardware and software, so AI and high-performance computing (HPC) applications can be built, tested, run, and optimized with low cost and overhead. The platform gives developers an easy path to innovating with small or large workloads on Intel CPUs, GPUs, and the AI PC, including access to AI-optimized software such as oneAPI.

For developers and enterprise customers who want to test the capabilities of the platform and hardware and learn what Intel enables them to do, complimentary shared environments and Jupyter notebooks are available.

This post will walk you through a workshop that provides hands-on experience with PyTorch 2.4 on Intel GPUs. You will learn how to leverage XPUs for accelerated performance and understand how the Intel Tiber Developer Cloud can help you develop and deploy generative AI workloads.

PyTorch 2.4

PyTorch is a popular open-source deep learning framework for building and training deep neural networks, widely used in computer vision and natural language processing (NLP) applications. PyTorch 2.4 provides initial support for the Intel® Data Center GPU Max Series to further accelerate AI workloads on Intel hardware, taking advantage of the SYCL* software stack and the Unified Acceleration Foundation* (UXL) multivendor software ecosystem.

With this support, you can now run and deploy workloads on Intel GPUs with minimal coding efforts, easing the deployment of PyTorch on ubiquitous hardware. This also makes integrating different hardware backends easier, allowing even more opportunity for workload development and deployment.

For GPU support and better performance, you can also utilize the Intel® Extension for PyTorch, which optimizes deep learning training and inference performance on Intel processors for various applications, including large language models (LLMs) and Generative AI (GenAI).
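As a rough sketch (not taken from the workshop notebook), and assuming intel_extension_for_pytorch is installed, offloading an off-the-shelf model to an Intel GPU and applying the extension's optimizations looks roughly like this:

import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

# Load a standard torchvision model and move it to the Intel GPU ("xpu")
model = models.resnet18(weights=None).eval()
model = model.to("xpu")
model = ipex.optimize(model)  # apply Intel-specific optimizations for inference

# Run a dummy inference pass on the XPU
with torch.no_grad():
    x = torch.randn(1, 3, 224, 224, device="xpu")
    output = model(x)
print(output.shape)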

PyTorch 2.4 on Intel GPUs Workshop

In this training, you will learn how to use PyTorch 2.4 with Intel GPUs, leveraging the capabilities of PyTorch while exercising examples on tensor operations, running example workloads, and optimizing models for enhanced performance.

To get started, head to cloud.intel.com and Sign Up with an email or Sign In if you already have an account.

Once on the platform, use the Menu icon on the top left to navigate to the Training page.

[Screenshot: the Training page on the Intel Tiber Developer Cloud]

The Training page showcases several JupyterLab workshops you can try on the Intel Tiber Developer Cloud, including trainings in AI, AI with Intel Gaudi 2 Accelerators, C++ SYCL, Gen AI, and the Rendering Toolkit.

In this tutorial, we will navigate to the Gen AI Essentials training and examine the PyTorch 2.4 on Intel GPUs workshop. Click Learn More or Launch to access the Jupyter notebook.

[Screenshot: the PyTorch 2.4 on Intel GPUs workshop under the Gen AI Essentials training]

Once your Jupyter notebook training has launched, ensure the Kernel in the top right is set to PyTorch 2.4.

[Screenshot: the Jupyter notebook with the PyTorch 2.4 kernel selected]

Run the cells and follow the steps in the notebook to see a few example workloads that showcase the power of PyTorch on Intel GPUs. Below is a further explanation of the steps outlined in the training notebook. Note: the cells outlined in this document are just for reference and are missing key lines needed to run effectively; consult the Jupyter notebook for the full code.

Step 1: Checking PyTorch Version and Device

print(f"PyTorch Version: {torch.__version__}")
device = torch.device('xpu' if torch.xpu.is_available() else 'cpu')
print(f"Using device: {device}")

This step checks which PyTorch version is being used, in this case, PyTorch 2.4, and whether an XPU or CPU is being used on the Intel Tiber Developer Cloud. The output should show the XPU device running on the platform.

 

Step 2: Basic Tensor Operation

# Create two random matrices directly on the selected device
mat1 = torch.randn(3, 4, device=device)
mat2 = torch.randn(4, 5, device=device)
result = torch.matmul(mat1, mat2)
print(f"Result shape: {result.shape}")

This step creates two tensors on the XPU device, performs a simple matrix multiplication, and prints the shape of the resulting tensor. Tensor operations like this can run on GPUs and other AI accelerators and effectively encode a model's inputs, outputs, and parameters.

 

Step 3: Example Workload – Image Classification with FP32 Precision

import requests
import torch
from PIL import Image
from torchvision import models, transforms
from torchvision.models import ResNet18_Weights

# Get model
weights = ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model = model.to(device)
imagenet_classes = weights.meta["categories"]

# Prepare the input image
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Cat_November_2010-1a.jpg/1200px-Cat_November_2010-1a.jpg"
input_image = Image.open(requests.get(image_url, stream=True).raw)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0).to(device)

# Run inference
model = model.eval()
with torch.no_grad():
    output = model(input_batch)

_, predicted = torch.max(output, 1)
class_index = predicted.item()
class_label = imagenet_classes[class_index]
print(f"Predicted class: {class_label}")

Here, we load a pre-trained ResNet18 model, prepare a sample input image from Wikipedia*, and run inference. ResNet18 is a deep residual network for image recognition, pre-trained on the ImageNet dataset. The code output shows the class label predicted by the model; if the model works correctly and requires no further fine-tuning, the predicted label for this image should be ‘Egyptian cat’.
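If you want to look beyond the single top prediction, one optional follow-up (not part of the workshop notebook) is to convert the raw logits into probabilities with softmax and list the five most likely classes:

# Optional follow-up: convert logits to probabilities and show the top-5 classes
probabilities = torch.nn.functional.softmax(output[0], dim=0)
top5_prob, top5_idx = torch.topk(probabilities, 5)
for prob, idx in zip(top5_prob, top5_idx):
    print(f"{imagenet_classes[idx.item()]}: {prob.item():.4f}")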

 

Step 4: Example Workload – Sentiment Analysis Inference

import torch
import torch.nn as nn

# Define a simple sentiment analysis model (a trained model is expected;
# for now, we use this untrained model as an example)
class SentimentModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.embedding(x)
        _, (hidden, _) = self.lstm(x)
        out = self.fc(hidden.squeeze(0))
        return out

vocab_size = 10000
embed_dim = 100
hidden_dim = 512
output_dim = 2

model = SentimentModel(vocab_size, embed_dim, hidden_dim, output_dim)
model = model.to(device)
print(f"\nModel before compilation: \n{model}\n")
model = torch.compile(model)  # compile model
print("-"*72)
print(f"\nModel after compilation: \n{model}")

In this example, a Long Short-Term Memory (LSTM) network defines a simple sentiment analysis model. An LSTM is a type of Recurrent Neural Network (RNN) that can process sequential data while retaining a hidden state. The notebook prints the model's structure before and after compilation with torch.compile, then runs inference to produce a sentiment score.
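The inference cell that produces the sentiment score is not reproduced above. As a rough sketch of what that step looks like, you could feed the compiled model a batch of token IDs; here the IDs are random placeholders rather than real tokenized text:

# Rough sketch of the inference step; random token IDs stand in for real tokenized text
input_ids = torch.randint(0, vocab_size, (1, 20), device=device)  # batch of 1, 20 tokens
with torch.no_grad():
    logits = model(input_ids)
    sentiment_score = torch.softmax(logits, dim=-1)
print(f"Sentiment score (negative, positive): {sentiment_score}")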

 

Step 5: Transfer Learning – Vision Using Auto Mixed Precision

# Evaluate (model, test_loader, and device are defined in earlier notebook cells)
model = model.eval()
correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f"Accuracy on test images: {100 * correct / total:.2f}%")

This step showcases transfer learning with a ResNet18 vision model using automatic mixed precision (torch.float16 or torch.bfloat16) to reduce memory requirements and speed up training. First, the CIFAR-10 dataset, a collection of 60,000 small color images in 10 classes, is used to create train and test sets for the ResNet18 model. The model is trained and then evaluated on the test images. After training completes all iterations, an accuracy percentage is printed; in this case, we see a high accuracy that can be further increased with appropriate fine-tuning.
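The training loop itself lives in the notebook; as a hedged sketch of how the auto mixed precision portion typically looks on an XPU (train_loader, criterion, and optimizer are assumed to be defined as in the notebook), the forward pass runs inside a torch.autocast context:

# Sketch of one AMP training epoch on the XPU; train_loader, criterion,
# and optimizer are assumed to be defined earlier, as in the notebook
model = model.train()
for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    # Run the forward pass in bfloat16 via automatic mixed precision
    with torch.autocast(device_type="xpu", dtype=torch.bfloat16):
        outputs = model(images)
        loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

With bfloat16, no gradient scaler is needed, which keeps the loop close to its full-precision counterpart.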

After running through the steps in the notebook, you will have explored the capabilities of PyTorch 2.4 on Intel GPUs, including device checking, basic tensor operations, and training and running inference on several example AI models.

Check out the Intel Tiber Developer Cloud to access the latest silicon hardware and optimized software to help develop and power your next innovative AI projects! We encourage you to explore Intel’s AI Tools and Framework optimizations and learn about the unified, open, standards-based oneAPI programming model that forms the foundation of Intel’s AI Software Portfolio. Also, discover how our other collaborations with industry-leading independent software vendors (ISVs), system integrators (SIs), original equipment manufacturers (OEMs), and enterprise users accelerate AI adoption.

 


About the Author
AI/ML Technical Software Product Marketing Manager at Intel. MBA, Engineer, and previous two-time startup founder with a passion for all things AI and new tech.