Fine-tune popular AI models faster with Unsloth on Intel® Core™ Ultra AI PCs, Intel® Arc™ GPUs, and Intel® Data Center GPU series to build personalized assistants for studying, work, creative projects, and more. Plus, leverage Intel-optimized open models such as Llama 3 and Qwen for the best performance.
Modern workflows showcase the endless possibilities of generative and agentic AI. Examples include tuning and customizing a chatbot to handle product-support questions or building a personal assistant to manage one’s daily schedule. However, a challenge remains in getting a small language model to respond consistently with high accuracy for specialized agentic tasks.
That's where fine-tuning comes in.
Unsloth, one of the world's most widely used open-source frameworks for fine-tuning LLMs, now supports Intel® GPUs. It provides an approachable way to customize models with efficient, low-memory training—from Intel® Core™ Ultra AI PCs and Intel® Arc™ discrete GPUs in consumer desktops and laptops to Intel® Data Center GPU series for enterprise workloads.
Fine-Tuning Techniques
Fine-tuning is like giving an AI model a focused training session. With examples tied to a specific topic or workflow, the model improves its accuracy by learning new patterns and adapting to the task at hand.
Choosing a fine-tuning method depends on how much of the original model you want to adjust. Based on your goals, you can combine several fine-tuning methods:
Supervised Fine-Tuning (SFT)
Supervised fine-tuning trains a pre-trained model on a high-quality dataset of curated input-output examples. The goal of SFT is to teach the model specific skills, domain knowledge, or formatting, making it proficient at tasks such as code generation, summarization, or domain-specific Q&A. SFT can be implemented with different parameter update strategies, including full fine-tuning and parameter-efficient fine-tuning.
Full Fine-Tuning:
- How it works: Updates all of the model's parameters—useful for teaching the model to follow specific formats or styles (a minimal sketch follows this list).
- Requirements: Large dataset (1,000+ prompt-sample pairs). Full fine-tuning is supported on Intel® Core™ Ultra AI PCs, Intel® Arc™ GPUs, and Intel® Data Center GPU series. Intel® Arc™ GPUs and Intel® Data Center GPUs are recommended for larger models.
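To make "updates all of the model's parameters" concrete, the minimal sketch below runs a single full fine-tuning step with plain PyTorch and Hugging Face Transformers. The model name and training text are placeholders, and the "xpu" device string assumes a recent PyTorch build with Intel GPU support; treat it as an outline, not a tuned recipe.

```python
# Minimal full fine-tuning sketch (placeholder model and data, not from the article).
# Every parameter stays trainable, so the optimizer updates the entire model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Intel GPUs appear as the "xpu" device in recent PyTorch builds
device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B", torch_dtype=torch.bfloat16
).to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # all parameters are optimized

batch = tokenizer(
    "Question: What is LoRA?\nAnswer: A low-rank adapter method.",
    return_tensors="pt",
).to(device)
loss = model(**batch, labels=batch["input_ids"]).loss  # causal-LM loss over the sequence
loss.backward()
optimizer.step()
optimizer.zero_grad()
```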
Parameter-Efficient Fine-Tuning (PEFT):
- How it works: Updates only a small set of additional "delta" weights rather than the entire model, enabling faster and lower-cost training (see the LoRA sketch after this list). On Intel® GPUs, this approach is highly effective due to PyTorch optimizations and supported low-precision data types.
- Requirements: Small- to medium-sized dataset (100–1,000 prompt-sample pairs). Supported on Intel® Core™ Ultra AI PCs, Intel® Arc™ GPUs, and Intel® Data Center GPU series.
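To illustrate the "delta weights" idea, here is a hedged sketch using the Hugging Face PEFT library, one common way to apply LoRA; the article does not prescribe a specific library, and the model name and hyperparameters are illustrative assumptions.

```python
# Hypothetical LoRA sketch with the PEFT library: base weights stay frozen and
# only small low-rank adapter ("delta") matrices are trained.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", torch_dtype=torch.bfloat16)

lora_config = LoraConfig(
    r=16,                    # rank of the low-rank update matrices
    lora_alpha=16,           # scaling factor for the adapter output
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Typically well under 1% of the weights end up trainable.
model.print_trainable_parameters()
```

Because only the adapter matrices receive gradients, optimizer state and gradient memory shrink accordingly, which is what makes this approach practical on an AI PC.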
Reinforcement Learning from Human Feedback (RLHF)
A multi-step process in which humans or an LLM rank different model outputs for the same prompt (e.g., "Response A is better than B"), a reward model learns these preferences, and the LLM is then trained via reinforcement learning to maximize these rewards.
- How it works: Adjusts model behavior using feedback or preference signals. The model learns by interacting with its environment and uses feedback to improve over time. This advanced technique interweaves training and inference and can be used in tandem with PEFT or full fine-tuning (a reward-modeling sketch follows this list). For inference, Intel® also supports frameworks such as vLLM and SGLang.
- Target use case: Improving model accuracy in specialized domains—such as law or medicine—or building autonomous agents that orchestrate actions on a user's behalf.
- Requirements: A pipeline containing an actor model, a reward model, and an environment for the model to learn from. Supported on Intel® Core™ Ultra AI PCs, Intel® Arc™ GPUs, and Intel® Data Center GPU series. Intel® Arc™ GPUs and Intel® Data Center GPUs are recommended for larger models.
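The reward-modeling step described above can be summarized with a small, self-contained sketch. It is conceptual only: the preference record and the two reward scores are made up, and a real pipeline would compute the scores with a scalar-output head on top of an LLM.

```python
# Conceptual RLHF reward-modeling sketch: the reward model should score the
# preferred ("chosen") response higher than the "rejected" one for the same
# prompt. A standard pairwise objective is -log(sigmoid(r_chosen - r_rejected)).
import torch
import torch.nn.functional as F

def reward_pair_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Loss shrinks as the margin between chosen and rejected scores grows.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# One preference record, matching the "Response A is better than B" ranking above
preference = {
    "prompt": "Summarize the return policy.",
    "chosen": "Items can be returned within 30 days with a receipt.",
    "rejected": "Returns are probably fine, just ask someone.",
}

# In a real pipeline the scores come from the reward model; fixed values here
# simply show how the loss behaves for a correctly ordered pair.
loss = reward_pair_loss(torch.tensor([1.4]), torch.tensor([-0.3]))
print(f"pairwise loss: {loss.item():.4f}")
```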
Unsloth—A Fast Path to Fine-Tuning on Intel® GPUs
LLM fine-tuning is a memory- and compute-intensive workload involving billions of matrix multiplications to update model weights at every training step. This type of highly parallel workload requires the power of Intel® GPUs—leveraging Xe-cores and XMX (Xe Matrix Extensions) engines—to complete the process quickly and efficiently.
Unsloth excels at this workload by translating complex mathematical operations into efficient, optimized GPU kernels that take full advantage of Intel's Xe Architecture.
Unsloth delivers significant performance improvements on Intel® GPUs. These Intel-specific optimizations—powered by oneDNN for deep learning primitives and Triton for kernel fusion—combined with Unsloth's ease of use, make fine-tuning accessible to a broader community of AI enthusiasts and developers.
The framework now supports Intel® hardware—from Intel® Core™ Ultra AI PCs and Intel® Arc™ series discrete GPUs in consumer laptops and desktops to Intel® Data Center GPU Max series for enterprise workloads—providing excellent performance while reducing memory consumption through efficient BF16 and FP16 mixed-precision training, with NF4 support for QLoRA.
Unsloth provides helpful guides (https://unsloth.ai/docs/get-started/install/intel) on getting started and on managing different LLM configurations, hyperparameters, and options, along with example notebooks and step-by-step workflows. Developers can use these guides as a practical starting point to quickly run their first fine-tuning workflow on Intel GPUs and then iterate with confidence on their own models and datasets.
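For orientation, the sketch below mirrors the pattern used in Unsloth's example notebooks: load a 4-bit (NF4) base model, attach LoRA adapters, and train with TRL's SFTTrainer. The model name, dataset, and hyperparameters are placeholders, and argument names can differ between Unsloth and TRL versions, so treat it as an outline and defer to the linked guide for specifics.

```python
# Hedged outline of a first QLoRA run with Unsloth; exact arguments may differ
# by Unsloth/TRL version, so follow the linked Unsloth guide for specifics.
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import Dataset

# Load a 4-bit (NF4) quantized base model for low-memory QLoRA training
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",  # placeholder model choice
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of delta weights is trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Tiny in-memory dataset with a "text" column; replace with your own data
dataset = Dataset.from_dict({
    "text": [
        "### Question: What is fine-tuning?\n### Answer: Training a model on task-specific examples.",
        "### Question: What is LoRA?\n### Answer: A parameter-efficient fine-tuning method.",
    ]
})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="outputs",
        dataset_text_field="text",
        per_device_train_batch_size=2,
        max_steps=10,
        bf16=True,
    ),
)
trainer.train()
```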
Conclusion
With Unsloth and Intel® Xe GPU Architecture, developers can now leverage the power of Intel® GPUs to fine-tune LLMs efficiently and cost-effectively. From Intel® Core™ Ultra AI PCs and Intel® Arc™ discrete GPUs in consumer systems to Intel® Data Center GPU series for enterprise deployments, the Intel® AI ecosystem provides a complete, open-standards-based solution for the entire fine-tuning workflow.
Key Takeaways
Accessibility: Intel® Xe GPU Architecture brings LLM fine-tuning capabilities to a broader audience with competitive price-to-performance ratios.
Efficiency: XMX engines accelerate matrix operations critical for training, while mixed-precision training reduces memory footprint.
Full Hardware Coverage: Intel® provides complete Unsloth support across its entire GPU portfolio—from Intel® Core™ Ultra-based AI PCs and consumer Arc™ discrete GPUs to enterprise-grade Data Center GPUs—making LLM fine-tuning accessible to everyone.
Complete Toolchain: From oneDNN for optimized deep learning primitives to oneCCL for distributed training and Triton support, Intel® provides end-to-end support—all integrated within Unsloth.