# Unsloth

> Fine-tune and train LLMs up to 30x faster with 90% less memory usage through optimized GPU kernels and handwritten math derivations.

Unsloth is an open-source library that dramatically accelerates LLM fine-tuning and training by manually deriving compute-heavy math steps and handwriting GPU kernels. It enables users to train custom models in 24 hours instead of 30 days, achieving up to 30x faster performance than Flash Attention 2 (FA2) while using 90% less memory. The tool supports a wide range of NVIDIA GPUs, from Tesla T4 to H100, with portability to AMD and Intel GPUs.

- **Massive Speed Improvements** - 2x faster training on a single GPU with the free version, scaling up to 30x faster on multi-GPU systems compared to Flash Attention 2.
- **Significant Memory Reduction** - Up to 90% less VRAM than traditional methods, enabling training of larger models on existing hardware without upgrades.
- **Broad Model Support** - Compatible with popular models including Mistral, Gemma, and Llama 1, 2, and 3, with support for 4-bit and 16-bit LoRA fine-tuning.
- **500K Context Fine-tuning** - Train models with context lengths of up to 500K tokens for advanced use cases.
- **FP8 Reinforcement Learning** - FP8 RL training, including GRPO, for efficient reinforcement learning workflows.
- **Docker Support** - Easy deployment with official Docker images for containerized training environments.
- **Multi-GPU and Multi-Node** - The Enterprise tier scales to 32 GPUs with multi-node support for large-scale training operations.
- **Accuracy Improvements** - Enterprise users can achieve up to 30% accuracy improvements alongside speed gains.

To get started, users can access the free open-source version on GitHub and run it on Google Colab or Kaggle Notebooks. The library integrates with existing ML workflows and requires no hardware changes to achieve its performance improvements.
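A minimal sketch of what the 4-bit LoRA quickstart looks like, based on Unsloth's documented `FastLanguageModel` API. It assumes the `unsloth` package is installed and a supported CUDA GPU is available; the model name and LoRA hyperparameters here are illustrative, not prescriptive.

```python
# Sketch: load a 4-bit quantized base model and attach LoRA adapters with Unsloth.
# Requires a CUDA GPU and `pip install unsloth`.
from unsloth import FastLanguageModel

# Load the base model in 4-bit; the model name below is one of Unsloth's
# pre-quantized checkpoints and is used here purely as an example.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,   # context length for this fine-tune
    load_in_4bit=True,     # 4-bit quantization for the memory savings described above
)

# Attach LoRA adapters; r and lora_alpha are illustrative defaults.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# From here, training proceeds with a standard Hugging Face-style trainer
# (e.g. TRL's SFTTrainer) on your own dataset.
```

The same notebook-style flow is what the official Google Colab and Kaggle examples walk through, which is why no local hardware changes are needed to try it.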
Documentation is available at docs.unsloth.ai, with comprehensive guides and model resources on Hugging Face.

## Features

- 30x faster training than Flash Attention 2
- 90% less memory usage
- Support for Mistral, Gemma, and Llama 1, 2, and 3
- 4-bit and 16-bit LoRA support
- 500K context length fine-tuning
- FP8 reinforcement learning (GRPO)
- Docker image support
- Multi-GPU support
- Multi-node support (Enterprise)
- TTS, BERT, and FFT support
- 5x faster inference (Enterprise)
- Full training support (Enterprise)

## Integrations

Google Colab, Kaggle Notebooks, Hugging Face, Docker, NVIDIA GPUs, AMD GPUs, Intel GPUs

## Platforms

Linux, Developer SDK

## Pricing

Open Source

## Links

- Website: https://unsloth.ai
- Documentation: https://docs.unsloth.ai/
- Repository: https://github.com/unslothai/unsloth
- EveryDev.ai: https://www.everydev.ai/tools/unsloth