# MLX LM

> A Python library for running and fine-tuning large language models on Apple Silicon using the MLX framework.

MLX LM is an open-source Python library developed by Apple's ML Explore team that enables developers to run, fine-tune, and deploy large language models (LLMs) efficiently on Apple Silicon devices. Built on top of the MLX framework, it provides optimized performance for M-series chips, making it an essential tool for developers working with AI on macOS. The library supports a wide range of models from the Hugging Face Hub and offers both a Python API and a command-line interface for flexibility.

- **Local LLM Inference** runs large language models directly on Apple Silicon without cloud services or external GPUs, leveraging the unified memory architecture of M1, M2, and M3 chips for efficient processing.
- **Model Fine-tuning** supports fine-tuning pre-trained models with techniques such as LoRA (Low-Rank Adaptation), enabling customization for specific use cases at reduced computational cost.
- **Quantization Support** provides tools to quantize models to lower-precision formats (4-bit, 8-bit), significantly reducing memory footprint while preserving model quality for deployment on devices with limited resources.
- **Hugging Face Integration** works seamlessly with models from the Hugging Face Hub, so users can easily download and run popular open-source models such as Llama, Mistral, and Phi.
- **Text Generation API** offers a simple Python interface for generating text completions, with streaming output, temperature control, and other generation parameters for building AI-powered applications.
- **Command-Line Tools** include utilities for model conversion, quantization, and text generation, making it easy to experiment with different models and configurations without writing code.

To get started, install the library via pip with `pip install mlx-lm`.
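Once installed, a basic completion takes only a few lines of Python. The sketch below uses the library's `load` and `generate` functions with the same model named in the CLI example on this page; it assumes an Apple Silicon Mac with `mlx-lm` installed, and the exact `generate` keyword arguments may vary between releases, so treat it as a starting point rather than a definitive reference.

```python
# Minimal sketch of the mlx-lm Python API.
# Assumes: macOS on Apple Silicon, `pip install mlx-lm`, and network
# access to download the model from the Hugging Face Hub on first run.
from mlx_lm import load, generate

# Download (or load from the local cache) a 4-bit quantized model.
model, tokenizer = load("mlx-community/Llama-3-8B-Instruct-4bit")

# Wrap the user message in the model's chat template before generating.
messages = [{"role": "user", "content": "Hello"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)
```

Because the model weights live in unified memory, the same `model` object can serve repeated `generate` calls without reloading.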
You can then generate text from the command line with `mlx_lm.generate --model mlx-community/Llama-3-8B-Instruct-4bit --prompt "Hello"`, or use the Python API to integrate LLM capabilities into your applications. The library requires macOS on Apple Silicon and supports Python 3.8 and above.

## Features

- Local LLM inference on Apple Silicon
- Model fine-tuning with LoRA
- 4-bit and 8-bit quantization
- Hugging Face model integration
- Text generation API
- Command-line interface
- Model conversion tools
- Streaming text generation
- Chat template support
- Memory-efficient inference

## Integrations

Hugging Face Hub, MLX Framework, Transformers

## Platforms

macOS, Web, API, Developer SDK

## Pricing

Open Source

## Version

0.19.0

## Links

- Website: https://github.com/ml-explore/mlx-lm
- Documentation: https://github.com/ml-explore/mlx-lm
- Repository: https://github.com/ml-explore/mlx-lm
- EveryDev.ai: https://www.everydev.ai/tools/mlx-lm
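The streaming text generation listed under Features follows an iterator pattern: tokens are consumed as they are produced instead of waiting for the full completion. The sketch below uses `stream_generate` from `mlx_lm`; note that the shape of each yielded item has changed across releases (some versions yield plain text segments, more recent ones yield response objects with a `.text` field), so check the API of your installed version before relying on this exact form.

```python
# Hedged sketch of streaming generation with mlx-lm.
# Assumes: macOS on Apple Silicon and `pip install mlx-lm`; the yielded
# item type varies by mlx-lm version (string segment vs. response object).
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Llama-3-8B-Instruct-4bit")

for response in stream_generate(model, tokenizer, "Hello", max_tokens=64):
    # In recent releases each item carries the new text in `.text`;
    # in older releases the item is the text segment itself.
    segment = getattr(response, "text", response)
    print(segment, end="", flush=True)
print()
```

Streaming like this is what makes interactive chat UIs feel responsive: the first tokens appear as soon as they are sampled rather than after the whole sequence finishes.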