Llama

Name: Llama
Availability: OnlineOnly
Author: Meta AI

Meta's family of open-weight large language models, available for download, fine-tuning, and deployment across cloud, on-premise, and edge environments.

Visit Website

At a Glance

Pricing

Free tier available

Download Llama model weights for self-hosted deployment under the Llama Community License.

Llama API: Custom/contact

Engagement

Available On

Windows

API

SDK

CLI

Meta AIMenlo Park, CAEst. 2013$135B raised

Updated May 2026

About Llama

Llama is Meta's family of large language models, released under a bespoke community license that permits broad commercial use, fine-tuning, and redistribution. The models range from lightweight 1B-parameter variants designed for edge and mobile devices to the flagship Llama 4 Maverick, a natively multimodal mixture-of-experts model with a 10-million-token context window. Developers can download weights directly from llama.com, access them via the Llama API, or use them through hosting partners including Amazon Web Services, Microsoft Azure, Google Cloud, IBM Watsonx, Oracle Cloud, Snowflake, Databricks, Hugging Face, Groq, Cerebras, and SambaNova.

What It Is

Llama is a series of auto-regressive transformer language models built by Meta's AI research team. The core job is to serve as a foundation for developers and researchers who want to build, fine-tune, distill, or deploy AI applications without being locked into a proprietary API. Models are distributed as downloadable weights, meaning the inference environment—and the privacy of inputs and outputs—stays under the licensee's control. The FAQ explicitly states that Meta cannot access inputs or outputs once models are downloaded.

Model Families and Capabilities

The current lineup spans two major generations:

Llama 4 — Natively multimodal models using early fusion to jointly pre-train on text and vision tokens. Llama 4 Maverick (128-expert MoE) and Llama 4 Scout (16-expert MoE) both feature 10M-token context windows and support 12 languages for text-to-text tasks. Benchmark scores published on the site show Llama 4 Maverick at 80.5 on MMLU Pro, 69.8 on GPQA Diamond, and 94.4 on DocVQA.
Llama 3 — The open-weight generation covering Llama 3.1 (8B, 70B, 405B), Llama 3.2 (1B, 3B lightweight; 11B, 90B multimodal), and Llama 3.3 (70B multilingual). Llama 3.3 70B is positioned as a high-performance replacement for Llama 3.1 70B.

Deployment and Optimization Path

Llama models run on GPUs, CPUs (x86 and ARM), TPUs, NPUs, and AI accelerators. Smaller models target system-on-chip platforms found in PCs, mobile devices, and other edge hardware. The documentation covers:

Prompt engineering — Improving LLM performance through natural language techniques
Fine-tuning — Adapting pre-trained weights to specific use cases; examples are in the Llama Cookbook repository on GitHub
Quantization — Reducing computational and memory requirements
Distillation — Teaching a smaller model to match a larger model's performance
RAG — Reference implementations available in the developer documentation

Licensing and Legal Framework

Llama models are not released under an OSI-approved open-source license. They use a bespoke Llama Community License Agreement that allows broad commercial use and derivative model creation, with restrictions including an Acceptable Use Policy. Key points from the FAQ: outputs from Llama 3.1 and later can be used to train other AI models with proper attribution; products built on Llama must display "Built with Llama" prominently; and EU-based individuals and companies face additional restrictions on multimodal model usage under the Llama 3.2, 3.3, and 4 AUPs.

Safety Infrastructure

Meta publishes a suite of protection tools under the Llama Protections umbrella, including the Llama Defenders Program, which the site describes as enabling AI defenders to deploy generative AI responsibly. A Developer Use Guide accompanies each model release to help licensees navigate responsible deployment.

Update: Llama 4

The most recent major release is Llama 4, which introduces native multimodality via early fusion—a departure from the frozen, separate multimodal weights used in prior generations. The site describes this as "a step change in intelligence." Llama 4 Scout is designed for single H100 GPU efficiency, while Llama 4 Maverick targets memory, personalization, and multi-modal application use cases. The Llama API, which provides hosted access to these models, was in waitlist status at the time of the source capture.

Community Discussions

Be the first to start a conversation about Llama

Share your experience with Llama, ask questions, or help others learn from your insights.

Pricing

FREE

Model Download

Download Llama model weights for self-hosted deployment under the Llama Community License.

Access to Llama 4 Maverick and Scout
Access to Llama 3.1, 3.2, 3.3 model families
Fine-tuning and distillation permitted
Commercial use allowed under community license
Deploy on any hardware (GPU, CPU, TPU, edge)

Llama API

Hosted API access to Llama models with usage-based pricing.

Custom

contact sales

Hosted inference via Llama API
Access to latest Llama 4 models
No infrastructure management required
Usage-based token pricing

View official pricing

Capabilities

Key Features

Natively multimodal Llama 4 models with early fusion architecture
10-million-token context window (Llama 4 Maverick and Scout)
Downloadable model weights for self-hosted deployment
Llama API for hosted model access
Fine-tuning support with Llama Cookbook examples
Quantization for reduced memory and compute requirements
Distillation tooling to compress larger models
RAG reference implementations
Multilingual support (12 languages in Llama 4)
Prompt engineering guides
Vision capabilities for image and text reasoning
Edge-optimized lightweight models (1B, 3B)
Llama Protections safety toolkit
Llama Defenders Program for responsible AI deployment

Integrations

Amazon Web Services

Microsoft Azure

Google Cloud

IBM Watsonx

Oracle Cloud

Snowflake

Databricks

Hugging Face

Groq

Cerebras

SambaNova

Dell

API Available

View Docs

Back to all tools