# Qwen3

> Qwen3 is a family of open-weight large language models from Alibaba Cloud's Qwen team, featuring both dense and Mixture-of-Experts architectures with seamless thinking and non-thinking modes.

Qwen3 is a series of open-weight large language models developed by the Qwen team at Alibaba Cloud, available in dense and Mixture-of-Experts (MoE) variants ranging from 0.6B to 235B parameters. The models support seamless switching between a thinking mode (for complex reasoning, math, and coding) and a non-thinking mode (for efficient general-purpose chat). Qwen3 supports 100+ languages and dialects and achieves state-of-the-art performance among open-weight models on reasoning, coding, and agent benchmarks. The latest Qwen3-2507 update extends long-context understanding to 256K tokens, extendable to 1 million tokens.

- **Dense and MoE model sizes**: Available in 0.6B, 1.7B, 4B, 8B, 14B, and 32B (dense) and 30B-A3B and 235B-A22B (MoE) variants to fit a wide range of hardware budgets.
- **Thinking and non-thinking modes**: *Switch between deep reasoning mode and fast chat mode* using the `enable_thinking` flag or `/think`/`/no_think` instructions in the prompt.
- **Long-context support**: *Handles up to 256K tokens natively*, extendable to 1 million tokens with the updated Qwen3-2507 model variants.
- **Multilingual capability**: *Supports 100+ languages and dialects* with strong multilingual instruction following and translation.
- **Agent and tool use**: *Integrates with Qwen-Agent for tool use and MCP support*, enabling precise function calling in both thinking and non-thinking modes.
- **Broad inference framework support**: *Runs with Transformers, vLLM, SGLang, TensorRT-LLM, llama.cpp, Ollama, LM Studio, MLX LM, OpenVINO, ExecuTorch, and MNN* for flexible local and cloud deployment.
- **Finetuning support**: *Compatible with Axolotl, Unsloth, Swift, and LLaMA-Factory* for SFT, DPO, and GRPO training workflows.
- **Quantization**: *Supports GPTQ, AWQ, and GGUF quantization* for efficient deployment on consumer hardware.
- **Apache 2.0 license**: All open-weight models are freely available for commercial and research use.

## Features

- Dense and MoE model architectures
- Thinking and non-thinking mode switching
- 256K-token long-context support (extendable to 1M)
- 100+ language and dialect support
- Agent and tool use with MCP support
- Inference with vLLM, SGLang, TensorRT-LLM, llama.cpp, Ollama, and LM Studio
- GPTQ, AWQ, and GGUF quantization
- Finetuning with Axolotl, Unsloth, Swift, and LLaMA-Factory
- OpenAI-compatible API server
- Apache 2.0 open-weight license

## Integrations

Hugging Face Transformers, vLLM, SGLang, TensorRT-LLM, llama.cpp, Ollama, LM Studio, MLX LM, OpenVINO, ExecuTorch, MNN, ModelScope, Qwen-Agent, Axolotl, LLaMA-Factory, Unsloth, Swift

## Platforms

macOS, Linux, Android, iOS, API, Developer SDK, CLI

## Pricing

Open Source

## Version

Qwen3-2507

## Links

- Website: https://github.com/QwenLM/Qwen3
- Documentation: https://qwen.readthedocs.io/
- Repository: https://github.com/QwenLM/Qwen3
- EveryDev.ai: https://www.everydev.ai/tools/qwen3
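## Example: toggling thinking mode per turn

Beyond the hard `enable_thinking` flag, the `/think` and `/no_think` soft switches described above are plain-text instructions placed in a user turn. A minimal sketch of building chat messages with them (the helper name `with_mode` is our own, not part of any Qwen3 API; see the official documentation for exact switch semantics):

```python
def with_mode(user_message: str, thinking: bool) -> str:
    """Append a Qwen3 soft-switch tag to a user turn.

    /think asks the model to reason step by step before answering;
    /no_think asks for a direct answer without the reasoning phase.
    """
    tag = "/think" if thinking else "/no_think"
    return f"{user_message} {tag}"

# Build an OpenAI-style message list with per-turn mode control.
messages = [
    {"role": "user", "content": with_mode("Solve 17 * 24 step by step.", thinking=True)},
    {"role": "user", "content": with_mode("What is the capital of France?", thinking=False)},
]
```

The resulting message list can then be passed to any of the OpenAI-compatible servers listed above (e.g. one started with vLLM or SGLang), with the last switch in the conversation taking effect for the next response.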