# Qwen3

> Qwen3 is a family of open-weight large language models from Alibaba Cloud's Qwen team, featuring both dense and Mixture-of-Experts architectures with seamless thinking and non-thinking modes.

Qwen3 is a series of open-weight large language models developed by the Qwen team at Alibaba Cloud, available in dense and Mixture-of-Experts (MoE) variants ranging from 0.6B to 235B parameters. The models support seamless switching between a thinking mode (for complex reasoning, math, and coding) and a non-thinking mode (for efficient general-purpose chat). Qwen3 supports 100+ languages and dialects and achieves state-of-the-art performance among open-weight models on reasoning, coding, and agent benchmarks. The latest Qwen3-2507 update extends long-context understanding to 256K tokens, extendable to 1 million tokens.

- **Dense and MoE model sizes**: Available in 0.6B, 1.7B, 4B, 8B, 14B, and 32B (dense) and 30B-A3B and 235B-A22B (MoE) variants to fit a wide range of hardware budgets.
- **Thinking and non-thinking modes**: *Switch between deep reasoning mode and fast chat mode* using the `enable_thinking` flag or `/think`/`/no_think` instructions in the prompt.
- **Long-context support**: *Handles up to 256K tokens natively*, extendable to 1 million tokens with the updated Qwen3-2507 model variants.
- **Multilingual capability**: *Supports 100+ languages and dialects* with strong multilingual instruction following and translation.
- **Agent and tool use**: *Integrates with Qwen-Agent for tool use and MCP support*, enabling precise function calling in both thinking and non-thinking modes.
- **Broad inference framework support**: *Runs with Transformers, vLLM, SGLang, TensorRT-LLM, llama.cpp, Ollama, LM Studio, MLX LM, OpenVINO, ExecuTorch, and MNN* for flexible local and cloud deployment.
- **Finetuning support**: *Compatible with Axolotl, Unsloth, Swift, and LLaMA-Factory* for SFT, DPO, and GRPO training workflows.
- **Quantization**: *Supports GPTQ, AWQ, and GGUF quantization* for efficient deployment on consumer hardware.
- **Apache 2.0 license**: All open-weight models are freely available for commercial and research use.

## Features

- Dense and MoE model architectures
- Thinking and non-thinking mode switching
- 256K-token long-context support (extendable to 1M)
- 100+ language and dialect support
- Agent and tool use with MCP support
- Inference with vLLM, SGLang, TensorRT-LLM, llama.cpp, Ollama, and LM Studio
- GPTQ, AWQ, and GGUF quantization
- Finetuning with Axolotl, Unsloth, Swift, and LLaMA-Factory
- OpenAI-compatible API server
- Apache 2.0 open-weight license

## Integrations

Hugging Face Transformers, vLLM, SGLang, TensorRT-LLM, llama.cpp, Ollama, LM Studio, MLX LM, OpenVINO, ExecuTorch, MNN, ModelScope, Qwen-Agent, Axolotl, LLaMA-Factory, Unsloth, Swift

## Platforms

macOS, Linux, Android, iOS, API, Developer SDK, CLI

## Pricing

Open Source

## Version

Qwen3-2507

## Links

- Website: https://github.com/QwenLM/Qwen3
- Documentation: https://qwen.readthedocs.io/
- Repository: https://github.com/QwenLM/Qwen3
- EveryDev.ai: https://www.everydev.ai/tools/qwen3
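## Example: toggling thinking mode per turn

Beyond the hard `enable_thinking` flag, the `/think` and `/no_think` soft switches described above are plain-text instructions placed in a user turn. A minimal sketch of building chat messages with them (the helper name `with_mode` is our own, not part of any Qwen3 API; see the official documentation for exact switch semantics):

```python
def with_mode(user_message: str, thinking: bool) -> str:
    """Append a Qwen3 soft-switch tag to a user turn.

    /think asks the model to reason step by step before answering;
    /no_think asks for a direct answer without the reasoning phase.
    """
    tag = "/think" if thinking else "/no_think"
    return f"{user_message} {tag}"

# Build an OpenAI-style message list with per-turn mode control.
messages = [
    {"role": "user", "content": with_mode("Solve 17 * 24 step by step.", thinking=True)},
    {"role": "user", "content": with_mode("What is the capital of France?", thinking=False)},
]
```

The resulting message list can then be passed to any of the OpenAI-compatible servers listed above (e.g. one started with vLLM or SGLang), with the last switch in the conversation taking effect for the next response.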