Qwen3
Qwen3 is a family of open-weight large language models by Alibaba Cloud's Qwen team, featuring both dense and Mixture-of-Experts architectures with seamless thinking and non-thinking modes.
At a Glance
About Qwen3
Qwen3 is a series of open-weight large language models developed by the Qwen team at Alibaba Cloud, available in dense and Mixture-of-Experts (MoE) variants ranging from 0.6B to 235B parameters. The models support seamless switching between a thinking mode (for complex reasoning, math, and coding) and a non-thinking mode (for efficient general-purpose chat). Qwen3 supports 100+ languages and dialects and achieves state-of-the-art performance among open-weight models on reasoning, coding, and agent benchmarks. The latest Qwen3-2507 update extends long-context understanding to 256K tokens, extendable to 1 million tokens.
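In thinking mode, Qwen3 emits its reasoning inside a <think>...</think> block before the final answer, so downstream code typically separates the two. A minimal sketch of such a splitter (the helper name and return convention are our own, not part of any Qwen3 API):

```python
import re

def split_thinking(output: str) -> tuple[str, str]:
    """Separate a Qwen3 <think>...</think> reasoning block from the final answer.

    Returns (reasoning, answer). Reasoning is empty when the model ran in
    non-thinking mode and emitted no <think> block.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer
```

This keeps the chain of thought available for logging while showing users only the answer.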
- Dense and MoE model sizes: Available in 0.6B, 1.7B, 4B, 8B, 14B, 32B (dense) and 30B-A3B, 235B-A22B (MoE) to fit a wide range of hardware budgets.
- Thinking and non-thinking modes: Switch between deep reasoning mode and fast chat mode using the enable_thinking flag or /think and /no_think instructions in the prompt.
- Long-context support: Handles up to 256K tokens natively, extendable to 1 million tokens with updated Qwen3-2507 model variants.
- Multilingual capability: Supports 100+ languages and dialects with strong multilingual instruction following and translation.
- Agent and tool use: Integrates with Qwen-Agent for tool use and MCP support, enabling precise function calling in both thinking and non-thinking modes.
- Broad inference framework support: Run with Transformers, vLLM, SGLang, TensorRT-LLM, llama.cpp, Ollama, LM Studio, MLX LM, OpenVINO, ExecuTorch, and MNN for flexible local and cloud deployment.
- Finetuning support: Compatible with Axolotl, Unsloth, Swift, and LLaMA-Factory for SFT, DPO, and GRPO training workflows.
- Quantization: Supports GPTQ, AWQ, and GGUF quantization for efficient deployment on consumer hardware.
- Apache 2.0 license: All open-weight models are freely available for commercial and research use.
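The /think and /no_think soft switches above are plain text appended to a user turn, overriding the session default for that turn. A small helper illustrating the convention (the function name is our own):

```python
def with_mode(user_message: str, thinking: bool) -> str:
    """Append a Qwen3 soft switch to a user turn.

    "/think" requests thinking mode for this turn; "/no_think" disables it.
    The switch is part of the user message text itself, not an API parameter.
    """
    switch = "/think" if thinking else "/no_think"
    return f"{user_message} {switch}"
```

In a multi-turn chat, the most recent switch wins, so each turn can toggle the mode independently.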
Pricing
Open Source
All Qwen3 open-weight models are free to download and use under the Apache 2.0 license.
- All model sizes (0.6B to 235B)
- Dense and MoE architectures
- Thinking and non-thinking modes
- Apache 2.0 license
- Commercial use allowed
Capabilities
Key Features
- Dense and MoE model architectures
- Thinking and non-thinking mode switching
- 256K token long-context support (extendable to 1M)
- 100+ language and dialect support
- Agent and tool use with MCP support
- Supports vLLM, SGLang, TensorRT-LLM, llama.cpp, Ollama, LM Studio
- GPTQ, AWQ, and GGUF quantization
- Finetuning with Axolotl, Unsloth, Swift, LLaMA-Factory
- OpenAI-compatible API server
- Apache 2.0 open-weight license
