OpenMythos
An open-source, theoretical PyTorch implementation of a Recurrent-Depth Transformer (RDT) inspired by the suspected Claude Mythos architecture, featuring MoE, MLA/GQA attention, and LTI-stable looped inference.
At a Glance
Fully free and open-source under the MIT License. Free to use, modify, and distribute.
Listed Apr 2026
About OpenMythos
OpenMythos is an open-source, community-driven theoretical reconstruction of the Claude Mythos model architecture, built from publicly available research. It implements a Recurrent-Depth Transformer (RDT) with three stages: a Prelude of standard transformer blocks, a looped Recurrent Block run up to `max_loop_iters` times, and a final Coda. The project is not affiliated with or endorsed by Anthropic and is intended purely for research and exploration of compute-adaptive, depth-variable reasoning.
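The three-stage flow described above can be sketched in a few lines. This is a minimal illustration with toy stand-in callables; the function name `rdt_forward` and its signature are hypothetical, not the project's actual API:

```python
# Minimal sketch of a Recurrent-Depth Transformer forward pass:
# Prelude -> Looped Recurrent Block (same weights each iteration) -> Coda.
# All names here are illustrative, not OpenMythos's actual API.

def rdt_forward(x, prelude, recurrent_block, coda, max_loop_iters=8):
    h = prelude(x)                   # encode with standard transformer blocks
    for _ in range(max_loop_iters):  # weight-shared loop: implicit multi-hop
        h = recurrent_block(h)       # reasoning in continuous latent space
    return coda(h)                   # decode the final latent state

# Toy stand-ins: scalar functions in place of transformer stages.
out = rdt_forward(
    1,
    prelude=lambda v: v + 1,
    recurrent_block=lambda v: 2 * v,
    coda=lambda v: v - 1,
    max_loop_iters=3,
)
# prelude: 1 -> 2; three doublings: 2 -> 16; coda: 16 -> 15
```

The key property is that `recurrent_block` is the same callable (same weights) on every iteration, so effective depth can grow without growing parameter count.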
- Recurrent-Depth Transformer (RDT) — implements a Prelude → Looped Recurrent Block → Coda architecture where the same weights are reused across loop iterations for implicit multi-hop reasoning in continuous latent space
- Switchable Attention — supports both Multi-head Latent Attention (MLA) and Grouped Query Attention (GQA), configurable via `attn_type` in `MythosConfig`
- Sparse Mixture of Experts (MoE) — feed-forward layers use fine-grained routed experts plus always-on shared experts, enabling broad domain coverage with low per-token activation cost
- LTI-Stable Injection — injection parameters are constrained so the spectral radius ρ(A) < 1 by construction, preventing residual explosion and enabling stable training at high learning rates
- Pre-configured Model Variants — factory functions (`mythos_1b` through `mythos_1t`) provide ready-to-use `MythosConfig` objects spanning 1B to 1T parameters
- Training Script Included — a training script for a 3B model on FineWeb-Edu is provided, supporting single-GPU and multi-GPU (DDP via `torchrun`) setups with bfloat16/float16 precision
- Adaptive Computation Time (ACT) — the architecture supports variable loop depth per input, allowing harder inputs to receive more compute while simpler ones halt early
- LoRA Depth Adaptation — optional depth-wise LoRA modules allow each loop iteration to adapt behavior slightly while preserving the compactness of weight sharing
- Install via pip — get started with `pip install open-mythos`, then import `OpenMythos` and `MythosConfig` from `open_mythos.main` to instantiate and run the model
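The LTI-stability claim in the list above can be illustrated with a scalar toy model: if the loop update is h ← α·h + (1 − α)·e and α is reparameterized through a sigmoid, then α ∈ (0, 1) by construction, the linear part has spectral radius below 1, and repeated iterations contract toward the injected embedding instead of blowing up. The functions below are a simplified illustration, not the project's actual parameterization:

```python
import math

def stable_alpha(raw):
    """Sigmoid reparameterization: alpha lies in (0, 1) for any raw value,
    so the recurrence h <- alpha*h + (1 - alpha)*e is a contraction."""
    return 1.0 / (1.0 + math.exp(-raw))

def looped_injection(e, raw=0.0, iters=50, h0=1000.0):
    """Iterate the contractive update; h converges toward e rather than diverging."""
    alpha = stable_alpha(raw)
    h = h0
    for _ in range(iters):
        h = alpha * h + (1.0 - alpha) * e
    return h

# Even from a huge initial state, the iterate stays bounded and approaches e.
h = looped_injection(e=3.0, raw=0.0, iters=100, h0=1e6)
```

Because the largest eigenvalue of the linear part is below 1 no matter what the raw parameter learns, training cannot push the loop into residual explosion, which is what permits the high learning rates mentioned above.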
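Similarly, the routed-plus-shared expert design can be sketched in miniature: gate scores select the top-k routed experts per token, their outputs are mixed by normalized gate weight, and a shared expert is added unconditionally. All names and shapes here are toy stand-ins, not the real MoE layer:

```python
# Toy sketch of routed + shared experts: top-k routing over "expert" functions,
# plus an always-on shared expert added to every token's output.
# Illustrative only; OpenMythos's actual MoE layer will differ.

def moe_forward(x, routed_experts, gate_scores, shared_expert, top_k=2):
    # Pick the top_k experts by gate score for this token.
    ranked = sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])[:top_k]
    total = sum(gate_scores[i] for i in ranked)
    # Mix selected expert outputs by normalized gate weight.
    out = sum(gate_scores[i] / total * routed_experts[i](x) for i in ranked)
    return out + shared_expert(x)  # shared expert is always active

experts = [lambda v: v + 1, lambda v: v * 10, lambda v: -v]
y = moe_forward(
    2.0,
    routed_experts=experts,
    gate_scores=[0.1, 0.8, 0.1],
    shared_expert=lambda v: v / 2,
    top_k=2,
)
```

Only `top_k` routed experts run per token, which is the source of the low per-token activation cost, while the shared expert keeps a common pathway active for every input.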
Pricing
Open Source (MIT)
- Full source code access
- All model variants (1B–1T)
- Training scripts included
- MIT License — commercial use allowed
Capabilities
Key Features
- Recurrent-Depth Transformer (RDT) architecture
- Switchable MLA and GQA attention
- Sparse Mixture of Experts (MoE) with routed and shared experts
- LTI-stable injection parameters (spectral radius < 1)
- Pre-configured model variants from 1B to 1T parameters
- Training script for 3B model on FineWeb-Edu
- Single-GPU and multi-GPU (DDP) training support
- Adaptive Computation Time (ACT) halting mechanism
- Depth-wise LoRA adaptation per loop iteration
- bfloat16/float16 mixed precision training
- Continuous Depth-wise Batching for variable inference compute
- Loop-index positional embedding support
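The ACT halting mechanism listed above can be sketched as a loop that stops once a halting score crosses a threshold, so easy inputs exit early while hard ones use the full loop budget. This is a toy illustration; in the real architecture the halting signal would come from a learned head:

```python
# Toy sketch of Adaptive Computation Time: keep looping while the halting
# score stays below a threshold. Illustrative only, not the project's API.

def act_loop(h, step, halt_score, max_loop_iters=16, threshold=0.5):
    iters = 0
    for _ in range(max_loop_iters):
        h = step(h)
        iters += 1
        if halt_score(h) >= threshold:  # "confident enough" -> stop early
            break
    return h, iters

# Stand-in dynamics: the state halves each step; halt once it is small enough.
h, iters = act_loop(
    8.0,
    step=lambda v: v / 2,
    halt_score=lambda v: 1.0 - v,  # score rises as the state shrinks
    max_loop_iters=16,
    threshold=0.5,
)
# halts after 4 of the 16 allowed iterations
```

A larger starting state (a "harder" input under these toy dynamics) would take more iterations to cross the threshold, which is exactly the compute-adaptive behavior the feature list describes.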
