# OpenMythos

> An open-source, theoretical PyTorch implementation of a Recurrent-Depth Transformer (RDT) inspired by the suspected Claude Mythos architecture, featuring MoE, MLA/GQA attention, and LTI-stable looped inference.

OpenMythos is an open-source, community-driven theoretical reconstruction of the Claude Mythos model architecture, built from publicly available research. It implements a Recurrent-Depth Transformer (RDT) with three stages: a **Prelude** of standard transformer blocks, a looped **Recurrent Block** run up to `max_loop_iters` times, and a final **Coda**. The project is not affiliated with or endorsed by Anthropic and is intended purely for research and exploration of compute-adaptive, depth-variable reasoning.
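The three-stage flow can be sketched in a few lines of PyTorch. Only the stage names (Prelude, Recurrent Block, Coda) and `max_loop_iters` come from the description above; the block internals below are illustrative stand-ins (plain linear layers rather than real transformer blocks), not the project's actual code.

```python
import torch
import torch.nn as nn

class TinyRDT(nn.Module):
    """Minimal sketch of the Prelude -> looped Recurrent Block -> Coda flow."""
    def __init__(self, d_model=64, max_loop_iters=4):
        super().__init__()
        self.max_loop_iters = max_loop_iters
        # Stand-ins for the real transformer blocks.
        self.prelude = nn.Linear(d_model, d_model)
        self.recurrent = nn.Linear(d_model, d_model)  # same weights reused every iteration
        self.coda = nn.Linear(d_model, d_model)

    def forward(self, x, loop_iters=None):
        n = loop_iters or self.max_loop_iters
        h = self.prelude(x)
        e = h  # injected embedding held fixed across iterations (one common RDT choice)
        for _ in range(n):
            h = torch.tanh(self.recurrent(h) + e)  # weight sharing across depth
        return self.coda(h)

model = TinyRDT()
out = model(torch.randn(2, 8, 64))
print(out.shape)  # torch.Size([2, 8, 64])
```

Because the recurrent weights are shared, increasing `loop_iters` at inference time adds effective depth without adding parameters, which is the core of the compute-adaptive idea.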

- **Recurrent-Depth Transformer (RDT)** — *implements a Prelude → Looped Recurrent Block → Coda architecture where the same weights are reused across loop iterations for implicit multi-hop reasoning in continuous latent space*
- **Switchable Attention** — *supports both Multi-head Latent Attention (MLA) and Grouped Query Attention (GQA), configurable via `attn_type` in `MythosConfig`*
- **Sparse Mixture of Experts (MoE)** — *feed-forward layers use fine-grained routed experts plus always-on shared experts, enabling broad domain coverage with low per-token activation cost*
- **LTI-Stable Injection** — *injection parameters are constrained so the spectral radius ρ(A) < 1 by construction, preventing residual explosion and enabling stable training at high learning rates*
- **Pre-configured Model Variants** — *factory functions (`mythos_1b` through `mythos_1t`) provide ready-to-use `MythosConfig` objects spanning 1B to 1T parameters*
- **Training Script Included** — *a training script for the 3B variant on FineWeb-Edu is provided, supporting single-GPU and multi-GPU (DDP via `torchrun`) runs with bfloat16/float16 mixed precision*
- **Adaptive Computation Time (ACT)** — *architecture supports variable loop depth per input, allowing harder inputs to receive more compute while simpler ones halt early*
- **LoRA Depth Adaptation** — *optional depth-wise LoRA modules allow each loop iteration to adapt behavior slightly while preserving the compactness of weight sharing*
- **Install via pip** — *get started with `pip install open-mythos`, then import `OpenMythos` and `MythosConfig` from `open_mythos.main` to instantiate and run the model*
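The LTI-stability property above (spectral radius ρ(A) < 1) can be demonstrated with a toy linear recurrence. The rescaling trick below is purely illustrative, not the project's actual constraint mechanism: once ρ(A) < 1, the looped update h ← Ah + Be stays bounded and converges to the fixed point (I - A)^(-1) Be no matter how many iterations run.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Draw an unconstrained matrix, then rescale so the spectral radius rho(A) < 1.
W = rng.standard_normal((d, d))
rho = max(abs(np.linalg.eigvals(W)))
A = 0.9 * W / rho           # rho(A) = 0.9 < 1 by construction
B = rng.standard_normal((d, d))

h = rng.standard_normal(d)
e = rng.standard_normal(d)  # injected embedding, fixed across iterations
for _ in range(1000):
    h = A @ h + B @ e       # linear time-invariant update

# With rho(A) < 1 the iteration converges to the fixed point (I - A)^{-1} B e.
fixed_point = np.linalg.solve(np.eye(d) - A, B @ e)
print(np.allclose(h, fixed_point))  # True
```

If ρ(A) ≥ 1 instead, the same loop would diverge, which is exactly the residual explosion the constraint is designed to rule out.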

## Features
- Recurrent-Depth Transformer (RDT) architecture
- Switchable MLA and GQA attention
- Sparse Mixture of Experts (MoE) with routed and shared experts
- LTI-stable injection parameters (spectral radius < 1)
- Pre-configured model variants from 1B to 1T parameters
- Training script for 3B model on FineWeb-Edu
- Single-GPU and multi-GPU (DDP) training support
- Adaptive Computation Time (ACT) halting mechanism
- Depth-wise LoRA adaptation per loop iteration
- bfloat16/float16 mixed precision training
- Continuous Depth-wise Batching for variable inference compute
- Loop-index positional embedding support
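The routed-plus-shared expert split listed above can be sketched as follows. The class, sizes, and per-token routing loop are hypothetical simplifications (real implementations batch the expert dispatch); the point is that every token passes through the always-on shared experts, but only its `top_k` routed experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Illustrative routed + shared experts; all names and sizes are hypothetical."""
    def __init__(self, d_model=32, n_routed=8, n_shared=2, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_routed, bias=False)
        self.routed = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_routed))
        self.shared = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_shared))

    def forward(self, x):  # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)      # (tokens, n_routed)
        weights, idx = gates.topk(self.top_k, dim=-1)  # (tokens, top_k)
        rows = []
        for t in range(x.size(0)):
            # Shared experts see every token; routed experts only the top-k winners.
            shared_out = sum(expert(x[t]) for expert in self.shared)
            routed_out = sum(w * self.routed[int(i)](x[t])
                             for w, i in zip(weights[t], idx[t]))
            rows.append(shared_out + routed_out)
        return torch.stack(rows)

moe = TinyMoE()
y = moe(torch.randn(5, 32))
print(y.shape)  # torch.Size([5, 32])
```

Per-token activation cost scales with `top_k + n_shared` experts rather than the full expert count, which is how MoE layers keep broad capacity cheap at inference time.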

## Integrations
PyTorch, HuggingFace Datasets (FineWeb-Edu), torchrun (PyTorch DDP), openai/gpt-oss-20b tokenizer

## Platforms
Linux, API, Developer SDK, CLI

## Pricing
Open Source

## Links
- Website: https://github.com/kyegomez/OpenMythos
- Documentation: https://github.com/kyegomez/OpenMythos/blob/main/docs/open_mythos.md
- Repository: https://github.com/kyegomez/OpenMythos
- EveryDev.ai: https://www.everydev.ai/tools/openmythos
