OpenMythos
An open-source, theoretical PyTorch implementation of a Recurrent-Depth Transformer (RDT) inspired by the suspected Claude Mythos architecture, featuring MoE, MLA/GQA attention, and LTI-stable looped inference.
At a Glance
Fully free and open-source under the MIT License. Free to use, modify, and distribute.
Listed Apr 2026
About OpenMythos
OpenMythos is an open-source, community-driven theoretical reconstruction of the Claude Mythos model architecture, built from publicly available research. It implements a Recurrent-Depth Transformer (RDT) with three stages: a Prelude of standard transformer blocks, a looped Recurrent Block run up to `max_loop_iters` times, and a final Coda. The project is not affiliated with or endorsed by Anthropic and is intended purely for research and exploration of compute-adaptive, depth-variable reasoning.
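The three-stage flow described above can be sketched in a few lines. This is a minimal illustration with toy stand-in callables; the function name `rdt_forward` and its signature are hypothetical, not the project's actual API:

```python
# Minimal sketch of a Recurrent-Depth Transformer forward pass:
# Prelude -> Looped Recurrent Block (same weights each iteration) -> Coda.
# All names here are illustrative, not OpenMythos's actual API.

def rdt_forward(x, prelude, recurrent_block, coda, max_loop_iters=8):
    h = prelude(x)                   # encode with standard transformer blocks
    for _ in range(max_loop_iters):  # weight-shared loop: implicit multi-hop
        h = recurrent_block(h)       # reasoning in continuous latent space
    return coda(h)                   # decode the final latent state

# Toy stand-ins: scalar functions in place of transformer stages.
out = rdt_forward(
    1,
    prelude=lambda v: v + 1,
    recurrent_block=lambda v: 2 * v,
    coda=lambda v: v - 1,
    max_loop_iters=3,
)
# prelude: 1 -> 2; three doublings: 2 -> 16; coda: 16 -> 15
```

The key property is that `recurrent_block` is the same callable (same weights) on every iteration, so effective depth can grow without growing parameter count.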
- Recurrent-Depth Transformer (RDT) — implements a Prelude → Looped Recurrent Block → Coda architecture where the same weights are reused across loop iterations for implicit multi-hop reasoning in continuous latent space
- Switchable Attention — supports both Multi-head Latent Attention (MLA) and Grouped Query Attention (GQA), configurable via `attn_type` in `MythosConfig`
- Sparse Mixture of Experts (MoE) — feed-forward layers use fine-grained routed experts plus always-on shared experts, enabling broad domain coverage with low per-token activation cost
- LTI-Stable Injection — injection parameters are constrained so the spectral radius ρ(A) < 1 by construction, preventing residual explosion and enabling stable training at high learning rates
- Pre-configured Model Variants — factory functions (`mythos_1b` through `mythos_1t`) provide ready-to-use `MythosConfig` objects spanning 1B to 1T parameters
- Training Script Included — a training script for a 3B model on FineWeb-Edu is provided, supporting single-GPU and multi-GPU (DDP via `torchrun`) setups with bfloat16/float16 precision
- Adaptive Computation Time (ACT) — the architecture supports variable loop depth per input, allowing harder inputs to receive more compute while simpler ones halt early
- LoRA Depth Adaptation — optional depth-wise LoRA modules allow each loop iteration to adapt behavior slightly while preserving the compactness of weight sharing
- Install via pip — get started with `pip install open-mythos`, then import `OpenMythos` and `MythosConfig` from `open_mythos.main` to instantiate and run the model
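The LTI-stability claim in the list above can be illustrated with a scalar toy model: if the loop update is h ← α·h + (1 − α)·e and α is reparameterized through a sigmoid, then α ∈ (0, 1) by construction, the linear part has spectral radius below 1, and repeated iterations contract toward the injected embedding instead of blowing up. The functions below are a simplified illustration, not the project's actual parameterization:

```python
import math

def stable_alpha(raw):
    """Sigmoid reparameterization: alpha lies in (0, 1) for any raw value,
    so the recurrence h <- alpha*h + (1 - alpha)*e is a contraction."""
    return 1.0 / (1.0 + math.exp(-raw))

def looped_injection(e, raw=0.0, iters=50, h0=1000.0):
    """Iterate the contractive update; h converges toward e rather than diverging."""
    alpha = stable_alpha(raw)
    h = h0
    for _ in range(iters):
        h = alpha * h + (1.0 - alpha) * e
    return h

# Even from a huge initial state, the iterate stays bounded and approaches e.
h = looped_injection(e=3.0, raw=0.0, iters=100, h0=1e6)
```

Because the largest eigenvalue of the linear part is below 1 no matter what the raw parameter learns, training cannot push the loop into residual explosion, which is what permits the high learning rates mentioned above.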
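Similarly, the routed-plus-shared expert design can be sketched in miniature: gate scores select the top-k routed experts per token, their outputs are mixed by normalized gate weight, and a shared expert is added unconditionally. All names and shapes here are toy stand-ins, not the real MoE layer:

```python
# Toy sketch of routed + shared experts: top-k routing over "expert" functions,
# plus an always-on shared expert added to every token's output.
# Illustrative only; OpenMythos's actual MoE layer will differ.

def moe_forward(x, routed_experts, gate_scores, shared_expert, top_k=2):
    # Pick the top_k experts by gate score for this token.
    ranked = sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])[:top_k]
    total = sum(gate_scores[i] for i in ranked)
    # Mix selected expert outputs by normalized gate weight.
    out = sum(gate_scores[i] / total * routed_experts[i](x) for i in ranked)
    return out + shared_expert(x)  # shared expert is always active

experts = [lambda v: v + 1, lambda v: v * 10, lambda v: -v]
y = moe_forward(
    2.0,
    routed_experts=experts,
    gate_scores=[0.1, 0.8, 0.1],
    shared_expert=lambda v: v / 2,
    top_k=2,
)
```

Only `top_k` routed experts run per token, which is the source of the low per-token activation cost, while the shared expert keeps a common pathway active for every input.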
Pricing
Open Source (MIT)
- Full source code access
- All model variants (1B–1T)
- Training scripts included
- MIT License — commercial use allowed
Capabilities
Key Features
- Recurrent-Depth Transformer (RDT) architecture
- Switchable MLA and GQA attention
- Sparse Mixture of Experts (MoE) with routed and shared experts
- LTI-stable injection parameters (spectral radius < 1)
- Pre-configured model variants from 1B to 1T parameters
- Training script for 3B model on FineWeb-Edu
- Single-GPU and multi-GPU (DDP) training support
- Adaptive Computation Time (ACT) halting mechanism
- Depth-wise LoRA adaptation per loop iteration
- bfloat16/float16 mixed precision training
- Continuous Depth-wise Batching for variable inference compute
- Loop-index positional embedding support
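The ACT halting mechanism listed above can be sketched as a loop that stops once a halting score crosses a threshold, so easy inputs exit early while hard ones use the full loop budget. This is a toy illustration; in the real architecture the halting signal would come from a learned head:

```python
# Toy sketch of Adaptive Computation Time: keep looping while the halting
# score stays below a threshold. Illustrative only, not the project's API.

def act_loop(h, step, halt_score, max_loop_iters=16, threshold=0.5):
    iters = 0
    for _ in range(max_loop_iters):
        h = step(h)
        iters += 1
        if halt_score(h) >= threshold:  # "confident enough" -> stop early
            break
    return h, iters

# Stand-in dynamics: the state halves each step; halt once it is small enough.
h, iters = act_loop(
    8.0,
    step=lambda v: v / 2,
    halt_score=lambda v: 1.0 - v,  # score rises as the state shrinks
    max_loop_iters=16,
    threshold=0.5,
)
# halts after 4 of the 16 allowed iterations
```

A larger starting state (a "harder" input under these toy dynamics) would take more iterations to cross the threshold, which is exactly the compute-adaptive behavior the feature list describes.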
