EveryDev.ai
With AI, Everyone is a Dev. EveryDev.ai © 2026

    Marin

    AI Development Libraries

    An open-source framework for researching and developing foundation models, with full reproducibility of every step from raw data to final model.

    At a Glance

    Pricing
    Open Source

    Fully free and open-source under Apache License 2.0. Free to use, modify, and distribute.

    Available On

    Windows
    Web
    API
    SDK
    CLI

    Resources

    Website, Docs, GitHub, llms.txt

    Topics

    AI Development Libraries, Model Management, Academic Research

    Alternatives

    flash-moe, PyTorch, Axolotl

    Developer

    marin-community, Stanford University, CA (est. 2025)

    Listed May 2026

    About Marin

    Marin is an open-source framework built by the marin-community organization for the research and development of foundation models. It operates as an open lab where every step of the model-building process—data curation, training, evaluation, and even failed experiments—is recorded and shared publicly in real time. The project is licensed under Apache 2.0 and hosted on GitHub, with documentation available on ReadTheDocs.

    What It Is

    Marin is a Python-based framework designed to make foundation model research fully reproducible and transparent. Rather than sharing only final model weights, Marin captures the entire provenance graph: raw data sources, tokenization pipelines, training configurations, hyperparameter choices, and evaluation results. It targets researchers and practitioners who want to train language models with Llama-, DeepSeek-, or Qwen-style architectures from scratch, and who want every decision to be auditable and replicable.

    How the Experiment Workflow Works

    Marin structures research as a directed acyclic graph of steps, similar to a Makefile, where each step can depend on prior steps and is executed in topological order. The lifecycle of an experiment follows a defined pattern:

    • A GitHub issue is created to preregister the experiment with hypotheses and goals.
    • A pull request is submitted with code that reproduces the experiment.
    • The code defines a provenance graph, which is executed and whose results are summarized in a WandB report.

    This means every experiment—including those that failed—is traceable through a GitHub issue, a PR, executable code, and a WandB run. Example experiments tracked this way include comparisons of z-loss impact, optimizer sweeps (AdamW vs. alternatives), BERT vs. fastText as quality filters, and MoE vs. dense model efficiency.
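    The step-graph execution described above can be illustrated with a small, self-contained sketch. It is not Marin's API. `Step`, `run_graph`, and the fingerprinting scheme are hypothetical names chosen for illustration. The sketch also assumes that step names are unique and that configs are JSON-serializable. Each step's fingerprint hashes its name, its config, and the fingerprints of its upstream steps, which loosely mirrors a provenance graph. Any step whose fingerprint is already in the cache is skipped, much as `make` skips up-to-date targets.

```python
import hashlib
import json
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any, Callable


@dataclass
class Step:
    """Hypothetical pipeline step (illustration only, not Marin's step API)."""
    name: str                                 # assumed unique within a graph
    fn: Callable[[dict, list], Any]           # (config, upstream outputs) -> output
    config: dict = field(default_factory=dict)  # assumed JSON-serializable
    deps: tuple = ()                          # upstream Step objects


def run_graph(targets, cache=None):
    """Run every step needed by `targets` in dependency (topological) order.

    Returns (results, fingerprints), both keyed by step name. `cache` maps
    fingerprint -> output and is reused across calls, so unchanged steps are skipped.
    """
    cache = {} if cache is None else cache

    # 1. Collect every step reachable from the targets.
    steps: dict[str, Step] = {}
    stack = list(targets)
    while stack:
        step = stack.pop()
        if step.name not in steps:
            steps[step.name] = step
            stack.extend(step.deps)

    # 2. Map each step to its predecessors; TopologicalSorter yields deps first.
    graph = {name: {d.name for d in s.deps} for name, s in steps.items()}

    results: dict[str, Any] = {}
    fingerprints: dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        step = steps[name]
        # 3. Provenance fingerprint: name + config + upstream fingerprints.
        payload = json.dumps(
            {"name": name, "config": step.config,
             "deps": [fingerprints[d.name] for d in step.deps]},
            sort_keys=True,
        )
        fp = hashlib.sha256(payload.encode()).hexdigest()[:12]
        fingerprints[name] = fp
        # 4. Makefile-style reuse: only run steps whose inputs changed.
        if fp not in cache:
            cache[fp] = step.fn(step.config, [results[d.name] for d in step.deps])
        results[name] = cache[fp]
    return results, fingerprints


if __name__ == "__main__":
    raw = Step("raw", lambda cfg, ins: ["a b", "c d e"])
    tok = Step("tokenize", lambda cfg, ins: [s.split() for s in ins[0]], deps=(raw,))
    count = Step("count", lambda cfg, ins: sum(len(t) for t in ins[0]), deps=(tok,))
    results, fps = run_graph([count])
    print(results["count"])  # 5
```

    Because each fingerprint includes its upstream fingerprints, changing the config of an early step (for example, a data source) changes the fingerprint of every step downstream of it. Only those steps rerun. Untouched branches are served from the cache.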

    Models Trained with Marin

    The marin-community has used the framework to train and release several models:

    • Marin-8B-Base: The project claims this was the first open-source 8B-parameter model to outperform Llama 3.1 8B, beating it on 14 out of 19 standard benchmarks.
    • Marin-8B-Instruct: A fine-tuned instruction-following variant available to try on Together AI.
    • Marin-32B-Base: The project states this beats OLMo 2 32B Base on 14 out of 19 standard benchmarks and is competitive with Gemma 3 27B PT and Qwen 2.5 32B Base.

    All training scripts, execution graphs, and WandB reports for these models are publicly linked from the project homepage.

    Core Capabilities

    Marin covers the full pipeline for language model development:

    • Data curation: filtering, transformation, and quality scoring of raw datasets
    • Tokenization: configurable tokenization pipelines (e.g., Llama 3 tokenizer)
    • Training: supports TPU pods (including multislice TPU) and multi-node GPU setups
    • Evaluation: integrates with EleutherAI's lm-evaluation-harness for in-loop evaluation during training
    • Speedrun competition: a community benchmark inspired by the NanoGPT speedrun, in which participants compete to train models to a target quality within a compute budget
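    The data-curation step (filtering, transformation, and quality scoring) can be sketched as a normalize → score → threshold pipeline. This is an illustration under stated assumptions, not Marin's code. `heuristic_quality_score` is a hypothetical hand-written scorer that stands in for a learned classifier such as the fastText or BERT filters mentioned above. The 50-word length target and the 0.5 threshold are arbitrary illustrative values.

```python
from typing import Callable, Iterable, Iterator


def normalize(text: str) -> str:
    """Transformation step: collapse every run of whitespace into a single space."""
    return " ".join(text.split())


def heuristic_quality_score(text: str) -> float:
    """Hypothetical stand-in for a learned quality classifier.

    Returns a score in [0, 1]: the fraction of purely alphabetic words,
    scaled down for documents shorter than 50 words (an arbitrary choice).
    """
    words = text.split()
    if not words:
        return 0.0
    alpha_ratio = sum(word.isalpha() for word in words) / len(words)
    length_factor = min(len(words) / 50, 1.0)
    return alpha_ratio * length_factor


def curate(
    docs: Iterable[str],
    score_fn: Callable[[str], float] = heuristic_quality_score,
    threshold: float = 0.5,
) -> Iterator[str]:
    """Normalize each document and keep it only if its score meets the threshold."""
    for doc in docs:
        cleaned = normalize(doc)
        if score_fn(cleaned) >= threshold:
            yield cleaned


if __name__ == "__main__":
    docs = [
        "The quick brown fox jumps over the lazy dog " * 10,  # 90 alphabetic words
        "$$$ 1234 ### !!!",                                   # no alphabetic words
        "",                                                   # empty document
    ]
    kept = list(curate(docs))
    print(len(kept))  # 1
```

    Keeping the scorer as a plain `score_fn` argument makes it easy to compare filters. A BERT-vs-fastText experiment, for example, would swap in a different scoring function while leaving the rest of the pipeline unchanged.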

    Current Status and Community

    As of May 2026, the repository is under active development and has 983 stars, 116 forks, and 578 open issues. The project acknowledges support from the Google TPU Research Cloud program. Community participation happens via Discord and a mailing list, and the project explicitly invites contributions across architecture experiments, training algorithms, datasets, and evaluations. Agent skill guides (e.g., for adding new datasets) are included in the repository under .agents/skills/.

    Pricing

    Open Source

    • Full framework source code
    • Data curation and tokenization pipelines
    • Language model training on TPU and GPU
    • In-loop evaluation with lm-evaluation-harness
    • WandB integration

    Capabilities

    Key Features

    • Full reproducibility of every training step
    • Provenance graph execution (DAG-based, like a Makefile)
    • Data curation, filtering, transformation, and tokenization pipelines
    • Language model training on TPU pods and multi-node GPUs
    • In-loop evaluation with lm-evaluation-harness
    • WandB integration for experiment reporting
    • GitHub issue-based experiment preregistration
    • Speedrun competition for efficient training methods
    • Perplexity Gap Dashboard for analysis
    • Agent skill guides for common tasks
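    In-loop evaluation on multiple-choice benchmarks is commonly done by scoring each answer choice by its log-likelihood under the model and picking the highest. The sketch below illustrates that idea. It does not reproduce lm-evaluation-harness or Marin internals. `LogLikelihoodFn` and `toy_loglik` are hypothetical: a real setup would obtain log-probabilities from the model being trained. The optional per-character normalization is an assumption, loosely analogous to a length-normalized accuracy metric.

```python
from typing import Callable, Sequence

# Hypothetical signature: total log-probability the model assigns to
# `continuation` given `context`. A real harness would query the model.
LogLikelihoodFn = Callable[[str, str], float]


def predict_choice(
    context: str,
    choices: Sequence[str],
    loglik: LogLikelihoodFn,
    normalize: bool = False,
) -> int:
    """Return the index of the highest-scoring choice.

    With normalize=True, each log-likelihood is divided by the choice's
    character length, so long answers are not penalized merely for length.
    """
    scores = []
    for choice in choices:
        score = loglik(context, choice)
        if normalize:
            score /= max(len(choice), 1)
        scores.append(score)
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(
    examples: Sequence[tuple[str, Sequence[str], int]],
    loglik: LogLikelihoodFn,
    normalize: bool = False,
) -> float:
    """Fraction of (context, choices, gold_index) examples predicted correctly."""
    correct = sum(
        predict_choice(ctx, choices, loglik, normalize) == gold
        for ctx, choices, gold in examples
    )
    return correct / len(examples)


def toy_loglik(context: str, continuation: str) -> float:
    """Hypothetical toy 'model': characters already seen in the context are likely."""
    seen = set(context)
    return sum(-0.1 if ch in seen else -1.0 for ch in continuation)


if __name__ == "__main__":
    examples = [("the cat sat on the", [" mat", " elephant"], 0)]
    print(accuracy(examples, toy_loglik))                  # 1.0
    print(accuracy(examples, toy_loglik, normalize=True))  # 0.0
```

    The toy example shows why the choice of normalization matters. The raw and length-normalized scores can disagree on the same question, so a reported accuracy should always state which variant was used.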

    Integrations

    WandB (Weights & Biases)
    EleutherAI lm-evaluation-harness
    Hugging Face Datasets
    Google TPU Research Cloud
    Together AI
    GitHub
    API Available

    Developer

    marin-community

    marin-community builds Marin, an open-source framework for foundation model research and development. The project operates as an open lab, sharing every step of the model-building process—code, data, experiments, and failures—in real time. Marin has produced models including Marin-8B and Marin-32B, with training runs documented end-to-end via GitHub issues, pull requests, and WandB reports. The project is supported by the Google TPU Research Cloud program and welcomes community contributions across architectures, datasets, and training algorithms.

    Founded 2025
    Stanford University
    50 employees

    Used by

    AI Community
    Academic Researchers

    Similar Tools

    flash-moe

    A Mixture of Experts (MoE) implementation in Python, enabling efficient sparse model inference by routing inputs to specialized expert sub-networks.

    PyTorch

    An open-source machine learning framework for deep learning research and production with GPU acceleration and distributed training support.

    Axolotl

    Open-source tool for fine-tuning LLMs faster and at scale, supporting multi-GPU training, LoRA, FSDP, and a wide range of model architectures.

    Related Topics

    AI Development Libraries

    Programming libraries and frameworks that provide machine learning capabilities, model integration, and AI functionality for developers.

    189 tools

    Model Management

    Tools for managing, versioning, and deploying AI models.

    37 tools

    Academic Research

    AI tools designed specifically for academic and scientific research.

    42 tools