    EveryDev.ai

    Supertonic

    Voice Synthesis
    Featured

    Lightning-fast, on-device text-to-speech system powered by ONNX Runtime that runs entirely locally with no cloud dependency, supporting 31 languages across Python, JavaScript, mobile, and native runtimes.

    At a Glance

    Pricing
    Open Source

    Fully free and open-source under MIT License (sample code) and OpenRAIL-M License (models). Free to use, modify, and distribute.

    Available On

    Windows
    macOS
    Linux
    iOS
    Web

    Topics

    Voice Synthesis
    Local Inference
    Generative Media

    Alternatives

    Yeta AI
    Vois
    Dia
    Developer
    Supertone Inc.
    Seoul, South Korea
    Est. 2020
    $39.4M raised

    Listed May 2026

    About Supertonic

    Supertonic is an open-source, on-device text-to-speech (TTS) system developed by Supertone Inc. It is powered by ONNX Runtime and, once the model assets have been downloaded, runs entirely on-device: no cloud calls, no API keys, and no text or audio leaving the machine. The project is available on GitHub under the MIT License (sample code) and OpenRAIL-M License (models), and provides ready-to-use inference examples across more than ten programming languages and platforms.

    What It Is

    Supertonic is a lightweight, local-inference TTS engine designed to generate natural-sounding speech from text on consumer hardware, edge devices, and browsers. Unlike cloud-based TTS services, it downloads ONNX model assets from Hugging Face on first run and then operates entirely offline. The system targets developers who need fast, private, and portable speech synthesis without depending on external APIs.

    Update: Supertonic 3

    The most recent major release, Supertonic 3 (published 2026-04-29), expands language support from 5 languages in v2 to 31, reduces failures in which text is repeated or skipped during reading, and improves speaker similarity. It ships v2-compatible public ONNX assets, so existing integrations can upgrade without changing their inference contract. The model totals approximately 99M parameters across its public ONNX assets, substantially fewer than open TTS systems in the 0.7B–2B class. Earlier milestones include Supertonic 2 (2026-01-06), the supertonic PyPI package (2025-12-10), and Flutter/macOS SDK support (2025-11-24). A companion Voice Builder service (launched 2026-01-22) lets users convert their own voice into a deployable, edge-native TTS model.
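
    To put the parameter counts above in perspective, the following back-of-the-envelope sketch estimates raw weight storage. The numeric precision of the published ONNX assets is not stated in this listing, so float32 and float16 are both shown as assumptions; actual file sizes and runtime memory will differ.

```python
def weight_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Approximate size of raw weights in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000


SUPERTONIC_PARAMS = 99_000_000                   # "approximately 99M" per the listing
BASELINE_PARAMS = (700_000_000, 2_000_000_000)   # the 0.7B-2B class it is compared to

# Assumed storage widths; the listing does not say which precision is used.
for label, width in (("float32", 4), ("float16", 2)):
    small = weight_size_mb(SUPERTONIC_PARAMS, width)
    low, high = (weight_size_mb(p, width) for p in BASELINE_PARAMS)
    print(f"{label}: ~{small:.0f} MB vs {low:.0f}-{high:.0f} MB")
```

    Under these assumptions the weights alone are roughly 7 to 20 times smaller than those of the larger class, which is the kind of gap that matters on phones, browsers, and e-readers.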

    Architecture and Runtime Footprint

    Supertonic uses a flow-matching-based text-to-latent module and a speech autoencoder, as described in the accompanying arXiv paper (arXiv:2503.23108). Key technical properties include:

    • Runtime: ONNX Runtime for cross-platform CPU/GPU inference
    • Browser support: onnxruntime-web for WebGPU/WASM client-side inference
    • Audio output: 16-bit WAV files
    • Batch processing: Supports batch inference for higher throughput
    • Expressive tags: Inline tags such as <laugh>, <breath>, and <sigh>
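
    As an illustration of the 16-bit WAV output format listed above, here is a minimal sketch that writes floating-point samples as 16-bit PCM using only the Python standard library. It does not call Supertonic: the sine tone stands in for synthesized audio, and the sample rate and the helper name write_wav_16bit are assumptions made for the example.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav_16bit(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))            # guard against overflow
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)                    # mono
        wav_file.setsampwidth(2)                    # 2 bytes = 16 bits
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


SAMPLE_RATE = 44_100  # assumed for illustration; not taken from the listing
OUTPUT_PATH = os.path.join(tempfile.gettempdir(), "tone.wav")

# One second of a 440 Hz tone as placeholder audio.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
write_wav_16bit(OUTPUT_PATH, tone, SAMPLE_RATE)
```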

    The project's own benchmarks show Supertonic 3 on CPU remaining fast even when compared with larger baselines measured on an A100 GPU, while using substantially less memory. On an Onyx Boox Go 6 e-reader in airplane mode, the project reports an average real-time factor of 0.3×.
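
    The real-time factor quoted above can be read as synthesis time divided by the duration of the audio produced, so values below 1 mean speech is generated faster than it plays back. The sketch below assumes that conventional definition; fake_synthesize is a hypothetical stand-in for a real TTS call, not part of Supertonic.

```python
import time


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the resulting audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize, text, sample_rate):
    """Time one call to `synthesize` and return its real-time factor."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


def fake_synthesize(text):
    """Hypothetical placeholder: returns one second of silence at 16 kHz."""
    return [0.0] * 16_000


# 3 seconds of compute for 10 seconds of audio gives an RTF of 0.3.
print(real_time_factor(3.0, 10.0))
print(measure_rtf(fake_synthesize, "hello", 16_000))
```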

    Language and Platform Coverage

    Supertonic 3 supports 31 languages including English, Korean, Japanese, Arabic, German, French, Spanish, Hindi, Russian, and more. Runtime examples are provided for Python, Node.js, browser (WebGPU/WASM), Java, C++, C#, Go, Swift, iOS, Rust, and Flutter. The Python SDK is installable via pip install supertonic and auto-downloads model assets from Hugging Face on first run.

    Reading Accuracy and Text Normalization

    The project's README highlights text normalization as a differentiator. Supertonic handles complex real-world inputs, such as decimal currency expressions (e.g., "$5.2M"), phone numbers with area codes and extensions, and technical units (e.g., "2.3h", "30kph"), without requiring pre-processing or phonetic annotations. The project publishes audio comparison samples against other TTS systems for these categories.
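
    For readers unfamiliar with text normalization, the sketch below shows the kind of pre-processing step that the paragraph above says Supertonic makes unnecessary. It is not Supertonic's method: the regular expressions, the lookup tables, and the normalize function are illustrative assumptions covering only the currency and unit examples quoted above.

```python
import re

# Illustrative lookup tables; a real normalizer would cover far more cases.
SCALES = {"K": "thousand", "M": "million", "B": "billion"}
UNITS = {"h": "hours", "kph": "kilometers per hour"}


def expand_currency(match):
    """Turn a match like '$5.2M' into '5.2 million dollars'."""
    amount, scale = match.group(1), match.group(2)
    words = f"{amount} {SCALES[scale]}" if scale else amount
    return f"{words} dollars"


def expand_unit(match):
    """Turn a match like '30kph' into '30 kilometers per hour'."""
    return f"{match.group(1)} {UNITS[match.group(2)]}"


def normalize(text):
    """Expand a few written forms into words before handing text to a TTS engine."""
    text = re.sub(r"\$(\d+(?:\.\d+)?)([KMB])?\b", expand_currency, text)
    text = re.sub(r"\b(\d+(?:\.\d+)?)(kph|h)\b", expand_unit, text)
    return text


print(normalize("Revenue hit $5.2M after a 2.3h drive at 30kph"))
```

    Rule-based expansion like this is brittle (plurals, other currencies, and ambiguous abbreviations all need special cases), which is why handling such inputs inside the model is presented as a differentiator.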

    Ecosystem and Adoption Signal

    Several third-party projects have integrated Supertonic, as listed in the repository: the TLDRL Chrome extension for on-device webpage reading, the open-source Read Aloud browser extension (Chrome and Edge), the PageEcho iOS e-book reader app, a VoiceChat browser-based LLM chatbot, OmniAvatar talking avatar generator, CopiloTTS Kotlin Multiplatform SDK, and Hugging Face's Transformers.js library (via a merged pull request). Pinokio also provides a one-click localhost installer for Mac, Windows, and Linux.

    Pricing

    Open Source

    • 31-language TTS support
    • ONNX Runtime inference
    • Python, Node.js, browser, mobile, and native runtime examples
    • Auto-download of model assets from Hugging Face
    • Batch inference

    Capabilities

    Key Features

    • On-device inference with no cloud dependency
    • 31-language support (Supertonic 3)
    • ONNX Runtime-based cross-platform inference
    • Browser support via onnxruntime-web (WebGPU/WASM)
    • Batch inference for improved throughput
    • Expressive tags: <laugh>, <breath>, <sigh>
    • Auto-download of model assets from Hugging Face
    • 16-bit WAV audio output
    • Text normalization for currency, phone numbers, and technical units
    • Voice Builder for custom edge-native TTS voices
    • ~99M parameter model size for lightweight deployment
    • Python PyPI package (pip install supertonic)
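
    The expressive tags in the feature list are written inline in the input text. As a small sketch of how an application might check its input before synthesis, the code below finds tags and separates the ones named in this listing from any others. The listing does not give the full tag set, so KNOWN_TAGS and both helper functions are assumptions for illustration rather than Supertonic's API.

```python
import re

# Only the tags named in this listing; the real set may be larger.
KNOWN_TAGS = {"laugh", "breath", "sigh"}
TAG_PATTERN = re.compile(r"<([a-z]+)>")


def find_tags(text):
    """Return (known, unknown) tag names in order of appearance."""
    known, unknown = [], []
    for name in TAG_PATTERN.findall(text):
        (known if name in KNOWN_TAGS else unknown).append(name)
    return known, unknown


def strip_tags(text):
    """Remove tags and collapse whitespace, e.g. to show the text as a caption."""
    return re.sub(r"\s+", " ", TAG_PATTERN.sub(" ", text)).strip()


line = "Well <sigh> that was funny <laugh> <cough>"
print(find_tags(line))
print(strip_tags(line))
```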

    Integrations

    Python
    Node.js
    Browser (WebGPU/WASM)
    Java
    C++
    C#
    Go
    Swift
    iOS (Xcode)
    Rust
    Flutter
    Hugging Face Hub
    ONNX Runtime
    Transformers.js
    Pinokio

    Developer

    Supertone Inc.

    Supertone Inc. builds advanced speech AI technology, including on-device TTS systems and voice cloning tools. The team publishes open-source inference engines like Supertonic and operates the Voice Builder platform for custom edge-native voice creation. Their research spans flow-matching architectures, speech autoencoders, and efficient ONNX-based deployment across mobile, browser, and embedded hardware.

    Founded 2020
    Seoul, South Korea
    $39.4M raised
    44 employees

    Used by

    Disney+ (Big Bet)
    HYBE (Midnatt)
    SYNDI8

    Similar Tools

    Yeta AI

    AI-powered live dubbing tool that translates any YouTube video into your language in seconds — no uploads, no editing, just paste a link and watch.

    Vois

    A local desktop AI voice studio for podcasts, audiobooks, and video content with 63+ voices, unlimited generation, voice cloning, timeline editing, and professional mastering — all offline.

    Dia

    Dia is an open-source text-to-speech model by Nari Labs that generates realistic dialogue audio with multiple speakers, emotions, and non-verbal sounds from transcripts.

    Related Topics

    Voice Synthesis

    AI tools that generate human-like speech from text.

    25 tools

    Local Inference

    Tools and platforms for running AI inference locally without cloud dependence.

    99 tools

    Generative Media

    AI platforms providing comprehensive generative capabilities across multiple media types including images, video, audio, and 3D content.

    79 tools
    With AI, Everyone is a Dev. EveryDev.ai © 2026