Supertonic

Name: Supertonic
Availability: OnlineOnly
Author: Supertone Inc.

Lightning-fast, on-device text-to-speech system powered by ONNX Runtime that runs entirely locally with no cloud dependency, supporting 31 languages across Python, JavaScript, mobile, and native runtimes.

At a Glance

Pricing

Open Source

Fully free and open-source under MIT License (sample code) and OpenRAIL-M License (models). Free to use, modify, and distribute.

Available On

Windows

macOS

Linux

iOS

Web

Supertone Inc.

Seoul, South Korea

Est. 2020

$39.4M raised

Listed May 2026

About Supertonic

Supertonic is an open-source, on-device text-to-speech (TTS) system developed by Supertone Inc. It is powered by ONNX Runtime and, once its model assets are downloaded, runs inference entirely on-device: no cloud calls, no API keys, and no input text sent to a remote server. The project is available on GitHub under the MIT License (sample code) and OpenRAIL-M License (models), and provides ready-to-use inference examples across more than ten programming languages and platforms.

What It Is

Supertonic is a lightweight, local-inference TTS engine designed to generate natural-sounding speech from text on consumer hardware, edge devices, and browsers. Unlike cloud-based TTS services, it downloads ONNX model assets from Hugging Face on first run and then operates entirely offline. The system targets developers who need fast, private, and portable speech synthesis without depending on external APIs.

Update: Supertonic 3

The most recent major release, Supertonic 3 (published 2026-04-29), expands language support from 5 languages in v2 to 31, reduces reading failures in which words are repeated or skipped, and improves speaker similarity. It also ships v2-compatible public ONNX assets, so existing integrations can upgrade without changing their inference contract. The model has approximately 99M parameters across its public ONNX assets, substantially fewer than open TTS systems in the 0.7B–2B class. Earlier milestones include Supertonic 2 (2026-01-06), the supertonic PyPI package (2025-12-10), and Flutter/macOS SDK support (2025-11-24). A companion Voice Builder service (launched 2026-01-22) lets users convert their own voice into a deployable, edge-native TTS model.

Architecture and Runtime Footprint

Supertonic combines a flow-matching-based text-to-latent module with a speech autoencoder, as described in the accompanying arXiv paper (arXiv:2503.23108). Key technical properties include:

Runtime: ONNX Runtime for cross-platform CPU/GPU inference
Browser support: onnxruntime-web for WebGPU/WASM client-side inference
Audio output: 16-bit WAV files
Batch processing: Supports batch inference for higher throughput
Expressive tags: Inline tags such as <laugh>, <breath>, and <sigh>
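
To make the "16-bit WAV" property concrete, here is a minimal sketch of how a float waveform can be stored as 16-bit PCM using only the Python standard library. It is not Supertonic code: the function name write_wav_16bit, the output path, the 440 Hz stand-in tone, and the 44.1 kHz sample rate are all assumptions chosen for illustration, and the sample rate the models actually emit is not stated here.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav_16bit(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip so the scaled value fits in int16
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16 bits
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))


# Stand-in signal (assumption): one second of a 440 Hz tone at 44.1 kHz.
SAMPLE_RATE = 44100
OUT_PATH = os.path.join(tempfile.gettempdir(), "tone_16bit.wav")
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
write_wav_16bit(OUT_PATH, tone, SAMPLE_RATE)
```

In a real integration the list of samples would come from the TTS engine's output rather than a synthetic tone; the clipping step matters because values outside [-1.0, 1.0] would otherwise overflow the 16-bit range.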

According to the project's own benchmarks, Supertonic 3 running on CPU is fast even relative to larger baselines measured on an A100 GPU, and it uses substantially less memory. On an Onyx Boox Go 6 e-reader in airplane mode, the project reports an average real-time factor of 0.3×.
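
A real-time factor (RTF) is commonly defined as synthesis time divided by the duration of the audio produced, so values below 1 mean faster than real time. The sketch below assumes that convention (some projects report the inverse, a speed multiple). The names real_time_factor and measure_rtf, and the synthesize callable they accept, are hypothetical and not part of any Supertonic API.

```python
import time


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = compute time / duration of generated audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize, text, sample_rate):
    """Time a synthesize(text) callable that returns a sequence of samples."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


# Worked example: 3 s of compute for 10 s of audio gives an RTF of 0.3.
print(real_time_factor(3.0, 10.0))  # 0.3
```

Under this definition, an RTF of 0.3 would correspond to roughly 0.3 seconds of compute per second of generated speech.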

Language and Platform Coverage

Supertonic 3 supports 31 languages including English, Korean, Japanese, Arabic, German, French, Spanish, Hindi, Russian, and more. Runtime examples are provided for Python, Node.js, browser (WebGPU/WASM), Java, C++, C#, Go, Swift, iOS, Rust, and Flutter. The Python SDK is installable via pip install supertonic and auto-downloads model assets from Hugging Face on first run.

Reading Accuracy and Text Normalization

The project's README highlights text normalization as a differentiator. Supertonic handles complex real-world inputs—decimal currency expressions (e.g., "$5.2M"), phone numbers with area codes and extensions, and technical units (e.g., "2.3h", "30kph")—without requiring pre-processing or phonetic annotations. The project publishes audio comparison samples against other TTS systems for these categories.
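
For contrast, the sketch below shows the kind of rule-based pre-processing that TTS pipelines often apply before synthesis, and which the README says Supertonic does not require. It is purely illustrative: the rule tables, regular expressions, and function names are hypothetical and are not taken from Supertonic.

```python
import re

# Hypothetical rule tables for illustration only; not taken from Supertonic.
MAGNITUDES = {"K": "thousand", "M": "million", "B": "billion"}
UNITS = {"h": "hours", "kph": "kilometers per hour"}

CURRENCY_RE = re.compile(r"\$(\d+(?:\.\d+)?)([KMB])?\b")
UNIT_RE = re.compile(r"\b(\d+(?:\.\d+)?)(kph|h)\b")


def expand_currency(match):
    """Turn '$5.2M' into '5.2 million dollars'."""
    amount, suffix = match.group(1), match.group(2)
    scale = f" {MAGNITUDES[suffix]}" if suffix else ""
    return f"{amount}{scale} dollars"


def expand_unit(match):
    """Turn '30kph' into '30 kilometers per hour'."""
    return f"{match.group(1)} {UNITS[match.group(2)]}"


def normalize(text):
    """Apply the currency rule first, then the unit rule."""
    text = CURRENCY_RE.sub(expand_currency, text)
    return UNIT_RE.sub(expand_unit, text)


print(normalize("The trip took 2.3h at 30kph and cost $5.2M."))
# The trip took 2.3 hours at 30 kilometers per hour and cost 5.2 million dollars.
```

Hand-written rules like these need a new entry for every format they should cover, which is why handling such inputs without a separate pre-processing stage is presented as a differentiator.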

Ecosystem and Adoption Signal

Several third-party projects have integrated Supertonic, as listed in the repository: the TLDRL Chrome extension for on-device webpage reading, the open-source Read Aloud browser extension (Chrome and Edge), the PageEcho iOS e-book reader app, a VoiceChat browser-based LLM chatbot, OmniAvatar talking avatar generator, CopiloTTS Kotlin Multiplatform SDK, and Hugging Face's Transformers.js library (via a merged pull request). Pinokio also provides a one-click localhost installer for Mac, Windows, and Linux.

Pricing

Open Source

Fully free and open-source under MIT License (sample code) and OpenRAIL-M License (models). Free to use, modify, and distribute.

31-language TTS support
ONNX Runtime inference
Python, Node.js, browser, mobile, and native runtime examples
Auto-download of model assets from Hugging Face
Batch inference

Capabilities

Key Features

On-device inference with no cloud dependency
31-language support (Supertonic 3)
ONNX Runtime-based cross-platform inference
Browser support via onnxruntime-web (WebGPU/WASM)
Batch inference for improved throughput
Expressive tags: <laugh>, <breath>, <sigh>
Auto-download of model assets from Hugging Face
16-bit WAV audio output
Text normalization for currency, phone numbers, and technical units
Voice Builder for custom edge-native TTS voices
~99M parameter model size for lightweight deployment
Python PyPI package (pip install supertonic)

Integrations

Python

Node.js

Browser (WebGPU/WASM)

Java

C++

Swift

iOS (Xcode)

Rust

Flutter

Hugging Face Hub

ONNX Runtime

Transformers.js

Pinokio

API Available
