# Supertonic

> Lightning-fast, on-device text-to-speech system powered by ONNX Runtime that runs entirely locally with no cloud dependency, supporting 31 languages across Python, JavaScript, mobile, and native runtimes.

Supertonic is an open-source, on-device text-to-speech (TTS) system developed by Supertone Inc. It is powered by ONNX Runtime and runs entirely on-device—no cloud calls, no API keys, no privacy concerns. The project is available on GitHub under the MIT License (sample code) and OpenRAIL-M License (models), and provides ready-to-use inference examples across more than ten programming languages and platforms.

## What It Is

Supertonic is a lightweight, local-inference TTS engine designed to generate natural-sounding speech from text on consumer hardware, edge devices, and browsers. Unlike cloud-based TTS services, it needs no network connection at inference time: ONNX model assets are downloaded from Hugging Face on first run, after which the system operates entirely offline. The system targets developers who need fast, private, and portable speech synthesis without depending on external APIs.

## Update: Supertonic 3

The most recent major release, Supertonic 3 (published 2026-04-29), expands language support from the 5 languages of v2 to 31, reduces word-repetition and word-skipping failures, improves speaker similarity, and ships v2-compatible public ONNX assets, so existing integrations can upgrade without changing their inference contract. The model comprises approximately 99M parameters across its public ONNX assets, substantially smaller than 0.7B–2B class open TTS systems. Earlier milestones include Supertonic 2 (2026-01-06), the `supertonic` PyPI package (2025-12-10), and Flutter/macOS SDK support (2025-11-24). A companion Voice Builder service (launched 2026-01-22) lets users convert their own voice into a deployable, edge-native TTS model.

## Architecture and Runtime Footprint

Supertonic uses a flow-matching based text-to-latent module and a speech autoencoder, as described in the accompanying arXiv paper (arXiv:2503.23108). Key technical properties include:

- **Runtime**: ONNX Runtime for cross-platform CPU/GPU inference
- **Browser support**: `onnxruntime-web` for WebGPU/WASM client-side inference
- **Audio output**: 16-bit WAV files
- **Batch processing**: Supports batch inference for higher throughput
- **Expressive tags**: Inline tags such as `<laugh>`, `<breath>`, and `<sigh>`
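To make the "16-bit WAV" output property concrete, here is a minimal, self-contained sketch (not Supertonic code) of writing a mono float waveform as 16-bit PCM WAV using only the Python standard library. The 22,050 Hz sample rate and the sine-tone "audio" are stand-in assumptions for illustration:

```python
import math
import struct
import wave

def write_wav_16bit(path, samples, sample_rate=22050):
    """Write a mono float waveform (values in [-1.0, 1.0]) as 16-bit PCM WAV."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)    # mono
        wav.setsampwidth(2)    # 16-bit = 2 bytes per sample
        wav.setframerate(sample_rate)
        # Clamp floats to [-1, 1], then scale to the signed 16-bit range.
        clamped = (max(-1.0, min(1.0, s)) for s in samples)
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clamped)
        wav.writeframes(frames)

# A 0.1-second 440 Hz sine tone as stand-in "synthesized" audio.
sr = 22050
tone = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]
write_wav_16bit("tone.wav", tone, sample_rate=sr)
```

Any TTS engine that emits a float waveform can be connected to an audio pipeline this way; Supertonic's runtimes handle the WAV encoding for you.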

In the project's own benchmarks, Supertonic 3 remains fast on CPU even against larger baselines measured on an A100 GPU, while using substantially less memory. On an Onyx Boox Go 6 e-reader in airplane mode, the project reports an average real-time factor of 0.3×.

## Language and Platform Coverage

Supertonic 3 supports 31 languages including English, Korean, Japanese, Arabic, German, French, Spanish, Hindi, Russian, and more. Runtime examples are provided for Python, Node.js, browser (WebGPU/WASM), Java, C++, C#, Go, Swift, iOS, Rust, and Flutter. The Python SDK is installable via `pip install supertonic` and auto-downloads model assets from Hugging Face on first run.

## Reading Accuracy and Text Normalization

The project's README highlights text normalization as a differentiator. Supertonic handles complex real-world inputs—decimal currency expressions (e.g., "$5.2M"), phone numbers with area codes and extensions, and technical units (e.g., "2.3h", "30kph")—without requiring pre-processing or phonetic annotations. The project publishes audio comparison samples against other TTS systems for these categories.
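To make the normalization task concrete, here is a toy normalizer covering two of the patterns above. This illustrates the problem a TTS front end must solve; it is not Supertonic's actual normalization logic, and the word mappings are assumptions for the example:

```python
import re

MAGNITUDES = {"K": "thousand", "M": "million", "B": "billion"}
UNIT_WORDS = {"h": "hours", "kph": "kilometers per hour"}

def normalize(text):
    """Expand a few currency and unit patterns into speakable words (toy example)."""
    # "$5.2M" -> "5.2 million dollars"
    text = re.sub(
        r"\$(\d+(?:\.\d+)?)([KMB])",
        lambda m: f"{m.group(1)} {MAGNITUDES[m.group(2)]} dollars",
        text,
    )
    # "2.3h" -> "2.3 hours"; "30kph" -> "30 kilometers per hour"
    text = re.sub(
        r"(\d+(?:\.\d+)?)(kph|h)\b",
        lambda m: f"{m.group(1)} {UNIT_WORDS[m.group(2)]}",
        text,
    )
    return text

print(normalize("Revenue hit $5.2M after 2.3h at 30kph."))
```

Real-world normalization must also handle ordinals, dates, abbreviations, and per-language conventions, which is why built-in handling, as Supertonic claims, saves integration work.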

## Ecosystem and Adoption Signal

Several third-party projects have integrated Supertonic, as listed in the repository: the TLDRL Chrome extension for on-device webpage reading, the open-source Read Aloud browser extension (Chrome and Edge), the PageEcho iOS e-book reader app, a VoiceChat browser-based LLM chatbot, OmniAvatar talking avatar generator, CopiloTTS Kotlin Multiplatform SDK, and Hugging Face's Transformers.js library (via a merged pull request). Pinokio also provides a one-click localhost installer for Mac, Windows, and Linux.

## Features
- On-device inference with no cloud dependency
- 31-language support (Supertonic 3)
- ONNX Runtime-based cross-platform inference
- Browser support via onnxruntime-web (WebGPU/WASM)
- Batch inference for improved throughput
- Expressive tags: `<laugh>`, `<breath>`, `<sigh>`
- Auto-download of model assets from Hugging Face
- 16-bit WAV audio output
- Text normalization for currency, phone numbers, and technical units
- Voice Builder for custom edge-native TTS voices
- ~99M parameter model size for lightweight deployment
- Python PyPI package (pip install supertonic)

## Integrations
Python, Node.js, Browser (WebGPU/WASM), Java, C++, C#, Go, Swift, iOS (Xcode), Rust, Flutter, Hugging Face Hub, ONNX Runtime, Transformers.js, Pinokio

## Platforms
WINDOWS, MACOS, LINUX, IOS, WEB, API, BROWSER_EXTENSION, DEVELOPER_SDK, CLI

## Pricing
Open Source

## Version
3.0

## Links
- Website: https://huggingface.co/spaces/Supertone/supertonic-2
- Documentation: https://supertone-inc.github.io/supertonic-py
- Repository: https://github.com/supertone-inc/supertonic
- EveryDev.ai: https://www.everydev.ai/tools/supertonic
