AssemblyAI

Name: AssemblyAI
Availability: OnlineOnly
Author: AssemblyAI

Speech Recognition

Speech-to-text and speech understanding API platform for building Voice AI applications with industry-leading accuracy.

At a Glance

Pricing

Free tier available

Access to industry-leading Speech-to-Text and Audio Intelligence models

Pay as you go - Universal: usage-based

Pay as you go - Slam-1: $0.27 usage-based

Enterprise: Custom/contact

Available On

Web

API

SDK

AssemblyAI | San Francisco, CA | Est. 2017 | $115M raised

Listed Jan 2026

About AssemblyAI

AssemblyAI provides industry-leading speech-to-text and speech understanding models that power Voice AI applications for thousands of companies worldwide. The platform enables developers to transcribe audio and video files, process live streaming audio, and extract insights from voice data with exceptional accuracy and low latency. AssemblyAI processes over 40 terabytes of audio daily and serves 600M+ inference calls per month.

Speech-to-Text converts pre-recorded audio and video files into accurate transcripts with support for 99 languages, speaker diarization, automatic punctuation, and word-level timestamps.
Streaming Speech-to-Text enables real-time transcription with ultra-low latency for voice agents and live applications, featuring built-in turn detection and unlimited concurrency.
Speech Understanding provides audio intelligence capabilities including sentiment analysis, entity detection, topic detection, auto chapters, summarization, and speaker identification.
LLM Gateway unifies voice-to-intelligence workflows by applying Large Language Models directly to audio content through a single API.
Guardrails ensures content safety with profanity filtering, PII redaction from both text and audio, and content moderation for sensitive topics.
Enterprise Features include SOC 2 Type 2, ISO 27001, HIPAA compliance with BAA, EU data residency, self-hosted deployments, and dedicated support.
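
As a rough illustration of how several of the capabilities above might be switched on for one pre-recorded job, the sketch below builds a JSON request body. The helper name `build_transcript_payload` is hypothetical, and the field names (`audio_url`, `speaker_labels`, `sentiment_analysis`, `entity_detection`, `filter_profanity`) are assumed to match AssemblyAI's REST API; verify them against the current API reference before relying on them.

```python
import json


def build_transcript_payload(audio_url, *, diarize=False, sentiment=False,
                             entities=False, filter_profanity=False):
    """Return a request body for a pre-recorded transcription job.

    Hypothetical helper: the field names are assumptions about the REST API.
    """
    payload = {"audio_url": audio_url}
    # Map each listed capability to its assumed request flag.
    optional_flags = {
        "speaker_labels": diarize,             # speaker diarization
        "sentiment_analysis": sentiment,       # Speech Understanding
        "entity_detection": entities,          # Speech Understanding
        "filter_profanity": filter_profanity,  # Guardrails
    }
    # Only include flags that are enabled, to keep the body minimal.
    payload.update({name: True for name, on in optional_flags.items() if on})
    return payload


if __name__ == "__main__":
    body = build_transcript_payload(
        "https://example.com/meeting.mp3", diarize=True, sentiment=True
    )
    print(json.dumps(body, indent=2))
```

Keeping the flags as keyword-only arguments makes each enabled capability explicit at the call site; anything left off is simply omitted from the body.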

To get started, sign up for a free account, which includes $50 in credits, to access the API. Install the SDK for Python, JavaScript, or another supported language, then submit audio files or connect streaming audio to receive transcripts. The playground lets you test models without writing code. Enterprise customers can contact sales for volume discounts, custom deployments, and dedicated infrastructure options.
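
The submit-then-wait flow for a pre-recorded file can be sketched with the Python standard library alone. Everything specific here is an assumption to check against the official docs: the endpoint `https://api.assemblyai.com/v2/transcript`, the `Authorization` header carrying the API key, and the terminal status values `completed` and `error`. The helper names and the injected `fetch_status` callable are hypothetical, chosen so the polling loop can be exercised without network access.

```python
import json
import time
import urllib.request

# Assumed endpoint; confirm against the official API reference.
TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"


def build_submit_request(api_key, audio_url):
    """Build (but do not send) a POST request that submits an audio URL."""
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        TRANSCRIPT_URL,
        data=body,
        method="POST",
        headers={"Authorization": api_key, "Content-Type": "application/json"},
    )


def wait_for_transcript(fetch_status, poll_seconds=3.0, max_polls=100):
    """Poll until the job reaches a terminal state, then return its text.

    `fetch_status` is a hypothetical callable returning the job as a dict
    with a "status" key; "completed" and "error" are assumed terminal values.
    """
    for _ in range(max_polls):
        job = fetch_status()
        if job["status"] == "completed":
            return job.get("text", "")
        if job["status"] == "error":
            raise RuntimeError(job.get("error", "transcription failed"))
        time.sleep(poll_seconds)
    raise TimeoutError("transcript not ready after max_polls attempts")
```

In real use, `fetch_status` would issue an authenticated GET for the submitted job and decode the JSON response. The official SDKs are expected to wrap this flow, so a hand-rolled loop like this is mainly useful for understanding what happens underneath.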

Pricing

FREE

Free

Access to industry-leading Speech-to-Text and Audio Intelligence models

Transcribe up to 185 hours of pre-recorded audio for free
Transcribe up to 333 hours of streaming audio for free
Up to 5 new streams per minute
Developer docs, community support, and resources
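
The free allowances can be related to the $50 sign-up credit if one assumes the credit is spent at a flat hourly rate. That reading is an inference from the figures on this page, not a published rate card, so the sketch below shows only the arithmetic; the function name is hypothetical.

```python
FREE_CREDIT_USD = 50.0        # sign-up credit quoted on this page
FREE_PRERECORDED_HOURS = 185  # free pre-recorded allowance
FREE_STREAMING_HOURS = 333    # free streaming allowance


def implied_hourly_rate(credit_usd, hours):
    """Credit divided by hours: the flat rate that would exhaust the credit."""
    return credit_usd / hours


if __name__ == "__main__":
    pre = implied_hourly_rate(FREE_CREDIT_USD, FREE_PRERECORDED_HOURS)
    live = implied_hourly_rate(FREE_CREDIT_USD, FREE_STREAMING_HOURS)
    print(f"pre-recorded: ${pre:.2f}/hour")
    print(f"streaming:    ${live:.2f}/hour")
```

Under that assumption the pre-recorded allowance works out to about $0.27 per hour and the streaming allowance to about $0.15 per hour.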

Pay as you go - Universal

Fast, accurate transcription across 99 languages

usage-based

Unlimited access to Speech-to-Text, Audio Intelligence, and LeMUR
Unlimited concurrent streams
Pre-recorded concurrency starting at 200 files
Customize rate limits
Dedicated technical support

Pay as you go - Slam-1

Highest accuracy transcription powered by LLM intelligence

$0.27

usage-based

LLM-powered contextual understanding
English only
Beta access

Enterprise

Custom pricing for enterprise needs

Custom

contact sales

Custom rate limits and enhanced concurrency
Enterprise-grade flexibility
BAA for HIPAA compliance
EU Data Residency standards
Self-hosted deployments (On-prem, EU, VPC)
Customized SLAs and SLOs
24/7 support and dedicated solution architects

Capabilities

Key Features

Speech-to-Text for pre-recorded audio
Streaming Speech-to-Text for real-time transcription
Speaker diarization
99 language support
Automatic language detection
Sentiment analysis
Entity detection
Topic detection
Auto chapters
Summarization
Speaker identification
Translation
Custom formatting
PII redaction
PII audio redaction
Profanity filtering
Content moderation
LLM Gateway
Word-level timestamps
Keyterms prompting
Custom spelling

Integrations

AWS

Twilio

Cloudflare

Recall

LiveKit

API Available
