Deepgram Pricing 2026: Speech-to-Text, Nova-3, Flux and Voice Agent API Costs

Published on: June 10, 2026 by Steven Jones

Deepgram pricing in 2026 is usage-based, with separate rates for speech-to-text, real-time streaming, Flux, Aura text-to-speech, Voice Agent API usage, Audio Intelligence and add-ons such as redaction, diarisation and keyterm prompting. This guide focuses on cost only. For accuracy, features and alternatives, read our Deepgram AI review.

The main point: Deepgram can be very good value for production speech applications, especially where low latency matters, but the final bill depends heavily on model choice, streaming volume and which add-ons you enable. The headline per-minute rate is only part of the cost model.

Deepgram pricing: quick answer

Deepgram starts with a free $200 credit, then moves to Pay As You Go pricing. Growth starts at $4K+ per year and gives discounted usage rates through prepaid annual credits. Enterprise is custom quoted for larger volumes and for private deployment, data, compliance or support requirements.

Pricing area	Typical 2026 billing unit	What to watch
Speech-to-text	Per audio minute	Nova-3, Flux, streaming mode, multilingual mode and add-ons can change the bill.
Speech-to-text add-ons	Usually per audio minute	Diarisation and redaction are useful, but they add to the base transcription rate.
Aura text-to-speech	Per 1,000 characters	Aura-2 costs more than Aura-1, but is the more current voice model.
Voice Agent API	Per minute of WebSocket connection time	Standard, BYO TTS, BYO LLM and Advanced tiers have different rates.
Audio Intelligence	Per 1,000 input and output tokens	Summaries and analyses are not priced the same way as transcription.

In DIY AI’s 2026 speech-to-text dataset, Deepgram scores 9.1/10 overall, 9.2/10 for cost efficiency, 9.9/10 for speed and 9.9/10 for real-time streaming. That makes it one of the strongest options for voice products where delay affects the user experience.

DIY AI cost value rating: ★★★★☆ 4.6/5

Deepgram pricing plans: Pay As You Go, Growth and Enterprise

Deepgram has three main buying paths. The right one depends less on company size and more on whether your usage is predictable enough to justify prepaid credits.

Plan	Starting point	Best fit	Pricing logic
Free / Pay As You Go	$200 free credit, then usage-based billing	Developers, prototypes, early products and low-volume apps	No annual minimum. You pay based on the products and minutes, characters or tokens used.
Growth	$4K+ per year	Growing applications with recurring usage	Prepaid annual credits are redeemed against actual usage, with discounted rates.
Enterprise	Custom quote	Large volume, private deployment, compliance, support or custom model needs	Sales-led pricing for higher scale and non-standard requirements.

The Growth plan matters once your monthly usage is no longer experimental. A product burning a few hundred dollars per month in transcription, voice agents or TTS can quickly reach the point where annual prepaid credits make sense. Smaller teams should still start with Pay As You Go, because it keeps the billing model honest while they test real audio, concurrency needs and add-on usage.
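A rough way to find that crossover is to compare annual spend under both plans. The sketch below is illustrative only: it assumes the Growth minimum is a flat $4,000 annual prepayment, that unused credit is forfeited, and that the whole workload is Nova-3 Monolingual streaming at the rates listed in this guide. The function names are hypothetical and not part of any Deepgram SDK.

```python
# Assumptions (not confirmed by Deepgram): the Growth minimum is a flat
# $4,000 annual prepayment and unused credit is forfeited at year end.
GROWTH_MINIMUM_USD = 4_000.0


def annual_cost_payg(minutes_per_month: float, payg_rate: float) -> float:
    """Annual Pay As You Go spend for a steady monthly volume."""
    return minutes_per_month * 12 * payg_rate


def annual_cost_growth(minutes_per_month: float, growth_rate: float) -> float:
    """Annual Growth spend: the prepaid minimum or discounted usage, whichever is larger."""
    return max(GROWTH_MINIMUM_USD, minutes_per_month * 12 * growth_rate)


def break_even_minutes_per_month(payg_rate: float) -> float:
    """Monthly minutes at which Pay As You Go spend reaches the Growth minimum."""
    return GROWTH_MINIMUM_USD / (12 * payg_rate)


payg, growth = 0.0048, 0.0042  # Nova-3 Monolingual streaming, per minute
print(f"Break-even: {break_even_minutes_per_month(payg):,.0f} minutes per month")
for minutes in (20_000, 70_000, 150_000):
    print(minutes,
          round(annual_cost_payg(minutes, payg), 2),
          round(annual_cost_growth(minutes, growth), 2))
```

Under these assumptions the crossover sits at roughly 69,000 minutes a month, or about $333 of monthly Pay As You Go spend. Below that, the prepaid minimum costs more than the usage it covers.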

The public plan limits also matter. Pay As You Go offers useful concurrency for developers, but voice products can hit limits more quickly than transcription tools because each live user requires a real-time connection. Growth raises several WebSocket limits, while Enterprise is the route for custom commitments, private deployment and heavier support needs.

Deepgram speech-to-text pricing by model

Speech-to-text is the core category in Deepgram’s pricing. Deepgram prices current public speech-to-text models per audio minute, with different rates for Pay As You Go and Growth. The public pricing page also shows streaming and pre-recorded views, so always check the live selector before pushing a production estimate into a budget.

The numbers below reflect the currently listed per-minute streaming pricing from Deepgram’s official pricing page. They are suitable for budget modelling, but you should re-check the page before publishing a customer quote or signing a contract.

Model	Pay As You Go	Growth	Approx Pay As You Go per hour	Best fit
Flux English	$0.0065/min	$0.0057/min	$0.39/hr	English real-time voice agents that need turn-taking and interruption handling.
Flux Multilingual	$0.0078/min	$0.0068/min	$0.47/hr	Voice agents where users may switch languages in one conversation.
Nova-3 Monolingual	$0.0048/min	$0.0042/min	$0.29/hr	Cost-efficient transcription and streaming where one primary language is expected.
Nova-3 Multilingual	$0.0058/min	$0.0050/min	$0.35/hr	Multilingual transcription, background noise and varied real-world speech.
Custom	Contact sales	Contact sales	Custom	Domain-specific models trained around proprietary or difficult audio.

Nova-3 Monolingual pricing

Nova-3 Monolingual is the cost-efficient default for many English-first or single-language workloads. At the current Pay As You Go streaming rate of $0.0048 per minute, it works out at about $0.29 per audio hour before add-ons.

That is a strong rate for production speech-to-text, but it should not be treated as the whole budget. Speaker diarisation, redaction and keyterm prompting are priced separately. A call transcription workflow that uses Nova-3 Monolingual, plus diarisation and redaction, is not a $0.0048-per-minute workflow. It is a stacked-cost workflow.

Nova-3 Multilingual pricing

Nova-3 Multilingual costs more than the monolingual model, but the extra cost is often justified when the audio includes multiple languages, unpredictable language selection, heavy accents, crosstalk or far-field input. On Pay As You Go, the current listed streaming rate is $0.0058 per minute, or about $0.35 per audio hour before add-ons.

This is where Deepgram pricing needs realistic test audio. If your users mostly speak clean English into close microphones, the multilingual model may be unnecessary. If your audio comes from international support calls, public meetings, mobile microphones, or mixed-language conversations, the cheaper model may end up costing more later due to manual correction.

Flux pricing for real-time voice agents

Flux is Deepgram’s speech recognition model built around real-time voice agent use cases. It is not just another transcription model with a faster endpoint. Its pricing should be assessed against agent behaviour: interruption handling, turn detection, latency and whether the transcript arrives fast enough for a natural response loop.

Flux English is listed at $0.0065 per minute on Pay As You Go. Flux Multilingual is listed at $0.0078 per minute. That makes Flux more expensive than Nova-3 Monolingual on a raw per-minute basis, but cheaper isn’t always better for voice agents. A model that handles turns more cleanly can reduce awkward pauses, false interruptions and agent responses that fire at the wrong time.

Deepgram pricing per minute and per hour

Pricing pages usually show per-minute costs because API bills are calculated at that level. Buyers, however, tend to budget in hours. Here is the practical conversion.

Rate per minute	Cost per hour	10,000 minutes	100,000 minutes
$0.0048	$0.288	$48	$480
$0.0058	$0.348	$58	$580
$0.0065	$0.390	$65	$650
$0.0078	$0.468	$78	$780
$0.0750	$4.500	$750	$7,500

The jump in the final row is intentional. The $0.075 rate is the Voice Agent API Standard tier, which is a different product category from raw speech-to-text. It can include orchestration for listening, thinking and speaking, so the unit price is much higher than that of a basic STT model.
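The conversion is simple enough to script when you want to rerun it against updated rates. This is a minimal sketch using the Pay As You Go figures quoted in this guide; the dictionary and function names are illustrative.

```python
# Pay As You Go per-minute rates as quoted in this guide.
RATES_PER_MINUTE = {
    "Nova-3 Monolingual": 0.0048,
    "Nova-3 Multilingual": 0.0058,
    "Flux English": 0.0065,
    "Flux Multilingual": 0.0078,
    "Voice Agent API Standard": 0.0750,
}


def cost_per_hour(rate_per_minute: float) -> float:
    """Convert a per-minute rate to a per-hour rate."""
    return rate_per_minute * 60


def cost_for_minutes(rate_per_minute: float, minutes: int) -> float:
    """Cost of a given number of billed minutes at a flat rate."""
    return rate_per_minute * minutes


for name, rate in RATES_PER_MINUTE.items():
    print(f"{name}: ${cost_per_hour(rate):.3f}/hr, "
          f"${cost_for_minutes(rate, 10_000):,.0f} per 10,000 min, "
          f"${cost_for_minutes(rate, 100_000):,.0f} per 100,000 min")
```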

Streaming vs pre-recorded pricing

Streaming and pre-recorded transcription solve different problems. Pre-recorded transcription is used when the audio already exists: podcasts, call recordings, interviews, meetings, videos and uploaded files. Streaming is used when text needs to appear while speech is happening: live captions, voice agents, real-time call assistance, dictation and monitoring.

Do not compare streaming and pre-recorded pricing by cost alone. Streaming requires low-latency infrastructure and usually has stricter concurrency requirements. Pre-recorded transcription can be queued, retried and processed in batches. That makes it easier to control cost, especially for content archives or back-office workflows.

Mode	Best for	Cost behaviour	Buying advice
Pre-recorded transcription	Uploaded files, call recordings, podcasts, videos and archives	Easier to batch and forecast	Use this where a turnaround of seconds or minutes is acceptable and no live interaction is needed.
Streaming speech-to-text	Live captions, call assistance, dictation and real-time products	More sensitive to concurrency and connection time	Budget for peak usage, not just total minutes.
Flux streaming	Voice agents and conversational AI	Priced for real-time conversational performance	Use where turn detection and interruption handling matter.

Deepgram free credits: what the starter credit actually covers

The $200 free credit is useful. It is enough to run proper tests with real files, streaming sessions and early voice-agent prototypes. It can also mislead buyers if they only test clean demo audio and forget add-ons.

Usage type	Example rate	What $200 roughly covers
Nova-3 Monolingual	$0.0048/min	About 41,666 minutes, or 694 hours
Nova-3 Multilingual	$0.0058/min	About 34,482 minutes, or 575 hours
Flux English	$0.0065/min	About 30,769 minutes, or 513 hours
Voice Agent API Standard	$0.075/min	About 2,666 minutes, or 44 hours
Aura-2 text-to-speech	$0.030/1,000 characters	About 6.67 million characters

The mistake is assuming the credit reflects your actual monthly bill. It does not. Your production bill depends on the audio mix, add-ons, failed or repeated jobs, concurrent live sessions, agent call length and whether you move onto Growth pricing.
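To see how quickly add-ons shrink the credit, it helps to compute coverage at the stacked rate as well as the base rate. The sketch below assumes the Pay As You Go rates quoted in this guide; `credit_coverage` is a hypothetical helper name.

```python
def credit_coverage(credit_usd: float, rate_per_minute: float) -> tuple[int, int]:
    """Return whole (minutes, hours) of audio a credit buys at a per-minute rate."""
    minutes = credit_usd / rate_per_minute
    return int(minutes), int(minutes / 60)


CREDIT_USD = 200.0
BASE_RATE = 0.0048      # Nova-3 Monolingual
STACKED_RATE = 0.0088   # Nova-3 Monolingual + diarisation + redaction

print("Base model only:", credit_coverage(CREDIT_USD, BASE_RATE))
print("With diarisation and redaction:", credit_coverage(CREDIT_USD, STACKED_RATE))
```

With both add-ons enabled, the same $200 covers roughly 378 hours rather than 694, which is why test runs should use the feature set you expect in production.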

Speech-to-text add-ons: diarisation, redaction, keyterm prompting and formatting

Deepgram’s add-ons are useful because they turn raw transcription into something closer to a usable workflow. They also make cost modelling more complicated.

Add-on	Pay As You Go	Growth	What it does
Redaction	$0.0020/min	$0.0017/min	Removes or masks sensitive information such as payment card numbers, phone numbers and other PII.
Keyterm prompting	$0.0013/min	$0.0012/min	Improves recognition of product names, acronyms, brand terms and specialist vocabulary.
Smart formatting	Included	Included	Formats punctuation, casing, dates, numbers and currency for readability.
Speaker diarisation	$0.0020/min	$0.0017/min	Labels who spoke when in multi-speaker audio.

For call centres, diarisation and redaction are often not optional extras. Speaker labels help QA, analytics and coaching. Redaction helps reduce exposure when transcripts include sensitive information. The pricing issue is that both are per-minute additions, so they scale in direct proportion to transcription volume.

For example, Nova-3 Monolingual at $0.0048 per minute increases to $0.0088 per minute when you add diarisation and redaction. That is still not expensive in absolute terms, but it is an 83 per cent increase over the base model line item. This is the most common Deepgram pricing mistake: modelling the model price and forgetting the transcript features buyers actually need.
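The stacked rate and the percentage uplift can be checked with a few lines of code. This sketch uses the Pay As You Go add-on rates from the table above; the names are illustrative.

```python
# Pay As You Go add-on rates per audio minute, as quoted in this guide.
ADDON_RATES = {
    "diarisation": 0.0020,
    "redaction": 0.0020,
    "keyterm_prompting": 0.0013,
    "smart_formatting": 0.0,  # included at no extra charge
}


def stacked_rate(base_rate: float, addons: list[str]) -> float:
    """Base transcription rate plus every enabled add-on."""
    return base_rate + sum(ADDON_RATES[name] for name in addons)


def uplift_percent(base_rate: float, addons: list[str]) -> float:
    """Percentage increase of the stacked rate over the base rate."""
    return (stacked_rate(base_rate, addons) / base_rate - 1) * 100


enabled = ["diarisation", "redaction"]
print(f"Stacked rate: ${stacked_rate(0.0048, enabled):.4f}/min")
print(f"Uplift over base: {uplift_percent(0.0048, enabled):.0f}%")
```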

Deepgram Aura text-to-speech pricing

Deepgram Aura is billed per 1,000 characters, not per minute. This matters because the charge depends on how much text you send, not on how many seconds the generated audio lasts.

Aura model	Pay As You Go	Growth	Approx cost per 1 million characters
Aura-2	$0.030/1k characters	$0.027/1k characters	$30 Pay As You Go, $27 Growth
Aura-1	$0.0150/1k characters	$0.0135/1k characters	$15 Pay As You Go, $13.50 Growth

Aura-2 is the more expensive model, but it is the one most teams will consider for current voice products. Aura-1 remains cheaper and may be enough for internal tools, simpler narration, testing or cost-sensitive workflows where voice quality is not the primary selling point.

The key budgeting habit is to separate TTS from STT. A voice agent may use speech-to-text to listen and Aura to speak. A support workflow may use speech-to-text for call analytics and Aura for follow-up voice messages. Those are separate meters.
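Keeping the two meters as separate functions makes the split obvious in a budget script. The sketch below uses the Pay As You Go rates from this guide; the monthly volumes are hypothetical.

```python
def stt_cost(audio_minutes: float, rate_per_minute: float) -> float:
    """Speech-to-text is metered per audio minute."""
    return audio_minutes * rate_per_minute


def tts_cost(characters: int, rate_per_1k_chars: float) -> float:
    """Aura text-to-speech is metered per 1,000 characters of input text."""
    return characters / 1000 * rate_per_1k_chars


# Hypothetical month: 10,000 minutes of listening, 2 million characters spoken.
listening = stt_cost(10_000, 0.0048)     # Nova-3 Monolingual
speaking = tts_cost(2_000_000, 0.030)    # Aura-2

print(f"STT: ${listening:.2f}  TTS: ${speaking:.2f}  Total: ${listening + speaking:.2f}")
```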

Deepgram Voice Agent API pricing

The Voice Agent API is priced per minute of WebSocket connection time. This is not the same as transcribing a finished recording. A five-minute conversation is billed as a live agent session, and the tier you choose determines whether Deepgram handles more of the stack or whether you bring your own LLM or TTS provider.

Voice Agent API tier	Pay As You Go	Growth	Approx Pay As You Go per hour
Standard	$0.075/min	$0.068/min	$4.50/hr
Standard – BYO TTS	$0.065/min	$0.051/min	$3.90/hr
Custom – BYO LLM	$0.059/min	Confirm in account or with sales	$3.54/hr
Custom – BYO LLM + TTS	$0.050/min	$0.041/min	$3.00/hr
Advanced	$0.163/min	$0.146/min	$9.78/hr
Advanced – BYO TTS	$0.122/min	$0.110/min	$7.32/hr

The Standard tier is easier to model because it keeps more of the workflow inside Deepgram. BYO tiers can reduce the Deepgram bill, but they move part of the cost to another provider. That can be the right decision if your team already has an LLM agreement, a preferred TTS stack or strict control requirements. It can also make finance reporting messier because the real agent cost is split across vendors.
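One way to keep the comparison honest is to add the external provider's cost back in before comparing tiers. In the sketch below, the external TTS figure of $0.015 per minute is purely hypothetical. Real providers usually bill per character or token, so you would first convert their rate to a per-minute equivalent for your own traffic.

```python
def agent_cost(minutes: float, deepgram_rate: float,
               external_rate_per_minute: float = 0.0) -> float:
    """Total session cost: Deepgram's per-minute rate plus any external provider cost."""
    return minutes * (deepgram_rate + external_rate_per_minute)


MINUTES = 50_000
EXTERNAL_TTS_PER_MINUTE = 0.015  # hypothetical figure, not a quoted price

standard = agent_cost(MINUTES, 0.075)
byo_tts = agent_cost(MINUTES, 0.065, EXTERNAL_TTS_PER_MINUTE)

print(f"Standard: ${standard:,.2f}")
print(f"Standard BYO TTS plus external TTS: ${byo_tts:,.2f}")
```

With this made-up external rate the BYO route costs more in total, even though the Deepgram line item is lower. A cheaper external rate would flip the result, which is why the second invoice has to be in the model.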

Audio Intelligence pricing

Audio Intelligence is priced differently again. Instead of per audio minute or per character, it is billed per 1,000 input and output tokens. The public pricing groups summarisation, topic detection, sentiment analysis and intent recognition into this Audio Intelligence section.

Audio Intelligence feature	Pay As You Go	Growth	Practical use
Summarisation	$0.0003/1k input tokens and $0.0006/1k output tokens	$0.00024/1k input tokens and $0.00048/1k output tokens	Meeting recaps, call summaries, podcast notes and support handover notes.
Topic detection	$0.0003/1k input tokens and $0.0006/1k output tokens	$0.00024/1k input tokens and $0.00048/1k output tokens	Grouping calls, interviews or recordings by subject.
Sentiment analysis	$0.0003/1k input tokens and $0.0006/1k output tokens	$0.00024/1k input tokens and $0.00048/1k output tokens	Analysing tone and customer mood across conversations.
Intent recognition	$0.0003/1k input tokens and $0.0006/1k output tokens	$0.00024/1k input tokens and $0.00048/1k output tokens	Identifying why the speaker contacted support or what action they wanted.

Audio Intelligence is usually not the first line item buyers notice, but it can become important once transcripts feed dashboards, search, QA or automated follow-up workflows. A team that only needs raw transcripts may not need it. A team building speech analytics probably will.
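Token-based billing is easier to reason about with a per-call estimate. The sketch below uses the Pay As You Go token rates listed above as defaults; the call count and token sizes are hypothetical, so substitute figures measured from your own transcripts.

```python
def audio_intelligence_cost(input_tokens: int, output_tokens: int,
                            input_rate_per_1k: float = 0.0003,
                            output_rate_per_1k: float = 0.0006) -> float:
    """Cost of one Audio Intelligence request, billed per 1,000 tokens each way."""
    return (input_tokens / 1000 * input_rate_per_1k
            + output_tokens / 1000 * output_rate_per_1k)


# Hypothetical workload: 1,000 calls, each about 8,000 transcript tokens in
# and a 300-token summary out.
CALLS = 1_000
per_call = audio_intelligence_cost(8_000, 300)

print(f"Per call: ${per_call:.5f}")
print(f"Per {CALLS:,} calls: ${per_call * CALLS:.2f}")
```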

Example monthly costs by workload

These examples are simplified, but they show how quickly the bill changes once you add realistic features.

Workload	Assumed usage	Example pricing calculation	Estimated monthly cost
Small transcription app	100 hours of Nova-3 Monolingual	6,000 minutes x $0.0048	$28.80
Podcast and interview workflow	100 hours of Nova-3 Monolingual with diarisation	6,000 minutes x ($0.0048 + $0.0020)	$40.80
Support call transcription	1,000 hours of Nova-3 Monolingual with diarisation and redaction	60,000 minutes x ($0.0048 + $0.0020 + $0.0020)	$528
Multilingual support analytics	1,000 hours of Nova-3 Multilingual with diarisation, redaction and keyterm prompting	60,000 minutes x ($0.0058 + $0.0020 + $0.0020 + $0.0013)	$666
Voice agent product	50,000 minutes on Voice Agent API Standard	50,000 minutes x $0.075	$3,750
Aura-2 TTS app	2 million characters	2,000 x $0.030	$60

The lesson is simple: Deepgram’s speech-to-text pricing is competitive, but voice-agent minutes are a different class of spend. If you are building a conversational AI product, do not forecast from Nova-3 STT rates alone. Model the full session cost.
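The table can be reproduced with a small estimator, which makes it easy to swap in your own volumes. This is a sketch using the Pay As You Go rates quoted in this guide; the `Workload` class is an illustrative structure, not a Deepgram tool.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    units: float               # minutes, or thousands of characters for TTS
    rates: tuple[float, ...]   # base rate plus any add-on rates, per unit

    def monthly_cost(self) -> float:
        """Units multiplied by the sum of the base rate and add-on rates."""
        return self.units * sum(self.rates)


WORKLOADS = [
    Workload("Small transcription app", 6_000, (0.0048,)),
    Workload("Podcast and interview workflow", 6_000, (0.0048, 0.0020)),
    Workload("Support call transcription", 60_000, (0.0048, 0.0020, 0.0020)),
    Workload("Multilingual support analytics", 60_000,
             (0.0058, 0.0020, 0.0020, 0.0013)),
    Workload("Voice agent product", 50_000, (0.075,)),
    Workload("Aura-2 TTS app", 2_000, (0.030,)),
]

for workload in WORKLOADS:
    print(f"{workload.name}: ${workload.monthly_cost():,.2f}")
```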

Deepgram vs Whisper, AssemblyAI and Speechmatics on pricing

Deepgram is not always the cheapest option on every line item. Its pricing case is strongest when you need low-latency streaming, scalable speech APIs, voice-agent infrastructure or a combined STT, TTS and agent stack. For a wider tool shortlist, use our speech-to-text tool rankings.

Provider	Pricing angle to compare	Best pricing fit
Deepgram	Usage-based speech APIs, streaming, add-ons and voice-agent billing	Production voice apps and teams that need low-latency STT
OpenAI Whisper	Simple per-minute OpenAI transcription and self-hosting alternatives	Buyers wanting simpler pricing or open-source flexibility
AssemblyAI	Transcription plus speech intelligence	Teams paying for transcripts plus built-in analysis
Speechmatics	Speech-to-text for global accent coverage	International transcription and accent-heavy workloads
Azure AI Speech	Enterprise cloud pricing and Azure-native deployment	Microsoft-first procurement and enterprise governance

For pure transcription, OpenAI’s current pricing is simpler to understand. GPT-4o-mini-transcribe is priced at around $0.003 per minute, while GPT-4o-transcribe is priced at around $0.006 per minute. Deepgram can still be attractive because its real-time streaming and Flux positioning are stronger for voice products. For more details, see our OpenAI Whisper API pricing guide.

AssemblyAI is often a better comparison when you are paying for transcripts plus speech understanding. Speechmatics is the pricing comparison to make for international and accent-heavy workloads. Azure AI Speech is less about the lowest visible per-minute rate and more about enterprise procurement, Azure governance and Microsoft-native deployment.

Deepgram cost pros and cons

Pricing pros	Pricing cons
Strong per-minute rates for current speech-to-text models.	Add-ons can materially increase the real cost of a transcript.
$200 free credit is generous enough for meaningful testing.	Voice Agent API pricing is much higher than raw STT pricing.
Growth discounts help once usage becomes predictable.	Multiple billing units make forecasting harder across STT, TTS, agents and intelligence.
Flux gives a clear pricing path for real-time voice-agent speech recognition.	BYO tiers can reduce the Deepgram line item while shifting cost to another provider.
Smart formatting is included rather than charged as an extra.	Concurrency limits matter for live products and can push teams towards Growth or Enterprise.

Hidden pricing mistakes to avoid

Budgeting from the base STT rate only

Base transcription rates are easy to understand. Real workflows are not always base workflows. If you need diarisation, redaction or keyterm prompting, include those in the first estimate rather than adding them after launch.

Ignoring connection time for voice agents

The Voice Agent API is billed by connection time. Long pauses, abandoned calls, testing loops and poor session handling can all affect cost. Engineering decisions matter here. Closing sessions cleanly is a pricing control, not just a technical detail.
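A quick estimate shows what idle connection time can cost. The sketch below assumes billing is strictly proportional to connected seconds, with no rounding or minimum charge, which you should confirm against your own invoices. The session counts and durations are hypothetical.

```python
def billed_minutes(talk_seconds: float, idle_seconds: float) -> float:
    """Connection time in minutes: talk time plus any time the socket stays open."""
    return (talk_seconds + idle_seconds) / 60


def monthly_idle_cost(sessions: int, idle_seconds: float,
                      rate_per_minute: float) -> float:
    """Spend attributable only to idle connection time across all sessions."""
    return sessions * idle_seconds / 60 * rate_per_minute


# Hypothetical: 10,000 sessions a month, 4 minutes of conversation each,
# with the WebSocket left open for 45 seconds after the caller has gone.
RATE = 0.075  # Voice Agent API Standard, Pay As You Go

print(f"Billed minutes per session: {billed_minutes(240, 45):.2f}")
print(f"Monthly cost of idle time: ${monthly_idle_cost(10_000, 45, RATE):,.2f}")
```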

Testing with clean audio only

Clean audio can make every provider look cheaper than it really is. Test with actual calls, poor microphones, interruptions, accents, background noise and real speaker behaviour. Otherwise, you may choose a cheaper model that will require more correction later.

Forgetting concurrency

Monthly minutes are not the same as peak live demand. A product with modest total usage can still need higher concurrency if many users arrive at once. This is especially important for webinars, call centres, live captions and voice agents.
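The gap between total minutes and peak demand can be estimated with Little's law, which says the average number of sessions in progress equals the arrival rate multiplied by the average session length. This is a steady-state approximation and real traffic is burstier, so treat the result as a floor. All traffic figures below are hypothetical.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200, assuming a 30-day month


def average_concurrency(sessions_per_hour: float, avg_session_minutes: float) -> float:
    """Little's law: sessions in progress = arrival rate x average duration."""
    return sessions_per_hour * avg_session_minutes / 60


def monthly_average_concurrency(total_minutes: float) -> float:
    """Concurrency if usage were spread evenly across the whole month."""
    return total_minutes / MINUTES_PER_MONTH


# Hypothetical product: 30,000 minutes a month in total, but 600 five-minute
# sessions arrive during the busiest hour.
print(f"Month-wide average: {monthly_average_concurrency(30_000):.2f} sessions")
print(f"Busiest hour: {average_concurrency(600, 5):.0f} sessions")
```

The same 30,000 minutes looks like less than one concurrent session when averaged over the month, yet needs around 50 simultaneous connections in the peak hour.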

Treating BYO as automatically cheaper

Bring-your-own LLM or TTS pricing can reduce Deepgram’s bill, but the external provider still charges you. It can be cheaper, but only after you include the other invoice, engineering effort, monitoring and failure handling.

Deepgram pricing FAQs

How much does Deepgram cost in 2026?

Deepgram starts with a free $200 credit, then uses Pay As You Go pricing. Growth starts at $4K+ per year, and Enterprise is custom quoted. Product costs depend on whether you use speech-to-text, Flux, Aura text-to-speech, Voice Agent API, Audio Intelligence or add-ons.

How much does Deepgram speech-to-text cost per minute?

Current public streaming rates include Nova-3 Monolingual at $0.0048 per minute on Pay As You Go, Nova-3 Multilingual at $0.0058 per minute, Flux English at $0.0065 per minute, and Flux Multilingual at $0.0078 per minute. Growth rates are lower.

Does Deepgram offer free credits?

Yes. Deepgram offers $200 in free credit to get started. That credit is useful for testing real audio, streaming, add-ons, TTS and early voice-agent sessions before committing to recurring spend.

What is the difference between Pay As You Go and Growth?

Pay As You Go is usage-based with no annual minimum. Growth starts at $4K+ per year and uses prepaid annual credits that are redeemed against actual usage. Growth also gives discounted unit rates and higher limits in some areas.

How much does Nova-3 cost?

Nova-3 Monolingual is listed from $0.0048 per minute on Pay As You Go and $0.0042 per minute on Growth in the current streaming pricing view. Nova-3 Multilingual is listed at $0.0058 per minute on Pay As You Go and $0.0050 per minute on Growth.

How much does Flux cost?

Flux English is listed at $0.0065 per minute on Pay As You Go and $0.0057 per minute on Growth. Flux Multilingual is listed at $0.0078 per minute on Pay As You Go and $0.0068 per minute on Growth.

Is streaming more expensive than pre-recorded transcription?

Often, yes, because streaming requires real-time infrastructure and live concurrency. Deepgram’s public page also shows promotional streaming pricing, so check the live pricing view before assuming a permanent gap between streaming and pre-recorded rates.

Does Deepgram charge extra for diarisation?

Yes. Speaker diarisation is listed as an add-on at $0.0020 per minute on Pay As You Go and $0.0017 per minute on Growth. It is useful for calls, meetings, interviews and multi-speaker transcripts.

Does Deepgram charge extra for redaction?

Yes. Redaction is listed at $0.0020 per minute on Pay As You Go and $0.0017 per minute on Growth. It is commonly used when transcripts may include phone numbers, payment details or other sensitive information.

How much does Deepgram Aura text-to-speech cost?

Aura-2 is listed at $0.030 per 1,000 characters on Pay As You Go and $0.027 per 1,000 characters on Growth. Aura-1 is listed at $0.0150 per 1,000 characters on Pay As You Go and $0.0135 per 1,000 characters on Growth.

How does Deepgram Voice Agent API pricing work?

The Voice Agent API is billed per minute of WebSocket connection time. Standard is listed at $0.075 per minute on Pay As You Go, while BYO and Advanced tiers change the rate depending on whether Deepgram or another provider handles parts of the stack.

Is Deepgram cheaper than Whisper?

It depends on the use case. OpenAI’s lower-cost transcription options can be cheaper for straightforward file transcription. Deepgram becomes more compelling when you need low-latency streaming, Flux, voice agents, real-time behaviour or a speech API designed around production voice apps.

Is Deepgram pricing good value for call centres?

It can be, especially where low latency, diarisation, redaction and analytics matter. The important step is to model the complete cost per recorded or live call, including add-ons. Call centres should also check concurrency and support needs before relying on Pay As You Go assumptions.

Can Deepgram pricing get expensive at scale?

Yes. Any usage-based API can become expensive at scale if volume grows, sessions remain open too long, add-ons are enabled by default, or voice-agent minutes are forecast as basic transcription minutes. Growth or Enterprise pricing may become sensible once usage is predictable.

Verdict: Is Deepgram good value?

Deepgram is good value when the buying decision is about real-time speech infrastructure rather than basic transcription alone. Its strongest pricing fit is production speech-to-text, live transcription, voice agents, call workflows and products that need fast responses from messy audio.

For simple file transcription, carefully compare Deepgram against Whisper-style pricing. For speech intelligence, compare it with AssemblyAI. For international accent coverage, compare it with Speechmatics. For Microsoft-first procurement, compare it with Azure AI Speech.

The best way to approach Deepgram pricing is to model the actual workflow: minutes, model, mode, add-ons, TTS characters, agent connection time, Audio Intelligence tokens and expected concurrency. Do that, and Deepgram’s pricing is transparent enough to budget. Skip that step, and the invoice can look very different from the headline model rate.

Writer: Steven Jones

AI Tools Reviewer and Technical Analyst

Steven Jones is a technology analyst specialising in artificial intelligence, machine learning workflows, and emerging automation tools. At DIY AI, he focuses on clear, practical guidance for people comparing AI tools in the real world. His work covers text generation, image generation, video tools, data platforms, developer-focused AI products, and the automation workflows that connect them. Steven's reviews are built around hands-on testing, practical benchmarks, and transparent scoring rather than vendor claims. He looks closely at where each tool performs well, where it falls short, and what those trade-offs mean for creators, teams, and businesses trying to make sensible AI adoption decisions. He has a particular interest in safety, reliability, output quality, performance metrics, and dataset quality. When he is not reviewing the latest AI model updates, he experiments with prompt engineering techniques and contributes to DIY AI's ongoing work on fair, explainable scoring frameworks for AI tools.
