Deepgram Pricing 2026: Speech-to-Text, Nova-3, Flux and Voice Agent API Costs
Deepgram pricing in 2026 is usage-based, with separate rates for speech-to-text, real-time streaming, Flux, Aura text-to-speech, Voice Agent API usage, Audio Intelligence and add-ons such as redaction, diarisation and keyterm prompting. This guide focuses on cost only. For accuracy, features and alternatives, read our Deepgram AI review.
The main point: Deepgram can be very good value for production speech applications, especially where low latency matters, but the final bill depends heavily on model choice, streaming volume and which add-ons you enable. The headline per-minute rate is only part of the cost model.
Deepgram pricing: quick answer
Deepgram starts with a free $200 credit, then moves to Pay As You Go pricing. Growth starts at $4K+ per year and gives discounted usage rates through prepaid annual credits. Enterprise is custom quoted for larger volumes or for specific deployment, data, compliance or support requirements.
| Pricing area | Typical 2026 billing unit | What to watch |
|---|---|---|
| Speech-to-text | Per audio minute | Nova-3, Flux, streaming mode, multilingual mode and add-ons can change the bill. |
| Speech-to-text add-ons | Usually per audio minute | Diarisation and redaction are useful, but they add to the base transcription rate. |
| Aura text-to-speech | Per 1,000 characters | Aura-2 costs more than Aura-1, but is the more current voice model. |
| Voice Agent API | Per minute of WebSocket connection time | Standard, BYO TTS, BYO LLM and Advanced tiers have different rates. |
| Audio Intelligence | Per 1,000 input and output tokens | Summaries and analyses are not priced the same way as transcription. |
In DIY AI’s 2026 speech-to-text dataset, Deepgram scores 9.1/10 overall, 9.2/10 for cost efficiency, 9.9/10 for speed and 9.9/10 for real-time streaming. That makes it one of the strongest options for voice products where delay affects the user experience.
DIY AI cost value rating: ★★★★☆ 4.6/5
Deepgram pricing plans: Pay As You Go, Growth and Enterprise
Deepgram has three main buying paths. The right one depends less on company size and more on whether your usage is predictable enough to justify prepaid credits.
| Plan | Starting point | Best fit | Pricing logic |
|---|---|---|---|
| Free / Pay As You Go | $200 free credit, then usage-based billing | Developers, prototypes, early products and low-volume apps | No annual minimum. You pay based on the products and minutes, characters or tokens used. |
| Growth | $4K+ per year | Growing applications with recurring usage | Prepaid annual credits are redeemed against actual usage, with discounted rates. |
| Enterprise | Custom quote | Large volume, private deployment, compliance, support or custom model needs | Sales-led pricing for higher scale and non-standard requirements. |
The Growth plan matters once your monthly usage is no longer experimental. A product burning a few hundred dollars per month in transcription, voice agents or TTS can quickly reach the point where annual prepaid credits make sense. Smaller teams should still start with Pay As You Go, because it keeps the billing model honest while they test real audio, concurrency needs and add-on usage.
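The point at which prepaid credits start to make sense can be estimated with a few lines of arithmetic. The sketch below uses the Nova-3 Monolingual streaming rates quoted in this guide and assumes, for illustration only, that the $4,000 Growth commitment is paid in full even when usage falls short. Check the actual credit terms before relying on that assumption.

```python
# Nova-3 Monolingual streaming rates quoted in this guide (USD per minute).
PAYG_RATE = 0.0048
GROWTH_RATE = 0.0042
GROWTH_MINIMUM = 4_000.0  # the "$4K+ per year" entry point


def annual_payg_cost(minutes: float) -> float:
    """Pay As You Go: pay only for what is used."""
    return minutes * PAYG_RATE


def annual_growth_cost(minutes: float) -> float:
    """Growth: discounted rate, but never less than the prepaid commitment.

    Assumption: unused prepaid credit is not refunded.
    """
    return max(GROWTH_MINIMUM, minutes * GROWTH_RATE)


def cheaper_plan(minutes: float) -> str:
    """Return which plan costs less for a given annual volume."""
    if annual_growth_cost(minutes) < annual_payg_cost(minutes):
        return "growth"
    return "payg"


# Growth wins once Pay As You Go spend would exceed the commitment.
break_even_minutes = GROWTH_MINIMUM / PAYG_RATE
print(f"Break-even: about {break_even_minutes:,.0f} minutes per year")

for minutes in (500_000, 1_000_000):
    print(f"{minutes:,} minutes/year -> {cheaper_plan(minutes)}")
```

Under these assumptions the break-even sits where Pay As You Go spend reaches $4,000 per year, or roughly $333 per month, which is consistent with the "few hundred dollars per month" threshold described above.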
The public plan limits also matter. Pay As You Go offers useful concurrency for developers, but voice products can hit limits more quickly than transcription tools because each live user requires a real-time connection. Growth raises several WebSocket limits, while Enterprise is the route for custom commitments, private deployment and heavier support needs.
Deepgram speech-to-text pricing by model
Speech-to-text is the core category in Deepgram’s pricing. Deepgram prices current public speech-to-text models per audio minute, with different rates for Pay As You Go and Growth. The public pricing page also shows streaming and pre-recorded views, so always check the live selector before pushing a production estimate into a budget.
The numbers below reflect the currently listed per-minute streaming pricing from Deepgram’s official pricing page. They are suitable for budget modelling, but you should re-check the page before publishing a customer quote or signing a contract.
| Model | Pay As You Go | Growth | Approx Pay As You Go per hour | Best fit |
|---|---|---|---|---|
| Flux English | $0.0065/min | $0.0057/min | $0.39/hr | English real-time voice agents that need turn-taking and interruption handling. |
| Flux Multilingual | $0.0078/min | $0.0068/min | $0.47/hr | Voice agents where users may switch languages in one conversation. |
| Nova-3 Monolingual | $0.0048/min | $0.0042/min | $0.29/hr | Cost-efficient transcription and streaming where one primary language is expected. |
| Nova-3 Multilingual | $0.0058/min | $0.0050/min | $0.35/hr | Multilingual transcription, background noise and varied real-world speech. |
| Custom | Contact sales | Contact sales | Custom | Domain-specific models trained around proprietary or difficult audio. |
Nova-3 Monolingual pricing
Nova-3 Monolingual is the cost-efficient default for many English-first or single-language workloads. At the current Pay As You Go streaming rate of $0.0048 per minute, it works out at about $0.29 per audio hour before add-ons.
That is a strong rate for production speech-to-text, but it should not be treated as the whole budget. Speaker diarisation, redaction and keyterm prompting are priced separately. A call transcription workflow that uses Nova-3 Monolingual plus diarisation and redaction is not a $0.0048-per-minute workflow. It is a stacked-cost workflow.
Nova-3 Multilingual pricing
Nova-3 Multilingual costs more than the monolingual model, but the extra cost is often justified when the audio includes multiple languages, unpredictable language selection, heavy accents, crosstalk or far-field input. On Pay As You Go, the current listed streaming rate is $0.0058 per minute, or about $0.35 per audio hour before add-ons.
This is where Deepgram pricing needs realistic test audio. If your users mostly speak clean English into close microphones, the multilingual model may be unnecessary. If your audio comes from international support calls, public meetings, mobile microphones, or mixed-language conversations, the cheaper model may end up costing more later due to manual correction.
Flux pricing for real-time voice agents
Flux is Deepgram’s speech recognition model built around real-time voice agent use cases. It is not just another transcription model with a faster endpoint. Its pricing should be assessed against agent behaviour: interruption handling, turn detection, latency and whether the transcript arrives fast enough for a natural response loop.
Flux English is listed at $0.0065 per minute on Pay As You Go. Flux Multilingual is listed at $0.0078 per minute. That makes Flux more expensive than Nova-3 Monolingual on a raw per-minute basis, but cheaper isn’t always better for voice agents. A model that handles turns more cleanly can reduce awkward pauses, false interruptions and agent responses that fire at the wrong time.
Deepgram pricing per minute and per hour
Pricing pages usually show per-minute costs because API bills are calculated at that level. Buyers, however, tend to budget in hours. Here is the practical conversion.
| Rate per minute | Cost per hour | 10,000 minutes | 100,000 minutes |
|---|---|---|---|
| $0.0048 | $0.288 | $48 | $480 |
| $0.0058 | $0.348 | $58 | $580 |
| $0.0065 | $0.390 | $65 | $650 |
| $0.0078 | $0.468 | $78 | $780 |
| $0.0750 | $4.500 | $750 | $7,500 |
The jump in the final row is intentional: $0.075 per minute is the Pay As You Go rate for the Voice Agent API Standard tier. Voice Agent API usage is a different product category from raw speech-to-text. It can include orchestration for listening, thinking and speaking, so the unit price is much higher than that of a basic STT model.
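The conversion table can be reproduced for any rate or volume with a small helper. This is a minimal sketch using the Pay As You Go rates quoted in this guide; the function names are illustrative.

```python
def per_hour(rate_per_minute: float) -> float:
    """Convert a per-minute rate into a per-hour rate."""
    return rate_per_minute * 60


def cost_for_minutes(rate_per_minute: float, minutes: int) -> float:
    """Total cost for a number of billed minutes at a flat rate."""
    return rate_per_minute * minutes


# Pay As You Go rates quoted in this guide (USD per minute).
RATES = {
    "Nova-3 Monolingual": 0.0048,
    "Nova-3 Multilingual": 0.0058,
    "Flux English": 0.0065,
    "Flux Multilingual": 0.0078,
    "Voice Agent API Standard": 0.075,
}

for name, rate in RATES.items():
    print(
        f"{name}: ${per_hour(rate):.3f}/hr, "
        f"${cost_for_minutes(rate, 10_000):,.2f} per 10,000 min, "
        f"${cost_for_minutes(rate, 100_000):,.2f} per 100,000 min"
    )
```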
Streaming vs pre-recorded pricing
Streaming and pre-recorded transcription solve different problems. Pre-recorded transcription is used when the audio already exists: podcasts, call recordings, interviews, meetings, videos and uploaded files. Streaming is used when text needs to appear while speech is happening: live captions, voice agents, real-time call assistance, dictation and monitoring.
Do not compare streaming and pre-recorded pricing by cost alone. Streaming requires low-latency infrastructure and usually has stricter concurrency requirements. Pre-recorded transcription can be queued, retried and processed in batches. That makes it easier to control cost, especially for content archives or back-office workflows.
| Mode | Best for | Cost behaviour | Buying advice |
|---|---|---|---|
| Pre-recorded transcription | Uploaded files, call recordings, podcasts, videos and archives | Easier to batch and forecast | Use this where turnaround can be measured in seconds or minutes rather than live interaction. |
| Streaming speech-to-text | Live captions, call assistance, dictation and real-time products | More sensitive to concurrency and connection time | Budget for peak usage, not just total minutes. |
| Flux streaming | Voice agents and conversational AI | Priced for real-time conversational performance | Use where turn detection and interruption handling matter. |
Deepgram free credits: what the starter credit actually covers
The $200 free credit is useful. It is enough to run proper tests with real files, streaming sessions and early voice-agent prototypes. It can also mislead buyers if they only test clean demo audio and forget add-ons.
| Usage type | Example rate | What $200 roughly covers |
|---|---|---|
| Nova-3 Monolingual | $0.0048/min | About 41,666 minutes, or 694 hours |
| Nova-3 Multilingual | $0.0058/min | About 34,482 minutes, or 575 hours |
| Flux English | $0.0065/min | About 30,769 minutes, or 513 hours |
| Voice Agent API Standard | $0.075/min | About 2,666 minutes, or 44 hours |
| Aura-2 text-to-speech | $0.030/1,000 characters | About 6.67 million characters |
The mistake is assuming the credit reflects your actual monthly bill. It does not. Your production bill depends on the audio mix, add-ons, failed or repeated jobs, concurrent live sessions, agent call length and whether you move onto Growth pricing.
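To see how far the credit really stretches, divide it by the full stacked rate rather than the base model rate. The sketch below compares Nova-3 Monolingual on its own with the same model plus diarisation and redaction, using the add-on rates listed in this guide. It assumes the whole credit is spent on a single workload.

```python
CREDIT_USD = 200.0


def minutes_covered(rate_per_minute: float, credit: float = CREDIT_USD) -> int:
    """Whole minutes a credit covers at a flat per-minute rate."""
    return int(credit / rate_per_minute)


base_rate = 0.0048                        # Nova-3 Monolingual, Pay As You Go
stacked_rate = base_rate + 0.0020 + 0.0020  # plus diarisation and redaction

for label, rate in (
    ("Base model only", base_rate),
    ("With diarisation and redaction", stacked_rate),
):
    minutes = minutes_covered(rate)
    print(f"{label}: {minutes:,} minutes, about {minutes / 60:.0f} hours")
```

With both add-ons enabled, the same $200 covers roughly 379 hours rather than 694.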
Speech-to-text add-ons: diarisation, redaction, keyterm prompting and formatting
Deepgram’s add-ons are useful because they turn raw transcription into something closer to a usable workflow. They also make cost modelling more complicated.
| Add-on | Pay As You Go | Growth | What it does |
|---|---|---|---|
| Redaction | $0.0020/min | $0.0017/min | Removes or masks sensitive information such as payment card numbers, phone numbers and other PII. |
| Keyterm prompting | $0.0013/min | $0.0012/min | Improves recognition of product names, acronyms, brand terms and specialist vocabulary. |
| Smart formatting | Included | Included | Formats punctuation, casing, dates, numbers and currency for readability. |
| Speaker diarisation | $0.0020/min | $0.0017/min | Labels who spoke when in multi-speaker audio. |
For call centres, diarisation and redaction are often not optional extras. Speaker labels help QA, analytics and coaching. Redaction helps reduce exposure when transcripts include sensitive information. The pricing issue is that both are per-minute additions, so they scale at exactly the same rate as transcription volume.
For example, Nova-3 Monolingual at $0.0048 per minute increases to $0.0088 per minute when you add diarisation and redaction. That is still not expensive in absolute terms, but it is an 83 per cent increase over the base model line item. This is the most common Deepgram pricing mistake: modelling the model price and forgetting the transcript features buyers actually need.
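The stacked rate and its percentage uplift are easy to compute for any combination of add-ons. A minimal sketch, using the Pay As You Go add-on rates from the table above; the dictionary keys are labels chosen for this example, not Deepgram API parameter names.

```python
# Pay As You Go add-on rates from this guide (USD per minute).
ADD_ON_RATES = {
    "redaction": 0.0020,
    "keyterm_prompting": 0.0013,
    "diarisation": 0.0020,
    "smart_formatting": 0.0,  # listed as included
}


def stacked_rate(base_rate: float, add_ons: list[str]) -> float:
    """Base model rate plus every enabled add-on."""
    return base_rate + sum(ADD_ON_RATES[name] for name in add_ons)


def uplift_percent(base_rate: float, add_ons: list[str]) -> float:
    """How much the add-ons raise the per-minute rate, as a percentage."""
    return (stacked_rate(base_rate, add_ons) - base_rate) / base_rate * 100


nova3_mono = 0.0048
enabled = ["diarisation", "redaction"]
print(f"Stacked rate: ${stacked_rate(nova3_mono, enabled):.4f}/min")
print(f"Uplift: {uplift_percent(nova3_mono, enabled):.0f}%")
```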
Deepgram Aura text-to-speech pricing
Deepgram Aura is billed per 1,000 characters, not per minute. This matters because the cost depends on how much text you send, not on how many seconds the generated audio lasts.
| Aura model | Pay As You Go | Growth | Approx cost per 1 million characters |
|---|---|---|---|
| Aura-2 | $0.030/1k characters | $0.027/1k characters | $30 Pay As You Go, $27 Growth |
| Aura-1 | $0.0150/1k characters | $0.0135/1k characters | $15 Pay As You Go, $13.50 Growth |
Aura-2 is the more expensive model, but it is the one most teams will consider for current voice products. Aura-1 remains cheaper and may be enough for internal tools, simpler narration, testing or cost-sensitive workflows where voice quality is not the primary selling point.
The key budgeting habit is to separate TTS from STT. A voice agent may use speech-to-text to listen and Aura to speak. A support workflow may use speech-to-text for call analytics and Aura for follow-up voice messages. Those are separate meters.
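Because Aura is metered on characters, the TTS estimate starts from text length rather than audio duration. The sketch below uses the Pay As You Go rates from the table above and assumes billing is proportional to the character count rather than rounded up to the next 1,000 characters. The model keys are labels for this example, not confirmed API identifiers, and the sample reply is made up.

```python
# Pay As You Go rates from this guide (USD per 1,000 characters).
AURA_RATE_PER_1K_CHARS = {"aura-2": 0.030, "aura-1": 0.0150}


def tts_cost(characters: int, model: str = "aura-2") -> float:
    """Estimated TTS cost for a number of characters.

    Assumption: cost scales linearly with characters sent.
    """
    return characters / 1000 * AURA_RATE_PER_1K_CHARS[model]


reply = "Thanks for calling. Your order has shipped and should arrive on Friday."
print(f"Reply length: {len(reply)} characters")
print(f"One reply on Aura-2: ${tts_cost(len(reply)):.6f}")
print(f"2 million characters on Aura-2: ${tts_cost(2_000_000):.2f}")
print(f"2 million characters on Aura-1: ${tts_cost(2_000_000, 'aura-1'):.2f}")
```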
Deepgram Voice Agent API pricing
The Voice Agent API is priced per minute of WebSocket connection time. This is not the same as transcribing a finished recording. A five-minute conversation is billed as a live agent session, and the tier you choose determines whether Deepgram handles more of the stack or whether you bring your own LLM or TTS provider.
| Voice Agent API tier | Pay As You Go | Growth | Approx Pay As You Go per hour |
|---|---|---|---|
| Standard | $0.075/min | $0.068/min | $4.50/hr |
| Standard – BYO TTS | $0.065/min | $0.051/min | $3.90/hr |
| Custom – BYO LLM | $0.059/min | Confirm in account or with sales | $3.54/hr |
| Custom – BYO LLM + TTS | $0.050/min | $0.041/min | $3.00/hr |
| Advanced | $0.163/min | $0.146/min | $9.78/hr |
| Advanced – BYO TTS | $0.122/min | $0.110/min | $7.32/hr |
The Standard tier is easier to model because it keeps more of the workflow inside Deepgram. BYO tiers can reduce the Deepgram bill, but they move part of the cost to another provider. That can be the right decision if your team already has an LLM agreement, a preferred TTS stack or strict control requirements. It can also make finance reporting messier because the real agent cost is split across vendors.
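A fair comparison between Standard and a BYO tier adds the external provider's cost back in. The sketch below uses the Pay As You Go tier rates from the table above. The external per-minute figure is entirely hypothetical and should be replaced with your own LLM and TTS spend per agent minute.

```python
# Pay As You Go Voice Agent API rates from this guide (USD per minute).
PAYG_PER_MINUTE = {
    "standard": 0.075,
    "standard_byo_tts": 0.065,
    "custom_byo_llm": 0.059,
    "custom_byo_llm_tts": 0.050,
}


def blended_cost(minutes: float, tier: str, external_per_minute: float = 0.0) -> float:
    """Deepgram tier cost plus whatever the external provider charges."""
    return minutes * (PAYG_PER_MINUTE[tier] + external_per_minute)


def external_budget(tier: str) -> float:
    """Maximum external spend per minute before BYO costs more than Standard."""
    return PAYG_PER_MINUTE["standard"] - PAYG_PER_MINUTE[tier]


minutes = 50_000
hypothetical_external = 0.03  # made-up LLM + TTS cost per agent minute

print(f"Standard: ${blended_cost(minutes, 'standard'):,.2f}")
print(
    "BYO LLM + TTS with external spend: "
    f"${blended_cost(minutes, 'custom_byo_llm_tts', hypothetical_external):,.2f}"
)
print(f"Break-even external rate: ${external_budget('custom_byo_llm_tts'):.3f}/min")
```

In this made-up scenario the BYO route costs more overall, because the external spend exceeds the $0.025 per minute saved on the Deepgram line item.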
Audio Intelligence pricing
Audio Intelligence is priced differently again. Instead of per audio minute or per character, it is billed per 1,000 input and output tokens. The public pricing groups summarisation, topic detection, sentiment analysis and intent recognition into this Audio Intelligence section.
| Audio Intelligence feature | Pay As You Go | Growth | Practical use |
|---|---|---|---|
| Summarisation | $0.0003/1k input tokens and $0.0006/1k output tokens | $0.00024/1k input tokens and $0.00048/1k output tokens | Meeting recaps, call summaries, podcast notes and support handover notes. |
| Topic detection | $0.0003/1k input tokens and $0.0006/1k output tokens | $0.00024/1k input tokens and $0.00048/1k output tokens | Grouping calls, interviews or recordings by subject. |
| Sentiment analysis | $0.0003/1k input tokens and $0.0006/1k output tokens | $0.00024/1k input tokens and $0.00048/1k output tokens | Analysing tone and customer mood across conversations. |
| Intent recognition | $0.0003/1k input tokens and $0.0006/1k output tokens | $0.00024/1k input tokens and $0.00048/1k output tokens | Identifying why the speaker contacted support or what action they wanted. |
Audio Intelligence is usually not the first line item buyers notice, but it can become important once transcripts feed dashboards, search, QA or automated follow-up workflows. A team that only needs raw transcripts may not need it. A team building speech analytics probably will.
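Token-based billing needs its own estimate, separate from audio minutes. The sketch below applies the Pay As You Go rates listed above to a hypothetical workload. The per-call token counts are invented for illustration, so measure your own transcripts before budgeting.

```python
# Pay As You Go Audio Intelligence rates from this guide (USD per 1,000 tokens).
INPUT_RATE_PER_1K = 0.0003
OUTPUT_RATE_PER_1K = 0.0006


def audio_intelligence_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one feature run, billed on input and output tokens."""
    return (
        input_tokens / 1000 * INPUT_RATE_PER_1K
        + output_tokens / 1000 * OUTPUT_RATE_PER_1K
    )


# Hypothetical workload: 10,000 summarised calls per month.
calls = 10_000
input_tokens_per_call = 2_000   # assumed transcript size
output_tokens_per_call = 200    # assumed summary size

monthly = audio_intelligence_cost(
    calls * input_tokens_per_call, calls * output_tokens_per_call
)
print(f"Estimated monthly summarisation cost: ${monthly:.2f}")
```

Each additional feature run on the same transcript, such as sentiment analysis or topic detection, would add its own token charge on top.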
Example monthly costs by workload
These examples are simplified, but they show how quickly the bill changes once you add realistic features.
| Workload | Assumed usage | Example pricing calculation | Estimated monthly cost |
|---|---|---|---|
| Small transcription app | 100 hours of Nova-3 Monolingual | 6,000 minutes x $0.0048 | $28.80 |
| Podcast and interview workflow | 100 hours of Nova-3 Monolingual with diarisation | 6,000 minutes x ($0.0048 + $0.0020) | $40.80 |
| Support call transcription | 1,000 hours of Nova-3 Monolingual with diarisation and redaction | 60,000 minutes x ($0.0048 + $0.0020 + $0.0020) | $528 |
| Multilingual support analytics | 1,000 hours of Nova-3 Multilingual with diarisation, redaction and keyterm prompting | 60,000 minutes x ($0.0058 + $0.0020 + $0.0020 + $0.0013) | $666 |
| Voice agent product | 50,000 minutes on Voice Agent API Standard | 50,000 minutes x $0.075 | $3,750 |
| Aura-2 TTS app | 2 million characters | 2,000 x $0.030 | $60 |
The lesson is simple: Deepgram’s speech-to-text pricing is competitive, but voice-agent minutes are a different class of spend. If you are building a conversational AI product, do not forecast from Nova-3 STT rates alone. Model the full session cost.
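Modelling the full bill means adding up each meter separately. The following sketch is one way to structure that estimate, using Pay As You Go rates from this guide. The `MonthlyUsage` class and its field names are illustrative, and the combined scenario simply merges three rows of the table above.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    stt_minutes: float = 0.0
    stt_rate: float = 0.0         # base model rate, USD per minute
    add_on_rate: float = 0.0      # sum of enabled add-on rates, USD per minute
    agent_minutes: float = 0.0
    agent_rate: float = 0.0       # Voice Agent API tier rate, USD per minute
    tts_characters: float = 0.0
    tts_rate_per_1k: float = 0.0  # USD per 1,000 characters

    def breakdown(self) -> dict[str, float]:
        """Cost per meter, so each line item stays visible."""
        return {
            "stt": self.stt_minutes * (self.stt_rate + self.add_on_rate),
            "agent": self.agent_minutes * self.agent_rate,
            "tts": self.tts_characters / 1000 * self.tts_rate_per_1k,
        }

    def total(self) -> float:
        return sum(self.breakdown().values())


# Support call transcription row: 1,000 hours with diarisation and redaction.
support = MonthlyUsage(stt_minutes=60_000, stt_rate=0.0048, add_on_rate=0.0040)

# Same team also running a voice agent and separate Aura-2 voice messages.
combined = MonthlyUsage(
    stt_minutes=60_000, stt_rate=0.0048, add_on_rate=0.0040,
    agent_minutes=50_000, agent_rate=0.075,
    tts_characters=2_000_000, tts_rate_per_1k=0.030,
)

print(f"Support transcription only: ${support.total():,.2f}")
for meter, cost in combined.breakdown().items():
    print(f"  {meter}: ${cost:,.2f}")
print(f"Combined total: ${combined.total():,.2f}")
```

Keeping the breakdown per meter makes it obvious that the agent minutes, not the transcription minutes, dominate the combined bill.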
Deepgram vs Whisper, AssemblyAI and Speechmatics on pricing
Deepgram is not always the cheapest option on every line item. Its pricing case is strongest when you need low-latency streaming, scalable speech APIs, voice-agent infrastructure or a combined STT, TTS and agent stack. For a wider tool shortlist, use our speech-to-text tool rankings.
| Provider | Pricing angle to compare | Best pricing fit |
|---|---|---|
| Deepgram | Usage-based speech APIs, streaming, add-ons and voice-agent billing | Production voice apps and teams that need low-latency STT |
| OpenAI Whisper | Simple per-minute OpenAI transcription and self-hosting alternatives | Buyers wanting simpler pricing or open-source flexibility |
| AssemblyAI | Transcription plus speech intelligence | Teams paying for transcripts plus built-in analysis |
| Speechmatics | Speech-to-text for global accent coverage | International transcription and accent-heavy workloads |
| Azure AI Speech | Enterprise cloud pricing and Azure-native deployment | Microsoft-first procurement and enterprise governance |
For pure transcription, OpenAI’s current pricing is simpler to understand. GPT-4o-mini-transcribe is priced at around $0.003 per minute, while GPT-4o-transcribe is priced at around $0.006 per minute. Deepgram can still be attractive because its real-time streaming and Flux positioning are stronger for voice products. For more details, see our OpenAI Whisper API pricing guide.
AssemblyAI is often a better comparison when you are paying for transcripts plus speech understanding. Speechmatics is the more relevant pricing comparison for international and accent-heavy workloads. Azure AI Speech is less about the lowest visible per-minute rate and more about enterprise procurement, Azure governance and Microsoft-native deployment.
Deepgram cost pros and cons
| Pricing pros | Pricing cons |
|---|---|
| Strong per-minute rates for current speech-to-text models. | Add-ons can materially increase the real cost of a transcript. |
| $200 free credit is generous enough for meaningful testing. | Voice Agent API pricing is much higher than raw STT pricing. |
| Growth discounts help once usage becomes predictable. | Multiple billing units make forecasting harder across STT, TTS, agents and intelligence. |
| Flux gives a clear pricing path for real-time voice-agent speech recognition. | BYO tiers can reduce the Deepgram line item while shifting cost to another provider. |
| Smart formatting is included rather than charged as an extra. | Concurrency limits matter for live products and can push teams towards Growth or Enterprise. |
Hidden pricing mistakes to avoid
Budgeting from the base STT rate only
Base transcription rates are easy to understand. Real workflows are not always base workflows. If you need diarisation, redaction or keyterm prompting, include those in the first estimate rather than adding them after launch.
Ignoring connection time for voice agents
The Voice Agent API is billed by connection time. Long pauses, abandoned calls, testing loops and poor session handling can all affect cost. Engineering decisions matter here. Closing sessions cleanly is a pricing control, not just a technical detail.
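The cost of idle connection time can be estimated directly. This sketch uses the Standard Pay As You Go rate from this guide and assumes cost is proportional to connected seconds; the actual billing granularity may differ. The session count and idle time are hypothetical.

```python
STANDARD_RATE = 0.075  # Voice Agent API Standard, USD per minute, Pay As You Go


def connection_cost(connected_seconds: float, rate_per_minute: float = STANDARD_RATE) -> float:
    """Cost of one session, assuming billing proportional to connected time."""
    return connected_seconds / 60 * rate_per_minute


def monthly_idle_cost(
    sessions: int,
    idle_seconds_per_session: float,
    rate_per_minute: float = STANDARD_RATE,
) -> float:
    """Spend on time when the socket is open but nobody is talking."""
    return sessions * connection_cost(idle_seconds_per_session, rate_per_minute)


# Hypothetical: 10,000 sessions a month, each left open 45 seconds too long.
wasted = monthly_idle_cost(sessions=10_000, idle_seconds_per_session=45)
print(f"Idle connection spend: ${wasted:,.2f} per month")
```

An idle timeout or explicit hang-up handling that trims those seconds recovers that spend without touching the model or tier.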
Testing with clean audio only
Clean audio can make every provider look cheaper than it really is. Test with actual calls, poor microphones, interruptions, accents, background noise and real speaker behaviour. Otherwise, you may choose a cheaper model that will require more correction later.
Forgetting concurrency
Monthly minutes are not the same as peak live demand. A product with modest total usage can still need higher concurrency if many users arrive at once. This is especially important for webinars, call centres, live captions and voice agents.
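The gap between monthly volume and peak demand can be illustrated with Little's law, which says the average number of sessions in progress equals the arrival rate multiplied by the average session length. The sketch below is a rough planning aid with hypothetical traffic figures. It estimates the average concurrency during a busy hour, so real peaks will sit somewhat higher.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def average_concurrency(monthly_minutes: float) -> float:
    """Concurrency if usage were spread perfectly evenly over a 30-day month."""
    return monthly_minutes / MINUTES_PER_30_DAY_MONTH


def busy_hour_concurrency(sessions_per_hour: float, average_session_minutes: float) -> float:
    """Little's law: sessions in progress = arrival rate x average duration."""
    return sessions_per_hour / 60 * average_session_minutes


# Hypothetical product: 60,000 minutes a month, but 300 five-minute
# sessions arrive during the busiest hour.
print(f"Evenly spread: {average_concurrency(60_000):.2f} concurrent sessions")
print(f"Busy hour: {busy_hour_concurrency(300, 5):.0f} concurrent sessions")
```

The same monthly volume that averages fewer than two concurrent sessions needs capacity for about 25 during the busy hour, which is the number to compare against plan limits.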
Treating BYO as automatically cheaper
Bring-your-own LLM or TTS pricing can reduce Deepgram’s bill, but the external provider still charges you. It can be cheaper, but only after you include the other invoice, engineering effort, monitoring and failure handling.
Deepgram pricing FAQs
How much does Deepgram cost in 2026?
Deepgram starts with a free $200 credit, then uses Pay As You Go pricing. Growth starts at $4K+ per year, and Enterprise is custom quoted. Product costs depend on whether you use speech-to-text, Flux, Aura text-to-speech, Voice Agent API, Audio Intelligence or add-ons.
How much does Deepgram speech-to-text cost per minute?
Current public streaming rates include Nova-3 Monolingual at $0.0048 per minute on Pay As You Go, Nova-3 Multilingual at $0.0058 per minute, Flux English at $0.0065 per minute, and Flux Multilingual at $0.0078 per minute. Growth rates are lower.
Does Deepgram offer free credits?
Yes. Deepgram offers $200 in free credit to get started. That credit is useful for testing real audio, streaming, add-ons, TTS and early voice-agent sessions before committing to recurring spend.
What is the difference between Pay As You Go and Growth?
Pay As You Go is usage-based with no annual minimum. Growth starts at $4K+ per year and uses prepaid annual credits that are redeemed against actual usage. Growth also gives discounted unit rates and higher limits in some areas.
How much does Nova-3 cost?
Nova-3 Monolingual is listed from $0.0048 per minute on Pay As You Go and $0.0042 per minute on Growth in the current streaming pricing view. Nova-3 Multilingual is listed at $0.0058 per minute on Pay As You Go and $0.0050 per minute on Growth.
How much does Flux cost?
Flux English is listed at $0.0065 per minute on Pay As You Go and $0.0057 per minute on Growth. Flux Multilingual is listed at $0.0078 per minute on Pay As You Go and $0.0068 per minute on Growth.
Is streaming more expensive than pre-recorded transcription?
Often, yes, because streaming requires real-time infrastructure and live concurrency. Deepgram’s public page also shows promotional streaming pricing, so check the live pricing view before assuming a permanent gap between streaming and pre-recorded rates.
Does Deepgram charge extra for diarisation?
Yes. Speaker diarisation is listed as an add-on at $0.0020 per minute on Pay As You Go and $0.0017 per minute on Growth. It is useful for calls, meetings, interviews and multi-speaker transcripts.
Does Deepgram charge extra for redaction?
Yes. Redaction is listed at $0.0020 per minute on Pay As You Go and $0.0017 per minute on Growth. It is commonly used when transcripts may include phone numbers, payment details or other sensitive information.
How much does Deepgram Aura text-to-speech cost?
Aura-2 is listed at $0.030 per 1,000 characters on Pay As You Go and $0.027 per 1,000 characters on Growth. Aura-1 is listed at $0.0150 per 1,000 characters on Pay As You Go and $0.0135 per 1,000 characters on Growth.
How does Deepgram Voice Agent API pricing work?
The Voice Agent API is billed per minute of WebSocket connection time. Standard is listed at $0.075 per minute on Pay As You Go, while BYO and Advanced tiers change the rate depending on whether Deepgram or another provider handles parts of the stack.
Is Deepgram cheaper than Whisper?
It depends on the use case. OpenAI’s lower-cost transcription options can be cheaper for straightforward file transcription. Deepgram becomes more compelling when you need low-latency streaming, Flux, voice agents, real-time behaviour or a speech API designed around production voice apps.
Is Deepgram pricing good value for call centres?
It can be, especially where low latency, diarisation, redaction and analytics matter. The important step is to model the complete cost per recorded or live call, including add-ons. Call centres should also check concurrency and support needs before relying on Pay As You Go assumptions.
Can Deepgram pricing get expensive at scale?
Yes. Any usage-based API can become expensive at scale if volume grows, sessions remain open too long, add-ons are enabled by default, or voice-agent minutes are forecast as basic transcription minutes. Growth or Enterprise pricing may become sensible once usage is predictable.
Verdict: Is Deepgram good value?
Deepgram is good value when the buying decision is about real-time speech infrastructure rather than basic transcription alone. Its strongest pricing fit is production speech-to-text, live transcription, voice agents, call workflows and products that need fast responses from messy audio.
For simple file transcription, carefully compare Deepgram against Whisper-style pricing. For speech intelligence, compare it with AssemblyAI. For international accent coverage, compare it with Speechmatics. For Microsoft-first procurement, compare it with Azure AI Speech.
The best way to approach Deepgram pricing is to model the actual workflow: minutes, model, mode, add-ons, TTS characters, agent connection time, Audio Intelligence tokens and expected concurrency. Do that, and Deepgram’s pricing is transparent enough to budget. Skip that step, and the invoice can look very different from the headline model rate.

