OpenAI Whisper API Pricing 2026: Cost Per Minute, GPT-4o Transcribe and Cheaper Alternatives


OpenAI Whisper API pricing in 2026 is no longer a single “$0.006 per minute” answer. That rate still matters for whisper-1 and gpt-4o-transcribe, but OpenAI now also lists gpt-4o-mini-transcribe at about $0.003 per minute and gpt-realtime-whisper at about $0.017 per minute for live transcription.

That changes the buying decision. A podcast archive, meeting recorder, subtitle workflow, research library or SaaS transcription feature should not be priced from the headline rate alone. The real cost depends on batch versus realtime transcription, model choice, file duration, silence trimming, speaker diarisation, retries, transcript cleanup and any downstream summarisation.

This updated guide gives the price answer first, then shows how to model monthly costs, when to use each OpenAI transcription route, how OpenAI compares with Deepgram, AssemblyAI, Google and AWS, and where cheaper options make sense. The provider scores use DIY AI’s 2026 speech-to-text dataset, the same framework used in our best AI speech-to-text tools guide.

OpenAI Whisper API pricing: the quick answer

As of June 2026, the practical OpenAI transcription pricing range is about $0.003 to $0.006 per minute for standard file transcription, and about $0.017 per minute for gpt-realtime-whisper when you need live transcript deltas. Always verify the live rate on the official OpenAI API pricing page before putting numbers into a client quote, procurement document or production margin model.

| OpenAI transcription option | Current price to model | Estimated hourly cost | Best for | Main trade-off |
| --- | --- | --- | --- | --- |
| whisper-1 | $0.006/minute | $0.36/hour | Existing Whisper integrations and simple file transcription | Older model path, not the lowest-cost OpenAI option |
| gpt-4o-mini-transcribe | $0.003/minute | $0.18/hour | Lowest-cost managed OpenAI transcription for clean batch audio | Needs testing on noisy calls, accents and technical vocabulary |
| gpt-4o-transcribe | $0.006/minute | $0.36/hour | Higher-quality managed transcription inside the OpenAI API | Twice the base cost of the mini transcription route |
| gpt-realtime-whisper | $0.017/minute | $1.02/hour | Live speech-to-text, captions and transcript deltas | Much more expensive than batch transcription |
| Self-hosted Whisper | Infrastructure-dependent | Infrastructure-dependent | High-volume teams with GPU capacity and engineering support | You own scaling, queues, monitoring, failure recovery and model updates |

The important point is not just that mini is cheaper. It is that OpenAI now splits the decision into cheap batch transcription, higher-quality batch transcription, legacy Whisper compatibility and realtime transcription. Those are different workflows, not four names for the same job.



OpenAI Whisper cost calculator


Monthly transcription cost = uploaded audio minutes × model price per minute.

| Audio volume | Cost at $0.003/minute | Cost at $0.006/minute | Cost at $0.017/minute | Practical meaning |
| --- | --- | --- | --- | --- |
| 100 minutes | $0.30 | $0.60 | $1.70 | Small tests, demos and occasional uploads |
| 500 minutes | $1.50 | $3.00 | $8.50 | Small podcast archive or research notes |
| 3,000 minutes | $9.00 | $18.00 | $51.00 | Regular calls, interviews or internal meeting transcription |
| 30,000 minutes | $90.00 | $180.00 | $510.00 | SaaS feature or serious content workflow |
| 300,000 minutes | $900.00 | $1,800.00 | $5,100.00 | Enterprise-scale usage where routing and vendor choice matter |

For a one-hour file, the quick mental maths is easy: $0.18 on gpt-4o-mini-transcribe, $0.36 on whisper-1 or gpt-4o-transcribe, and $1.02 if you model the same hour as gpt-realtime-whisper. That does not include storage, audio conversion, diarisation, retries, human review or later AI analysis.
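
If you would rather script the table than read it, the formula above fits in a few lines of Python. This is a minimal sketch: the rates are the ones quoted in this guide, and the names `RATES_PER_MINUTE` and `monthly_cost` are illustrative rather than part of any OpenAI SDK. Check the live pricing page before relying on the numbers.

```python
# Per-minute rates as quoted in this guide (USD); verify against live pricing.
RATES_PER_MINUTE = {
    "gpt-4o-mini-transcribe": 0.003,
    "whisper-1": 0.006,
    "gpt-4o-transcribe": 0.006,
    "gpt-realtime-whisper": 0.017,
}


def monthly_cost(minutes: float, model: str) -> float:
    """Base transcription cost: audio minutes x price per minute, rounded to cents."""
    return round(minutes * RATES_PER_MINUTE[model], 2)


if __name__ == "__main__":
    # Reproduce the volume table for every model in the rate card.
    for minutes in (100, 500, 3_000, 30_000, 300_000):
        row = ", ".join(
            f"{model}: ${monthly_cost(minutes, model):,.2f}"
            for model in RATES_PER_MINUTE
        )
        print(f"{minutes:>7,} min -> {row}")
```

The function covers base transcription only, so storage, retries and downstream AI calls still need their own line items.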

Whisper-1 vs GPT-4o Transcribe vs GPT-Realtime-Whisper

A lot of people still search for “OpenAI Whisper pricing” because Whisper made OpenAI transcription popular. The API decision in 2026 is broader.

| Model or route | Role in 2026 | Use it when | Avoid it when |
| --- | --- | --- | --- |
| whisper-1 | Legacy OpenAI Whisper model for speech recognition | You already have an integration built around whisper-1 and it performs well enough | You are building a new workflow and want the cheapest managed OpenAI option first |
| gpt-4o-mini-transcribe | Lower-cost OpenAI batch transcription model | Audio is clean, volume is high and small accuracy differences are acceptable | Audio is noisy, names and numbers matter, or you have costly correction workflows |
| gpt-4o-transcribe | Higher-quality managed OpenAI transcription route | Accuracy matters more than halving the base transcription bill | The audio is simple enough that mini passes your own tests |
| gpt-realtime-whisper | Realtime speech-to-text model for live transcript deltas | The user needs to see words appear during the session | You are uploading completed files and can wait for a batch transcript |
| Self-hosted Whisper | Open-source deployment route outside the managed OpenAI API | You have steady volume, GPU capacity and people who can maintain the system | You want low admin overhead or your transcription load is unpredictable |

For a new build, start by testing gpt-4o-mini-transcribe and gpt-4o-transcribe against a small but representative audio set. Include good microphones, bad microphones, background noise, overlapping speakers, names, numbers, accents, long files and any specialist terms your users care about. Synthetic demo audio will not reveal where a production transcript fails.

Is Whisper-1 still available in 2026?

Yes. whisper-1 still appears in OpenAI’s model documentation as a general-purpose speech recognition model, and it is still relevant for existing Whisper integrations. The mistake is assuming whisper-1 is the only OpenAI transcription choice.

For existing systems, staying on whisper-1 can be sensible if the cost is predictable and output quality is known. For new systems, compare it against gpt-4o-mini-transcribe and gpt-4o-transcribe before committing. The lower-cost model may be good enough for clean bulk audio, while the higher-quality model may save money indirectly if it reduces manual correction.

OpenAI transcription pricing by monthly usage

Pricing becomes easier to reason about when you map it to real workloads instead of abstract minutes. These examples use base transcription costs only. They do not include downstream summarisation, storage, retries, diarisation, redaction or human review.

| Scenario | Minutes per month | Best starting model | Estimated base cost | Why this route makes sense |
| --- | --- | --- | --- | --- |
| Small podcast archive | 500 | gpt-4o-mini-transcribe | $1.50 | Clean spoken audio usually makes the cheaper model worth testing first |
| Research calls | 3,000 | gpt-4o-mini-transcribe or gpt-4o-transcribe | $9 to $18 | Use mini for clean calls and route difficult recordings upward |
| SaaS transcription feature | 30,000 | Hybrid mini plus higher-quality fallback | $90 to $180 | At this volume, routing rules matter more than picking one model blindly |
| Live captioning product | 30,000 live minutes | gpt-realtime-whisper or a specialist realtime provider | Around $510 | Realtime output changes the cost model and the architecture |
| Enterprise call workflow | 300,000 | Compare OpenAI, Deepgram, AssemblyAI, Google, AWS and self-hosting | Depends on latency, governance and feature needs | Vendor approval, data controls, diarisation and review costs can outweigh the headline rate |

The SaaS example is where many teams make the wrong call. They pick the cheapest transcription API, then run every transcript through expensive summarisation, entity extraction and customer-facing formatting. At scale, the second model call can cost more than transcription itself.
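
To see how a hybrid route and a second model call change the bill, a rough sketch helps. The transcription rates below come from this guide. Everything about the downstream call (speaking rate, tokens per word, price per million tokens) is a placeholder assumption that you should replace with your own measurements and your provider's current prices. The function names are hypothetical.

```python
def blended_transcription_cost(
    minutes: float,
    share_on_mini: float,
    mini_rate: float = 0.003,   # gpt-4o-mini-transcribe rate quoted in this guide
    full_rate: float = 0.006,   # gpt-4o-transcribe rate quoted in this guide
) -> float:
    """Cost when a share of minutes goes to the cheaper model and the rest to the fallback."""
    return minutes * (share_on_mini * mini_rate + (1 - share_on_mini) * full_rate)


def downstream_input_cost(
    minutes: float,
    words_per_minute: float,        # assumption: measure this on your own audio
    tokens_per_word: float,         # assumption: depends on language and tokenizer
    usd_per_million_tokens: float,  # assumption: look up the model you actually call
) -> float:
    """Input-token cost of sending every transcript to a second model (output tokens excluded)."""
    tokens = minutes * words_per_minute * tokens_per_word
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    minutes = 30_000
    stt = blended_transcription_cost(minutes, share_on_mini=0.8)
    # Placeholder downstream figures, not real prices.
    llm = downstream_input_cost(minutes, 150, 1.3, 5.0)
    print(f"transcription ${stt:,.2f} + downstream input ${llm:,.2f}")
```

With `share_on_mini` at 1.0 or 0.0 the first function reproduces the $90 and $180 ends of the SaaS range above. Output tokens and repeat passes (summary, then extraction, then formatting) come on top of the downstream figure.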

What the base price does not include

The per-minute transcription price is only the visible part of the bill. In a real workflow, several surrounding costs decide whether Whisper is actually cheap.

Speaker diarisation

Plain transcription tells you what was said. Diarisation tries to identify who spoke when. That difference matters for interviews, meeting notes, sales calls, support conversations and podcasts with multiple hosts.

Classic Whisper-style transcription is not the same as a polished meeting transcript with reliable “Speaker 1” and “Speaker 2” labels. You may need a newer speaker-aware OpenAI route, a separate diarisation model, a provider with built-in speaker labelling or post-processing logic. Price that before you compare vendors.

Audio preprocessing

Long video files often need audio extraction. Multi-channel calls may need downmixing. Quiet recordings may need normalisation. Some files need chunking before upload. None of that is difficult for a capable developer using FFmpeg and queues, but it is still engineering work.

Preprocessing is also where cost control can start. A 60-minute recording with 15 minutes of dead air should not always be sent as a 60-minute file. Trim silence where the workflow allows, but test carefully so you do not cut off quiet speech or speaker transitions.
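
As a starting point, the sketch below builds two FFmpeg commands from Python: one that extracts a mono audio track from a video file, and one that removes long silent stretches with FFmpeg's `silenceremove` filter. The 16 kHz mono target, the two-second minimum silence and the -45 dB threshold are assumptions to tune on your own recordings, and the function names are illustrative. The commands are only built here, not executed.

```python
import shlex


def extract_audio_cmd(src: str, dst: str) -> list[str]:
    # -vn drops the video stream, -ac 1 downmixes to mono, -ar 16000 resamples to 16 kHz.
    return ["ffmpeg", "-i", src, "-vn", "-ac", "1", "-ar", "16000", dst]


def trim_silence_cmd(
    src: str, dst: str, min_silence_s: float = 2.0, threshold_db: int = -45
) -> list[str]:
    # stop_periods=-1 asks silenceremove to cut silent stretches throughout the
    # file, not only at the start. Duration and threshold are guesses to tune.
    audio_filter = (
        "silenceremove=stop_periods=-1:"
        f"stop_duration={min_silence_s}:stop_threshold={threshold_db}dB"
    )
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


def billed_minutes(total_minutes: float, trimmed_minutes: float) -> float:
    """Minutes left to upload after trimming, never below zero."""
    return max(total_minutes - trimmed_minutes, 0.0)


if __name__ == "__main__":
    print(shlex.join(extract_audio_cmd("meeting.mp4", "meeting.wav")))
    print(shlex.join(trim_silence_cmd("meeting.wav", "meeting_trimmed.wav")))
    print(billed_minutes(60, 15), "minutes to upload instead of 60")
```

Pass either list to `subprocess.run(cmd, check=True)` once you are happy with the settings, then listen to a sample of trimmed files before trusting the savings.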

Retries and failed jobs

Retries can quietly double costs if your pipeline resubmits the same file after a timeout. Use job IDs, idempotency logic, transcript status fields and clear failure states. The cheap API call is not the problem. Duplicate jobs, partial records and poor retry handling are the problem.
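
One way to apply those ideas is to derive the job ID from the audio itself, so the same file can never be billed twice by accident. The sketch below is an in-memory illustration only: `TranscriptionJobs` is a hypothetical class, the `transcribe` callable stands in for whatever API client you use, and a production system would keep job state in a database rather than a dictionary.

```python
import hashlib


class TranscriptionJobs:
    """In-memory job registry keyed by a hash of model name plus audio bytes."""

    def __init__(self, transcribe, max_attempts: int = 3):
        self._transcribe = transcribe      # hypothetical callable: bytes -> str
        self._max_attempts = max_attempts
        self._jobs = {}                    # job_id -> {"status", "attempts", "text"}

    @staticmethod
    def job_id(audio: bytes, model: str) -> str:
        return hashlib.sha256(model.encode() + b"\0" + audio).hexdigest()

    def submit(self, audio: bytes, model: str) -> dict:
        key = self.job_id(audio, model)
        job = self._jobs.setdefault(
            key, {"status": "pending", "attempts": 0, "text": None}
        )
        if job["status"] == "done":
            return job                     # already transcribed: reuse, do not pay again
        if job["attempts"] >= self._max_attempts:
            job["status"] = "failed"       # retry budget spent: stop resubmitting
            return job
        job["attempts"] += 1
        try:
            job["text"] = self._transcribe(audio)
            job["status"] = "done"
        except Exception:
            exhausted = job["attempts"] >= self._max_attempts
            job["status"] = "failed" if exhausted else "pending"
        return job
```

Submitting the same bytes twice returns the stored transcript on the second call, and a file that keeps failing stops consuming API calls once it reaches `max_attempts`.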

Downstream AI calls

Most products do not stop at raw text. They generate summaries, chapters, action points, CRM notes, search indexes, redactions, topics or structured JSON. Once you pass the transcript into another AI model, transcription is just the first line item.

When $0.003 per minute is enough

gpt-4o-mini-transcribe should be the first model to test when volume matters and the audio is relatively clean. It is a strong fit for internal voice notes, clear podcast files, lecture recordings, simple dictation, content archives and non-critical searchable transcripts.

The key word is “test”. Do not move production traffic to the cheapest route because a pricing table looks attractive. Test the model against files that include real microphones, real room noise, real accents and real vocabulary. If errors are minor and the transcript is not being used for compliance-sensitive decisions, the $0.003 route can be the best value.

When to pay for the higher transcription model

Use gpt-4o-transcribe when mistakes are expensive. That includes customer-facing subtitles, legal or medical-adjacent notes, technical calls, investor interviews, research transcripts, any audio dense with product names, addresses or numbers, and workflows where humans will trust the output without listening to the audio again.

A higher transcription bill can be cheaper than a correction workflow. If a transcript error causes a support agent to misread a customer complaint or a researcher to miss a key statement, the saving from mini disappears quickly. This is why the right test is not only word error rate. Track proper nouns, numbers, speaker turns, timestamps, acronyms and domain terms.
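
A minimal sketch of that kind of scoring, assuming you have a human-checked reference transcript for each test file: word error rate via a standard word-level edit distance, plus a simple recall check for the terms you cannot afford to get wrong. The function names are illustrative, and the comparison is deliberately naive (lowercased, whitespace-split, punctuation left in), so normalise text first if punctuation differences should not count as errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))       # distances against an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def term_recall(hypothesis: str, key_terms: list[str]) -> float:
    """Share of must-have terms (names, numbers, acronyms) found in the transcript."""
    text = hypothesis.lower()
    hits = sum(1 for term in key_terms if term.lower() in text)
    return hits / len(key_terms) if key_terms else 1.0
```

Two transcripts can share the same word error rate while one of them drops every customer name, which is why the second number is worth tracking separately.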

When live transcription changes the price comparison

Realtime transcription is not batch transcription with a faster response. It is a different interaction pattern. Live captions, voice agents, call assist tools and browser microphone workflows need transcript deltas while audio is still arriving. That is why gpt-realtime-whisper belongs in the pricing table rather than being treated as a footnote.

| Use case | Better pricing comparison | What to test |
| --- | --- | --- |
| Uploading a podcast file | gpt-4o-mini-transcribe vs gpt-4o-transcribe | Accuracy, punctuation, names, editing time and turnaround speed |
| Transcribing meetings after they finish | Batch file transcription plus diarisation options | Speaker labels, timestamps, summary cost and export format |
| Live captions | gpt-realtime-whisper vs specialist streaming providers | Latency, partial-text corrections, stability and cost per live hour |
| Voice agents | Realtime transcription plus speech generation and LLM costs | End-to-end latency, barge-in behaviour, turn detection and failure recovery |
| Live translation | Realtime translation pricing, not batch Whisper pricing | Language pairs, delay tolerance, accuracy and moderation needs |

The practical rule is straightforward: use batch pricing when the user can wait; use realtime pricing when the product experience depends on words appearing during the session.
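
Expressed as code, the rule might look like the sketch below. It combines the batch-versus-realtime split with this guide's advice to send clean audio to the mini model and difficult or high-stakes audio to the higher-quality one. The `Recording` flags and `choose_model` are hypothetical names, and how you decide that a file is noisy or high-value is left to your own pipeline.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    is_live: bool = False               # user must see words during the session
    is_noisy: bool = False              # poor microphones, crosstalk, background noise
    high_value: bool = False            # errors are expensive to correct
    compliance_sensitive: bool = False  # output feeds regulated or audited decisions


def choose_model(rec: Recording) -> str:
    """Pick a transcription route using the rules described in this guide."""
    if rec.is_live:
        return "gpt-realtime-whisper"
    if rec.is_noisy or rec.high_value or rec.compliance_sensitive:
        return "gpt-4o-transcribe"
    return "gpt-4o-mini-transcribe"
```

Keeping the rule in one small function makes it easy to log which route each file took and to revisit the thresholds when your test results change.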

OpenAI Whisper pricing: pros and cons

| Pros | Cons |
| --- | --- |
| Clear low-cost managed route with gpt-4o-mini-transcribe at about $0.003 per minute. | The cheapest model still needs testing on noisy, accented or domain-heavy recordings. |
| Strong accuracy profile in DIY AI’s 2026 speech-to-text dataset, with OpenAI Whisper scoring 9.2/10 overall. | Classic whisper-1 is not the newest path for every managed OpenAI transcription workflow. |
| Good fit when transcripts feed into OpenAI-based summarisation, extraction or classification. | Realtime transcription costs materially more than standard file transcription. |
| Self-hosted Whisper remains an option for teams that need control and have GPU capacity. | Self-hosting shifts cost into engineering, infrastructure, monitoring and maintenance. |
| Simple base pricing makes early forecasting easier than many feature-heavy speech platforms. | Diarisation, retries, preprocessing, redaction and downstream AI calls can change the total cost. |

Cheaper alternatives to OpenAI Whisper API

The first cheaper alternative is still inside OpenAI: gpt-4o-mini-transcribe. It cuts the familiar $0.006-per-minute benchmark in half for standard transcription, which makes it the obvious first test for clean batch audio.

Outside OpenAI, cheaper does not always mean better. A lower per-hour rate can lose its advantage if you need extra diarisation, redaction, streaming, analytics or human correction. Use the following comparison as a shortlist, then verify live vendor pricing before migration.

| Provider | DIY AI overall score | Star rating | Best fit | Pricing pattern to watch |
| --- | --- | --- | --- | --- |
| OpenAI Whisper / OpenAI transcription | 9.2/10 | 4.6/5 | High-accuracy batch transcription and OpenAI ecosystem workflows | Very competitive batch pricing, higher cost for realtime |
| Deepgram | 9.1/10 | 4.6/5 | Realtime streaming and voice AI agents | Model, streaming and feature choices can change the real bill |
| AssemblyAI | 9.0/10 | 4.5/5 | Speech intelligence, summaries, chapters and structured audio insights | Audio intelligence add-ons can matter more than base transcription |
| Speechmatics | 8.8/10 | 4.4/5 | Global accents, non-native English and multilingual speech | Worth comparing where accent coverage is more important than the lowest rate |
| Google Gemini Flash STT / Google Speech-to-Text | 8.6/10 | 4.3/5 | Google Cloud teams and multimodal workflows | Cloud fit, governance and batch options can outweigh simple per-minute comparisons |
| AWS Transcribe | 8.0/10 | 4.0/5 | AWS-native data pipelines and contact-centre workflows | Regional pricing, add-ons and minimum charges need checking |
| Rev AI | 7.9/10 | 4.0/5 | Hybrid AI and human-reviewed transcription workflows | Useful where human verification matters more than raw API cost |

Deepgram deserves early testing when the product is realtime or voice-agent-led. AssemblyAI is stronger when the transcript needs built-in intelligence rather than a plain text file. Speechmatics is a good shortlist option for accents. Google and AWS make sense when the organisation is already committed to those clouds and procurement is easier inside an existing vendor stack.

For a deeper look at live transcription trade-offs, read our Deepgram review. For the Google side of the comparison, see our Google Speech-to-Text review.

OpenAI Whisper vs Deepgram, AssemblyAI, Google and AWS

OpenAI is the best default when you want accurate batch transcripts and the text will feed into other OpenAI workflows. Deepgram is the stronger first test for low-latency streaming. AssemblyAI is often cleaner when you want transcript intelligence without building every post-processing step yourself. Google and AWS are rarely chosen on raw price alone; they are chosen because the wider cloud environment is already approved.

| Decision factor | OpenAI transcription | Deepgram | AssemblyAI | Google / AWS |
| --- | --- | --- | --- | --- |
| Batch file transcription | Strong default | Strong | Strong | Strong in cloud-native environments |
| Realtime streaming | Use gpt-realtime-whisper and test latency | Excellent fit | Good fit depending on workflow | Useful for cloud-integrated systems |
| Speech intelligence | Usually built with additional OpenAI processing | Available through platform features | Core strength | Depends on connected cloud services |
| Enterprise governance | Good, but check data and retention requirements | Good for speech-first teams | Good for product teams needing audio intelligence | Strongest when procurement already favours the cloud vendor |
| Lowest simple batch cost | gpt-4o-mini-transcribe is highly competitive | Competitive | Competitive | Can be competitive for dynamic batch or scale tiers, but not always simple |

Do not let a pricing table make the whole decision. A transcript that arrives too slowly, misses speaker turns, mangles proper nouns or creates more editing work is not cheap in practice.

Self-hosted Whisper: when it is actually cheaper

Self-hosted Whisper is attractive because there is no per-minute OpenAI API bill. That does not make it free. You still pay for GPU time, queue workers, monitoring, storage, security, model updates, deployment work and developer time.

It can be cheaper when volume is high, demand is steady and the team already understands GPU infrastructure. It is less attractive for a small product with uneven usage, where managed API pricing keeps complexity low and lets developers focus on the product.

A realistic self-hosting estimate needs named assumptions: GPU type, processing speed, utilisation, file length, queue latency, storage costs, engineering time and failure handling. Avoid publishing a single “$0.50” style estimate unless the calculation is fully explained and reproducible.
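
A reproducible estimate can be as simple as the sketch below, which forces every assumption into a named field. None of the example numbers are measurements or quotes: the GPU price, speed factor, utilisation and fixed overhead are invented placeholders, and `SelfHostAssumptions` is a hypothetical structure. Swap in your own figures before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class SelfHostAssumptions:
    gpu_usd_per_hour: float     # what you pay for the GPU instance
    speed_factor: float         # audio hours transcribed per busy GPU hour
    utilisation: float          # share of paid GPU time doing useful work (0 to 1)
    monthly_fixed_usd: float    # engineering, monitoring, storage and similar overhead


def self_host_cost_per_minute(
    a: SelfHostAssumptions, audio_minutes_per_month: float
) -> float:
    """Compute cost per audio minute plus fixed overhead spread over monthly volume."""
    audio_minutes_per_gpu_hour = 60 * a.speed_factor * a.utilisation
    compute = a.gpu_usd_per_hour / audio_minutes_per_gpu_hour
    fixed = a.monthly_fixed_usd / audio_minutes_per_month
    return compute + fixed


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    example = SelfHostAssumptions(
        gpu_usd_per_hour=1.00, speed_factor=10, utilisation=0.5, monthly_fixed_usd=2_000
    )
    print(f"${self_host_cost_per_minute(example, 300_000):.4f} per audio minute")
```

With these placeholders the fixed overhead is twice the compute cost per minute, which is the pattern to watch for: low utilisation and engineering time can matter more than the GPU's hourly price.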

How to reduce transcription costs without hurting accuracy

The best savings usually come from routing and preprocessing rather than switching vendors every time a price changes. A sensible workflow looks like this:

  • Trim long silence before upload where it is safe to do so.
  • Extract audio from video before transcription instead of pushing large video files through your pipeline.
  • Split long recordings on silence or sentence boundaries, not arbitrary timestamps.
  • Send clean, low-risk recordings to gpt-4o-mini-transcribe first.
  • Route noisy, high-value or compliance-sensitive audio to the higher-quality model.
  • Use job IDs and idempotency checks so retries do not create duplicate transcription costs.
  • Only run expensive summarisation, redaction or extraction on transcripts that need it.
  • Keep a small regression set of real audio files and retest when you change models.

That last point matters. Models change, aliases move and provider behaviour can shift. A simple 20-file evaluation set with known problem cases will catch more issues than a beautiful spreadsheet with no audio behind it.
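
A small gate like the one sketched here is enough to make that evaluation set useful. It assumes you already store one error score per file (for example, word error rate) for the current model and for the candidate. `find_regressions` and the 0.02 tolerance are illustrative choices, not a standard.

```python
def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.02,
) -> list[str]:
    """Return the files whose error rate rose by more than `tolerance`."""
    return sorted(
        name
        for name, old_score in baseline.items()
        if name in candidate and candidate[name] - old_score > tolerance
    )


if __name__ == "__main__":
    # Hypothetical scores for two files in a regression set.
    current = {"noisy_call.wav": 0.10, "clean_podcast.wav": 0.05}
    proposed = {"noisy_call.wav": 0.15, "clean_podcast.wav": 0.06}
    print(find_regressions(current, proposed))
```

Run it whenever a model name, alias or preprocessing step changes, and block the rollout if the list is not empty.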

Whisper accuracy and multilingual performance: what you are paying for

OpenAI Whisper remains the top-ranked provider in DIY AI’s 2026 speech-to-text dataset, with a 9.2/10 overall score. The category score is not based on price alone. It reflects accuracy, speed, speaker detection, punctuation, diarisation, noise robustness, export formats, cost efficiency and realtime streaming.

| Metric | OpenAI Whisper score | Practical meaning |
| --- | --- | --- |
| Accuracy | 9.6/10 | Strong results on clear speech, interviews, podcasts and many real-world recordings |
| Noise robustness | 9.4/10 | Useful when microphone quality, room noise or recording conditions are imperfect |
| Speaker detection | 9.4/10 | Strong within the wider OpenAI transcription stack, but still test multi-speaker files |
| Punctuation | 9.2/10 | More readable long-form transcripts with less cleanup |
| Diarisation | 9.2/10 | Good score, but model and endpoint support must be checked before launch |
| Cost efficiency | 8.8/10 | Strong value for batch transcription, especially with the mini model route |
| Realtime streaming | 8.8/10 | Better now with realtime options, though dedicated speech platforms remain strong |

Accuracy is not evenly distributed across every language, accent and recording condition. High-resource languages usually perform better. Low-resource languages, code-switching, noisy multi-speaker rooms and domain-specific vocabulary need direct testing. For a deeper quality discussion, see our OpenAI Whisper review.

Does ChatGPT Plus include OpenAI Whisper API usage?

No. ChatGPT subscriptions and OpenAI API usage are billed separately. ChatGPT Plus may include voice or file features inside the ChatGPT product, but that is not the same as using the API to transcribe files inside your own app, automation or SaaS workflow.

This is a common source of pricing confusion. A user can pay for ChatGPT Plus and still need separate API billing for programmatic transcription. Treat the API as its own usage-based bill.

OpenAI Whisper pricing FAQs

How much does OpenAI Whisper cost per minute in 2026?

The familiar OpenAI Whisper pricing figure is $0.006 per minute, equal to $0.36 per audio hour. In 2026, you should also compare gpt-4o-mini-transcribe at about $0.003 per minute, gpt-4o-transcribe at about $0.006 per minute and gpt-realtime-whisper at about $0.017 per minute for live transcription.

Is OpenAI Whisper API free?

No, the managed OpenAI API is paid. The open-source Whisper model can be run without OpenAI API fees, but self-hosting still has compute, storage, monitoring and engineering costs. Any trial credits should be treated as temporary evaluation budget, not a long-term free tier.

Is gpt-4o-mini-transcribe cheaper than whisper-1?

Yes, based on the current pricing structure, gpt-4o-mini-transcribe is about $0.003 per minute, while whisper-1 is commonly modelled at $0.006 per minute. Test quality before switching production traffic.

What is the latest OpenAI Whisper model in 2026?

For OpenAI’s managed API, the better 2026 framing is not just “latest Whisper”. Compare whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe and gpt-realtime-whisper. For open-source deployments, teams often compare Whisper variants and optimised implementations separately from the managed OpenAI API.

Does OpenAI bill per second or per minute for Whisper?

For budget planning, model transcription by the duration of audio you upload. If a file is 30 minutes long, forecast against 30 minutes unless your pipeline trims silence or splits the file before upload. Always check the current provider billing rules before launch.

Does OpenAI Whisper charge for silence?

Plan as though uploaded audio duration matters, not only spoken-word duration. If your files contain long gaps, silence trimming can reduce cost, but test carefully so you do not remove quiet speech, pauses that matter, or speaker transitions.

Do retries cost extra?

They can. If your application resubmits the same audio after a timeout or failed status check, you may create duplicate work. Use clear job states, idempotency logic and retry limits.

Does the OpenAI Whisper price include summarisation?

No. Transcription pricing covers the speech-to-text step. Summaries, chapters, action points, CRM notes, redaction, classification and structured extraction usually require separate processing.

Is Deepgram cheaper than OpenAI Whisper?

It depends on the model, streaming mode and features you compare. OpenAI’s gpt-4o-mini-transcribe is very competitive for low-cost batch transcription. Deepgram is often more compelling when realtime streaming and voice-agent latency are the main requirements.

Should I use OpenAI Whisper or self-host Whisper?

Use the managed API when you want low operational overhead, predictable integration and faster shipping. Consider self-hosting when volume is high, privacy or control requirements justify it, and your team can maintain GPU infrastructure.

Verdict: which transcription route should you choose?

OpenAI Whisper pricing is still attractive, but the best answer in 2026 is no longer “$0.006 per minute”. The better decision is to split your workflow into cheap batch, higher-quality batch, live realtime and self-hosted cases.

Use gpt-4o-mini-transcribe first for clean, high-volume batch audio. Use gpt-4o-transcribe when transcript quality matters more than shaving fractions of a cent per minute. Keep whisper-1 where existing integrations are stable and there is no strong reason to migrate. Use gpt-realtime-whisper only when live transcript deltas matter to the product experience. Compare Deepgram early for voice agents and realtime speech products. Consider self-hosted Whisper only when the operational burden is justified by scale or control.

The page-level takeaway is simple: price the full workflow. Audio minutes are only the starting point. Diarisation, preprocessing, failed jobs, transcript cleanup, downstream AI calls and human correction decide whether the cheapest line in the table is actually the cheapest route.


Writer: Steven Jones

AI Tools Reviewer and Technical Analyst

Steven Jones is a technology analyst specialising in artificial intelligence, machine learning workflows, and emerging automation tools. At DIY AI, he focuses on clear, practical guidance for people comparing AI tools in the real world. His work covers text generation, image generation, video tools, data platforms, developer-focused AI products, and the automation workflows that connect them. Steven's reviews are built around hands-on testing, practical benchmarks, and transparent scoring rather than vendor claims. He looks closely at where each tool performs well, where it falls short, and what those trade-offs mean for creators, teams, and businesses trying to make sensible AI adoption decisions. He has a particular interest in safety, reliability, output quality, performance metrics, and dataset quality. When he is not reviewing the latest AI model updates, he experiments with prompt engineering techniques and contributes to DIY AI's ongoing work on fair, explainable scoring frameworks for AI tools.
