three transcription migrations in three months
We shipped ECHO to government clients while routing their citizens’ voice data through OpenAI’s US servers. Yeah.
Phase 1: OpenAI Whisper API. Fast, reliable, easy to integrate. Problem: we’d made claims about GDPR compliance and data not leaving Europe. Those claims were not true. OpenAI’s API sends data to US servers. For a tool serving Dutch municipalities and European governments, hard stop.
Phase 2: Azure OpenAI. Microsoft hosts in EU regions, we get compliance, done. Migrated everything. Then discovered Azure’s Whisper deployment only supported English transcription. Our customers run multilingual conversations in Dutch, German, French, Croatian. Should have caught that before writing a single line of migration code.
Phase 3: Self-hosted Whisper on RunPod. Deployed faster-whisper-large-v3 on RunPod GPU infrastructure. GDPR compliant, multilingual, under our control. But now we owned the entire transcription stack. Scaling, error handling, retry logic, all of it.
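Owning the stack meant writing things like this. A minimal sketch of job submission against a RunPod serverless endpoint with retries for transient failures; the payload shape and the audio-by-URL convention are assumptions about how a faster-whisper worker might be set up, not the actual ECHO code.

```python
import time
import requests

RUNPOD_API = "https://api.runpod.ai/v2"

def submit_runpod_job(endpoint_id: str, api_key: str, audio_url: str,
                      max_attempts: int = 3) -> str:
    """Submit an async transcription job to a RunPod serverless endpoint.

    Retries network errors and 5xx responses with exponential backoff and
    returns the RunPod job id. The payload is whatever your worker expects;
    this sketch assumes it takes an audio URL.
    """
    payload = {"input": {"audio": audio_url}}
    headers = {"Authorization": f"Bearer {api_key}"}
    url = f"{RUNPOD_API}/{endpoint_id}/run"

    for attempt in range(1, max_attempts + 1):
        try:
            resp = requests.post(url, json=payload, headers=headers, timeout=30)
            if resp.status_code >= 500 and attempt < max_attempts:
                time.sleep(2 ** attempt)  # transient server error: back off and retry
                continue
            resp.raise_for_status()       # 4xx (bad key, bad payload) should fail loudly
            return resp.json()["id"]
        except (requests.ConnectionError, requests.Timeout):
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)
    raise RuntimeError("unreachable")     # loop always returns or raises
```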
The real pain wasn’t the migrations. It was what happened downstream. Original pipeline was synchronous. Send audio, get text. RunPod is async. Submit a job, poll for status, handle timeouts. That one change broke everything that depended on transcription completing in a predictable timeframe.
```python
# The old world: synchronous, simple
text = transcribe(audio_chunk)
process_conversation(text)
```

```python
# The new world: async, everything is a state machine
job_id = submit_transcription(audio_chunk)

# ... later, in a separate task ...
result = poll_runpod_status(job_id)
if result.status == "completed":
    process_conversation(result.output)
```

Processing statuses on the dashboard were perpetually stuck. Conversations showed as “processing” forever. Had to build cron jobs to sweep up unfinished conversations. Customers were seeing broken states daily.
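The sweep job was nothing fancy. A sketch of the shape, assuming a conversation table with a status column and a last-updated timestamp; the `db` helper and field names are illustrative, and `poll_runpod_status` / `process_conversation` are the same stubs as in the snippet above.

```python
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(minutes=15)  # anything "processing" longer than this is suspect

def sweep_stuck_conversations(db):
    """Cron-style sweep: reconcile conversations stuck in 'processing'."""
    cutoff = datetime.now(timezone.utc) - STUCK_AFTER
    stuck = db.find_conversations(status="processing", updated_before=cutoff)

    for convo in stuck:
        result = poll_runpod_status(convo.transcription_job_id)
        if result.status == "completed":
            process_conversation(result.output)
            db.mark(convo.id, status="completed")
        elif result.status in ("failed", "cancelled", "timed_out"):
            db.mark(convo.id, status="failed", error=result.status)
        # still in progress: leave it for the next sweep
```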
Validate your assumptions about vendor capabilities before you migrate, not after. We could have caught the Azure English-only limitation with a single API call before writing any migration code. Could have load-tested RunPod’s async model against our pipeline before switching production.
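The kind of check we mean, sketched against the current Azure OpenAI Python SDK: transcribe one short Dutch clip and look at what comes back before committing to anything. The deployment name, API version, sample file, and expected word are placeholders, not the real test we should have run.

```python
from openai import AzureOpenAI

# One call, before any migration code: can this deployment actually handle Dutch?
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="...",
    api_version="2024-06-01",
)

with open("samples/dutch_council_meeting_30s.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="whisper",   # your Azure deployment name
        file=audio,
        language="nl",     # ignored or rejected if the deployment is English-only
    )

# A known clip with known content makes the check trivial to assert on.
assert "gemeente" in result.text.lower(), (
    "Deployment did not return usable Dutch; stop before migrating anything."
)
```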
Eventually stabilized with k6 load testing against RunPod, conversation health monitoring, and proper error handling with circuit breakers. But those three months were brutal.
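k6 scripts are JavaScript, so no snippet for those, but the circuit breaker is worth sketching: count consecutive failures against the transcription backend, open for a cool-down window, and fail fast instead of stacking timeouts on a struggling GPU endpoint. Thresholds and naming below are illustrative, not production values.

```python
import time

class CircuitBreaker:
    """Fail fast when the transcription backend is clearly unhealthy.

    Opens after `failure_threshold` consecutive failures, then rejects calls
    for `reset_after` seconds before letting one probe request through.
    """

    def __init__(self, failure_threshold: int = 5, reset_after: float = 60.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: transcription backend unavailable")
            # cool-down elapsed: allow one probe through (half-open)
            self.opened_at = None

        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        else:
            self.failures = 0
            return result

# Usage: wrap the submit call so a dead endpoint fails in milliseconds, not minutes.
# breaker = CircuitBreaker()
# job_id = breaker.call(submit_transcription, audio_chunk)
```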
We were building like a B2C startup while selling to governments. Governments don’t want velocity. They want boring, predictable, reliable systems. That mismatch caused every one of these failures.