How Transcription Works
The three stages that turn your sermon audio into a polished transcript.
Transcription in Berea happens in three stages, each building on the last to produce the most accurate, readable result possible.
Stage 1: Live transcription
During a live recording, audio is streamed to Deepgram's WebSocket API in real time. Words appear on screen within 300 milliseconds of being spoken, so you have a usable transcript even before the recording ends.
When a Wi-Fi or cellular connection isn't available, Berea falls back to Apple's on-device Speech framework, which provides offline transcription with slightly lower accuracy for specialized vocabulary.
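The fallback rule described above can be sketched in a few lines. This is only an illustration of the decision, not Berea's actual code: the `LiveEngine` labels and the `choose_live_engine` function are hypothetical names invented for this example.

```python
from enum import Enum


class LiveEngine(Enum):
    """Hypothetical labels for the two live transcription paths."""
    STREAMING = "streaming"    # cloud streaming; needs Wi-Fi or cellular
    ON_DEVICE = "on_device"    # offline fallback that runs on the device


def choose_live_engine(has_connection: bool) -> LiveEngine:
    """Prefer cloud streaming; fall back to on-device when offline."""
    return LiveEngine.STREAMING if has_connection else LiveEngine.ON_DEVICE
```

In this sketch the choice depends only on whether a connection exists. A real app would likely re-check as the connection comes and goes during a recording.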
Stage 2: Deepgram batch upgrade
Once the recording ends, Berea sends the full audio file to Deepgram's batch API. This produces a significantly more accurate transcript than the streaming version — especially for proper nouns, scripture references, and overlapping speech.
The batch upgrade also adds a timestamp to every word in the transcript. These timestamps power the synchronized playback feature: tap any word and the audio jumps to that exact moment.
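To make the idea of word-level timestamps concrete, here is a minimal sketch of how tap-to-seek and playback highlighting could work. The `Word` structure and both function names are assumptions made for illustration; they are not Berea's or Deepgram's actual data model.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Word:
    """Hypothetical transcript word with its position in the audio."""
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def seek_time_for_word(words: List[Word], index: int) -> float:
    """Tapping the word at `index` seeks playback to that word's start."""
    return words[index].start


def word_index_at(words: List[Word], position: float) -> Optional[int]:
    """Return the index of the most recent word that has started by
    `position` seconds, or None if no word has started yet.

    Assumes `words` is sorted by start time.
    """
    starts = [w.start for w in words]
    i = bisect_right(starts, position) - 1
    return i if i >= 0 else None
```

The binary search keeps the lookup fast even for an hour-long sermon with thousands of words. During pauses, this sketch keeps pointing at the last word spoken rather than returning nothing.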
Stage 3: Transcript polishing
The final stage sends the transcript to GPT for editing. This step:
- Fixes punctuation, capitalization, and paragraph breaks.
- Corrects common transcription errors ("eye" → "I", "pray fur" → "pray for").
- Standardizes scripture references ("Romans eight" → "Romans 8").
- Applies context from your Faith Profile (denominational vocabulary, pastor's name).
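As an illustration of what standardizing a scripture reference means, the sketch below rewrites spelled-out chapter numbers with a simple rule. Berea performs this step with a language model, not with rules like these; the book list, number table, and `standardize_references` function are a deliberately small, hypothetical subset.

```python
import re

# Spelled-out numbers this sketch understands (illustrative subset).
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20,
}

# A handful of book names, for illustration only.
BOOKS = ("Genesis", "Psalm", "Psalms", "Matthew", "John", "Romans", "Hebrews")

_REFERENCE = re.compile(
    r"\b(" + "|".join(BOOKS) + r")\s+(" + "|".join(NUMBER_WORDS) + r")\b",
    re.IGNORECASE,
)


def standardize_references(text: str) -> str:
    """Rewrite '<Book> <number word>' as '<Book> <digits>'."""
    def replace(match: re.Match) -> str:
        book = match.group(1)
        number = NUMBER_WORDS[match.group(2).lower()]
        return f"{book} {number}"

    return _REFERENCE.sub(replace, text)
```

A rule like this cannot tell "John three" (a reference) from "John one of the disciples" (ordinary speech), which is one reason a context-aware model suits this stage better.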
Tip
Transcript accuracy factors
- Microphone proximity: closer is almost always better.
- Room acoustics: reverberant churches can introduce echo artifacts.
- Speaker clarity: accents, fast speech, and heavy background music all reduce accuracy.
- Connection quality: a strong Wi-Fi signal improves the live streaming stage.