Real-time speech-to-text transcription with advanced audio processing, meeting minutes generation, and comprehensive export options.
Overview
The Speech Transcription feature provides two powerful tools for converting speech to text:
Speech Transcription
General-purpose real-time transcription with sensitive data detection. Perfect for notes, dictation, and quick recordings.
Meeting Transcription
Specialised for meetings, with automatic minutes generation, action-item extraction, and participant tracking.
💡 Privacy Reminder: This feature demonstrates what's possible when an app has microphone access. It's a practical tool, but also a reminder to be mindful of which apps you grant this permission to. All processing happens on-device.
Two audio enhancement modes that can be used simultaneously for maximum quality:
Denoise (LMS Filter)
Adaptive Least Mean Squares filter that learns and removes background noise in real-time. Uses vDSP-accelerated processing for minimal CPU impact.
64-tap FIR filter (~2.9 ms at 22 kHz)
Conservative learning rate (β = 0.003) to preserve speech
Leakage factor and coefficient clamping for stability
Adapts to your environment using a white-noise reference
Removes constant noise (fans, AC, traffic)
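The points above can be illustrated with a minimal leaky-LMS sketch in plain Python. It is not the app's vDSP implementation: the function name, the leakage value (1e-4), the clamp limit (±1.0), and the way the reference signal is wired in are all assumptions made for illustration. Only the 64 taps and β = 0.003 come from the list above. At a 22.05 kHz sample rate, 64 taps span 64 / 22050 ≈ 2.9 ms.

```python
from collections import deque


def lms_denoise(primary, reference, taps=64, beta=0.003, leak=1e-4, clamp=1.0):
    """Leaky LMS noise canceller (illustrative sketch, not the app's code).

    primary   -- microphone samples (speech plus noise)
    reference -- noise reference samples, same length as primary
    Returns the error signal, which is the denoised output.
    """
    weights = [0.0] * taps
    history = deque([0.0] * taps, maxlen=taps)  # newest reference sample first
    output = []
    for d, x in zip(primary, reference):
        history.appendleft(x)
        # FIR estimate of the noise that reached the microphone.
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate
        output.append(error)
        # Leaky update: shrink each weight slightly, step along the
        # gradient, then clamp so no coefficient can run away.
        weights = [
            max(-clamp, min(clamp, (1.0 - leak) * w + beta * error * h))
            for w, h in zip(weights, history)
        ]
    return output
```

The leakage term trades a small amount of noise reduction for stability: weights decay towards zero when the input carries little information, which keeps the filter from drifting during silence.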
Clarity (Voice Enhancement)
6-band parametric EQ optimised for both male and female voice frequencies.
High-pass at 80Hz (removes sub-bass rumble)
Boost at 220Hz (voice fundamentals, male + female)
Boost at 700Hz (first formant, male voices)
Boost at 1kHz (first formant, female voices)
Boost at 2.8kHz (articulation and consonants)
Low-pass at 8kHz (removes high-frequency hiss)
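A six-band EQ like the one listed above is commonly built as a cascade of biquad filters. The sketch below uses the widely published Audio EQ Cookbook formulas. The band frequencies come from the list; the 22.05 kHz sample rate, the Q values, and the +2 dB boosts are illustrative assumptions, since the app's actual gains and bandwidths are not documented here.

```python
import cmath
import math

SAMPLE_RATE = 22050.0  # assumed sample rate


def biquad(kind, f0, q=0.707, gain_db=0.0, fs=SAMPLE_RATE):
    """Return normalised coefficients (b0, b1, b2, a1, a2) for one band."""
    w0 = 2.0 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    if kind == "highpass":
        b = ((1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2)
        a = (1 + alpha, -2 * cos_w0, 1 - alpha)
    elif kind == "lowpass":
        b = ((1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2)
        a = (1 + alpha, -2 * cos_w0, 1 - alpha)
    elif kind == "peak":
        amp = 10.0 ** (gain_db / 40.0)
        b = (1 + alpha * amp, -2 * cos_w0, 1 - alpha * amp)
        a = (1 + alpha / amp, -2 * cos_w0, 1 - alpha / amp)
    else:
        raise ValueError(f"unknown band type: {kind}")
    return tuple(c / a[0] for c in b) + (a[1] / a[0], a[2] / a[0])


# Band frequencies from the list above; Q and gain values are assumptions.
CLARITY_BANDS = [
    biquad("highpass", 80.0),
    biquad("peak", 220.0, q=1.0, gain_db=2.0),
    biquad("peak", 700.0, q=1.0, gain_db=2.0),
    biquad("peak", 1000.0, q=1.0, gain_db=2.0),
    biquad("peak", 2800.0, q=1.0, gain_db=2.0),
    biquad("lowpass", 8000.0),
]


def response_db(bands, freq, fs=SAMPLE_RATE):
    """Magnitude response of the cascade at freq, in dB."""
    z1 = cmath.exp(-2j * math.pi * freq / fs)
    h = 1.0 + 0.0j
    for b0, b1, b2, a1, a2 in bands:
        h *= (b0 + b1 * z1 + b2 * z1 * z1) / (1 + a1 * z1 + a2 * z1 * z1)
    return 20.0 * math.log10(abs(h))


def apply_eq(samples, bands):
    """Run samples through each band in turn (direct form I)."""
    out = list(samples)
    for b0, b1, b2, a1, a2 in bands:
        x1 = x2 = y1 = y2 = 0.0
        for i, x in enumerate(out):
            y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1, y2, y1 = x1, x, y1, y
            out[i] = y
    return out
```

Calling `response_db(CLARITY_BANDS, freq)` at a few frequencies is a quick way to check that rumble below 80 Hz and hiss above 8 kHz are attenuated while the voice bands are lifted.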
Combined Mode: All three filters (Denoise, Clarity, and automatic gain control, AGC) can be active simultaneously for maximum audio quality. When combined, AGC is applied at the input stage; the signal then flows through the EQ (voice shaping) and finally through the LMS filter (noise removal). This combined chain delivers the best results in noisy environments. Filters can be toggled on and off during recording without interruption.
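The processing order described above (AGC at the input, then EQ, then LMS) can be pictured as a chain of stages that are checked for each audio block, so a toggle takes effect on the next block without stopping the recording. The sketch below is a hypothetical illustration: `Stage`, `FilterChain`, and `build_chain` are invented names, and the stage functions are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List

Block = List[float]


@dataclass
class Stage:
    name: str
    process: Callable[[Block], Block]
    enabled: bool = True


class FilterChain:
    """Ordered list of stages; disabled stages are skipped per block."""

    def __init__(self, stages):
        self.stages = list(stages)

    def set_enabled(self, name, enabled):
        for stage in self.stages:
            if stage.name == name:
                stage.enabled = enabled
                return
        raise KeyError(name)

    def process(self, block):
        for stage in self.stages:
            if stage.enabled:
                block = stage.process(block)
        return block


def build_chain(agc, clarity, denoise):
    # Order as described: AGC at the input, then EQ, then the LMS filter.
    return FilterChain([
        Stage("agc", agc),
        Stage("clarity", clarity),
        Stage("denoise", denoise),
    ])
```

Because `enabled` is read every time `process` runs, switching a filter off between two blocks simply bypasses that stage for the following audio.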
Signal-to-Noise Ratio (SNR) Indicator
Live SNR display shows audio quality during recording:
| SNR Level | Quality | Colour |
|---|---|---|
| < 10 dB | Poor (noisy environment) | Red |
| 10-20 dB | Acceptable | Orange |
| 20-30 dB | Good | Yellow |
| > 30 dB | Excellent | Green |
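As a rough illustration of how an SNR reading maps to the levels above, the sketch below computes SNR in decibels from a signal power and a noise power and then looks up the quality label and colour. How the app estimates those two powers is not documented here, so they are plain inputs. The handling of the exact boundary values (10, 20, and 30 dB) is an assumption, because the listed ranges overlap at their edges.

```python
import math


def snr_db(signal_power, noise_power):
    """SNR in dB: 10 * log10(signal power / noise power)."""
    if noise_power <= 0.0:
        return math.inf
    if signal_power <= 0.0:
        return -math.inf
    return 10.0 * math.log10(signal_power / noise_power)


def classify_snr(db):
    """Map an SNR value to (quality, colour).

    Assumed boundary handling: 10 dB counts as Acceptable,
    20 dB and 30 dB count as Good.
    """
    if db > 30.0:
        return ("Excellent", "Green")
    if db >= 20.0:
        return ("Good", "Yellow")
    if db >= 10.0:
        return ("Acceptable", "Orange")
    return ("Poor (noisy environment)", "Red")
```

For example, a signal with 100 times the power of the noise gives 20 dB, which falls in the Good band under these assumptions.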
See also: FAQ - Speech Transcription, for a detailed explanation of how the Denoise, Clarity, and AGC audio filters work.
Average Confidence: Recognition accuracy (0-100%)
Sensitive Segments: Count of flagged content
Language: Selected transcription language
Document Behaviour
Growing Document: The transcript is a continuous document that grows over time. Text never disappears; it only accumulates. When recognition restarts (due to Apple's 1-minute limit), all previous text is preserved and new text is appended.
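One way to picture this behaviour is a transcript that keeps finished text separate from the text still being recognised. The sketch below is a hypothetical illustration, not the app's code: `GrowingTranscript` and its method names are invented here. Partial results may be revised while a recognition session is running, but once the session restarts, its text is frozen and later sessions only append.

```python
class GrowingTranscript:
    """Accumulates text across recognition restarts (illustrative)."""

    def __init__(self):
        self._committed = []  # text from finished sessions, never modified
        self._partial = ""    # latest hypothesis from the current session

    def update_partial(self, text):
        """Replace the in-progress text for the current session."""
        self._partial = text

    def restart_session(self):
        """Freeze the current session's text when recognition restarts."""
        current = self._partial.strip()
        if current:
            self._committed.append(current)
        self._partial = ""

    @property
    def text(self):
        """Full document: all committed text plus the current partial."""
        parts = list(self._committed)
        current = self._partial.strip()
        if current:
            parts.append(current)
        return " ".join(parts)
```

A restart with no new speech leaves the document unchanged, which matches the rule that text only accumulates.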