
🎙️ Speech Transcription

Real-time speech-to-text transcription with advanced audio processing, meeting minutes generation, and comprehensive export options.

Overview

The Speech Transcription feature provides two powerful tools for converting speech to text:

📝 Speech Transcription

General-purpose real-time transcription with sensitive data detection. Perfect for notes, dictation, and quick recordings.

👥 Meeting Transcription

Specialised for meetings with automatic minutes generation, action items extraction, and participant tracking.

💡 Privacy Reminder: This feature demonstrates what's possible when an app has microphone access. It's a practical tool, but also a reminder to be mindful of which apps you grant this permission to. All processing happens on-device.

1. Speech Transcription

Services: SpeechTranscriptionService, SensitiveInfoDetector, TranscriptStorageService

Features

Sensitive Data Detection

Automatically detects and flags potentially sensitive information:

Live Statistics

Audio Controls


2. Meeting Transcription

Services: MeetingMinutesService, TranscriptExportService

Meeting Setup

Recording Modes

Mode | Best For | Description
Standard | General use | Default transcription settings
Meeting | Group discussions | Optimised for multiple speakers
Interview | Two-person conversations | Balanced for dialogue
Lecture | Presentations | Single speaker, long duration

Automatic Meeting Minutes

The app analyses your transcript and generates structured meeting minutes:

Action Item Detection

Automatically extracts action items based on keywords like:

Priority is determined by urgency keywords: "urgent", "critical", or "ASAP" mark an item as High; "when possible" or "eventually" mark it as Low.
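The keyword rule can be illustrated with a minimal sketch. It is not the app's actual MeetingMinutesService code: the `Priority` enum, the `priority(for:)` function, the Medium fallback, and the rule that High keywords win when both kinds appear are all assumptions made for illustration.

```swift
import Foundation

// Hypothetical priority levels. The text names only High and Low;
// Medium as the fallback is an assumption.
enum Priority: String {
    case high, medium, low
}

// Keyword lists taken from the description above.
let highKeywords = ["urgent", "critical", "asap"]
let lowKeywords = ["when possible", "eventually"]

/// Classifies one action-item sentence by scanning for urgency keywords.
/// High keywords take precedence if both kinds appear (an assumption).
func priority(for sentence: String) -> Priority {
    let text = sentence.lowercased()
    if highKeywords.contains(where: { text.contains($0) }) { return .high }
    if lowKeywords.contains(where: { text.contains($0) }) { return .low }
    return .medium
}
```

For example, `priority(for: "Send the report ASAP")` yields `.high`, while a sentence with no urgency keyword falls back to `.medium`.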


3. Advanced Audio Processing

Services: AdaptiveLMSFilter, AudioEngineHandler

Two audio enhancement modes, Denoise and Clarity, can be used simultaneously and combined with automatic gain control (AGC):

🔇 Denoise (LMS Filter)

Adaptive Least Mean Squares filter that learns and removes background noise in real-time. Uses vDSP-accelerated processing for minimal CPU impact.

  • 64-tap FIR filter (~2.9ms at 22kHz)
  • Conservative learning rate (β = 0.003) to preserve speech
  • Leakage factor and coefficient clamping for stability
  • Adapts to your environment using white noise reference
  • Removes constant noise (fans, AC, traffic)
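The following is a minimal sketch of a leaky LMS update using the tap count and learning rate listed above. It is written in plain Swift rather than vDSP and does not claim to match the app's AdaptiveLMSFilter: the type name, the leakage value, the clamp limit, and the 22,050 Hz sample rate are assumptions, since the text gives no figures for them.

```swift
/// Minimal leaky-LMS adaptive filter sketch (plain Swift, no vDSP).
struct LeakyLMSFilter {
    let tapCount: Int
    let stepSize: Float     // β in the list above
    let leakage: Float      // assumed value; not given in the text
    let clampLimit: Float   // assumed value; not given in the text
    private(set) var weights: [Float]
    private var history: [Float]   // recent reference samples, newest first

    init(tapCount: Int = 64, stepSize: Float = 0.003,
         leakage: Float = 0.0001, clampLimit: Float = 2.0) {
        self.tapCount = tapCount
        self.stepSize = stepSize
        self.leakage = leakage
        self.clampLimit = clampLimit
        weights = [Float](repeating: 0, count: tapCount)
        history = [Float](repeating: 0, count: tapCount)
    }

    /// `primary` is the noisy input; `reference` is the noise reference.
    /// Returns the error signal, which is the noise-reduced output.
    mutating func process(primary: Float, reference: Float) -> Float {
        // Shift the newest reference sample into the delay line.
        history.removeLast()
        history.insert(reference, at: 0)

        // FIR output: the current estimate of the noise in `primary`.
        var estimate: Float = 0
        for i in 0..<tapCount { estimate += weights[i] * history[i] }
        let error = primary - estimate

        // Leaky update, then clamp each coefficient for stability.
        for i in 0..<tapCount {
            let updated = (1 - leakage) * weights[i] + stepSize * error * history[i]
            weights[i] = min(max(updated, -clampLimit), clampLimit)
        }
        return error
    }
}

// Time span covered by 64 taps, assuming a 22,050 Hz sample rate (about 2.9 ms).
let spanMs = 64.0 / 22_050.0 * 1_000.0
```

The leakage term pulls coefficients slowly toward zero and the clamp bounds them, which is one common way to keep an adaptive filter from drifting. A small step size slows adaptation, which is the usual trade-off for not tracking (and cancelling) the speech itself.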
🎀

Clarity (Voice Enhancement)

6-band parametric EQ optimised for both male and female voice frequencies.

  • High-pass at 80Hz (removes sub-bass rumble)
  • Boost at 220Hz (voice fundamentals, male + female)
  • Boost at 700Hz (first formant, male voices)
  • Boost at 1kHz (first formant, female voices)
  • Boost at 2.8kHz (articulation and consonants)
  • Low-pass at 8kHz (removes high-frequency hiss)

Combined Mode: All three filters (Denoise, Clarity, and AGC) can be active at the same time. AGC is applied at the input stage; the signal then passes through the EQ (voice shaping) and finally through the LMS filter (noise removal). This chain gives the best results in noisy environments. Filters can be toggled on and off during recording without interruption.
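
As a rough sketch of that ordering, the chain can be modelled as three optional stages applied in sequence. The `AudioChain` type, the `Stage` alias, and the stand-in closures are hypothetical; the app's AudioEngineHandler is not shown in this document and may be structured differently.

```swift
/// A processing stage maps a buffer of samples to a processed buffer.
typealias Stage = ([Float]) -> [Float]

/// Hypothetical chain with one toggle per filter.
struct AudioChain {
    var agcEnabled = true
    var clarityEnabled = true
    var denoiseEnabled = true

    let agc: Stage
    let clarity: Stage
    let denoise: Stage

    /// Runs the enabled stages in the order described above:
    /// AGC at the input, then the EQ, then the LMS filter.
    func process(_ input: [Float]) -> [Float] {
        var buffer = input
        if agcEnabled { buffer = agc(buffer) }
        if clarityEnabled { buffer = clarity(buffer) }
        if denoiseEnabled { buffer = denoise(buffer) }
        return buffer
    }
}
```

Because each toggle is checked per buffer, flipping a flag between buffers changes the chain without stopping the stream, which is one simple way to allow toggling during recording.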

Signal-to-Noise Ratio (SNR) Indicator

Live SNR display shows audio quality during recording:

SNR Level | Quality | Colour
< 10 dB | Poor (noisy environment) | Red
10-20 dB | Acceptable | Orange
20-30 dB | Good | Yellow
> 30 dB | Excellent | Green
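
A short sketch of how an SNR reading could be computed and mapped to the bands in the table. The function and enum names are hypothetical, and the handling of readings that fall exactly on 10, 20, or 30 dB is an assumption, since the table does not say which band owns the boundary.

```swift
import Foundation

/// Hypothetical quality bands matching the table above.
enum AudioQuality: String {
    case poor, acceptable, good, excellent
}

/// SNR in decibels from mean signal power and mean noise power.
func snrDecibels(signalPower: Double, noisePower: Double) -> Double {
    10 * log10(signalPower / noisePower)
}

/// Maps an SNR reading to a quality band.
/// Boundary values go to the higher band (an assumption).
func quality(forSNR snr: Double) -> AudioQuality {
    switch snr {
    case ..<10.0: return .poor
    case 10.0..<20.0: return .acceptable
    case 20.0..<30.0: return .good
    default: return .excellent
    }
}
```

A signal with 100 times the noise power gives 20 dB, which this sketch places in the Good band.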
📎 See also: FAQ (Speech Transcription), a detailed explanation of how the Denoise, Clarity, and AGC audio filters work.

4. Export & Sharing

Service: TranscriptExportService

Export Formats

Format | Extension | Best For
Plain Text | .txt | Simple sharing, compatibility
Markdown | .md | Documentation, formatted notes
PDF | .pdf | Professional documents, email attachments
JSON | .json | Data processing, backups
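
To make the text-based formats concrete, here is a minimal sketch of an exporter. The `Transcript` model, the `ExportFormat` enum, and the `export(_:as:)` function are hypothetical and are not the TranscriptExportService API. PDF is left out because it depends on platform rendering frameworks.

```swift
import Foundation

/// Hypothetical transcript model; the app's real types are not shown here.
struct Transcript: Codable {
    let title: String
    let segments: [String]
}

/// Raw values are the file extensions from the table above.
enum ExportFormat: String {
    case plainText = "txt"
    case markdown = "md"
    case json = "json"
}

/// Renders a transcript as a string in the requested format.
func export(_ transcript: Transcript, as format: ExportFormat) throws -> String {
    switch format {
    case .plainText:
        // Title, blank line, then one segment per line.
        return ([transcript.title, ""] + transcript.segments).joined(separator: "\n")
    case .markdown:
        // Title as a heading, segments as separate paragraphs.
        return "# \(transcript.title)\n\n" + transcript.segments.joined(separator: "\n\n")
    case .json:
        let encoder = JSONEncoder()
        encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
        let data = try encoder.encode(transcript)
        return String(decoding: data, as: UTF8.self)
    }
}
```

Keeping the model `Codable` means the JSON export can be decoded back into the same structure, which suits the backup use case listed in the table.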

Sharing Options

Meeting Minutes Export

Meeting minutes can be exported with full formatting:

Corporate PDF Export

Professional PDF design for business use:


5. Search & Analysis

Search Functionality

Statistics Dashboard

Document Behaviour

Growing Document: The transcript is a continuous document that grows over time. Text never disappears; it only accumulates. When recognition restarts (due to Apple's 1-minute limit), all previous text is preserved and new text is appended.
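
One way to get this behaviour is to keep finished sessions separate from the current session's latest hypothesis. The sketch below assumes that approach; the `GrowingTranscript` type and its method names are hypothetical, and the app's actual storage may differ.

```swift
/// Accumulates text across recognition sessions so nothing is lost on restart.
struct GrowingTranscript {
    private(set) var committed = ""   // text from finished sessions
    private(set) var partial = ""     // latest hypothesis from the current session

    /// The full document: committed text followed by the current hypothesis.
    var text: String {
        [committed, partial].filter { !$0.isEmpty }.joined(separator: " ")
    }

    /// Called with each updated hypothesis; replaces only the current session's text.
    mutating func update(partial newPartial: String) {
        partial = newPartial
    }

    /// Called when recognition restarts: freeze the current session's text.
    mutating func commitSession() {
        committed = text
        partial = ""
    }
}
```

Since a restarted recogniser begins with an empty hypothesis, committing before the restart is what keeps the earlier text from being overwritten.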

6. Technical Implementation

Core Technologies

Swift 6 Concurrency

Recognition Handling
