🎙️ Speech Transcription
Real-time speech-to-text transcription with advanced audio processing, meeting minutes generation, and comprehensive export options.
Overview
The Speech Transcription feature provides two powerful tools for converting speech to text:
- Speech Transcription: general-purpose real-time transcription with sensitive data detection. Ideal for notes, dictation, and quick recordings.
- Meeting Transcription: specialised for meetings, with automatic minutes generation, action item extraction, and participant tracking.
💡 Privacy Reminder: This feature demonstrates what's possible when an app has microphone access. It's a practical tool, but also a reminder to be mindful of which apps you grant this permission to. All processing happens on-device.
1. Speech Transcription
Services: SpeechTranscriptionService, SensitiveInfoDetector, TranscriptStorageService
Features
- Real-time speech-to-text using Apple's Speech framework
- On-device recognition for privacy (when available)
- Support for 50+ languages with automatic detection
- Continuous transcription that grows as you speak
- Auto-save every 30 seconds to prevent data loss
- Pause and resume without losing progress
Sensitive Data Detection
Automatically detects and flags potentially sensitive information:
- Credit Card Numbers: 16-digit card patterns
- National ID Numbers: Norwegian fødselsnummer (11 digits)
- Phone Numbers: sequences of 8 or more digits
- Email Addresses: standard email patterns
- Passwords: text following "password:", "PIN:", etc.
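The detection rules above can be sketched with NSRegularExpression. The enum SensitiveKind, the function detectSensitive(in:) and every regex below are illustrative assumptions, not the patterns SensitiveInfoDetector actually uses. Note that the categories overlap: an 11-digit national ID also satisfies the 8+ digit phone rule.

```swift
import Foundation

// Hypothetical categories mirroring the list above; the regexes are illustrative guesses.
enum SensitiveKind: CaseIterable {
    case creditCard, nationalID, phone, email, password

    var pattern: String {
        switch self {
        case .creditCard: return #"\b(?:\d[ -]?){15}\d\b"#   // 16 digits, optional space/dash separators
        case .nationalID: return #"\b\d{11}\b"#               // fødselsnummer: exactly 11 digits
        case .phone:      return #"\b\d{8,}\b"#               // 8 or more consecutive digits
        case .email:      return #"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}"#
        case .password:   return #"\b(?:password|PIN)\s*[:=]\s*\S+"#
        }
    }
}

/// Returns every category whose pattern matches somewhere in `text`.
func detectSensitive(in text: String) -> Set<SensitiveKind> {
    let fullRange = NSRange(text.startIndex..., in: text)
    var found = Set<SensitiveKind>()
    for kind in SensitiveKind.allCases {
        // Case-insensitive, so "Password:" and "pin:" match and [A-Z] also covers lowercase.
        guard let regex = try? NSRegularExpression(pattern: kind.pattern,
                                                   options: [.caseInsensitive]) else { continue }
        if regex.firstMatch(in: text, options: [], range: fullRange) != nil {
            found.insert(kind)
        }
    }
    return found
}
```

In practice each recognised segment would be run through a detector like this and flagged if the returned set is non-empty.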
Live Statistics
- Word count and character count
- Words per minute (speaking rate)
- Session duration
- Average confidence score
- Sensitive segment count
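A minimal sketch of how these statistics could be derived from recognised segments, assuming each segment carries its text, a 0-1 confidence value and a sensitivity flag. Segment, LiveStats and computeStats(_:duration:) are hypothetical names, not the service's API.

```swift
import Foundation

// Hypothetical per-segment record; the real service's types may differ.
struct Segment {
    let text: String
    let confidence: Float   // 0.0 ... 1.0
    let isSensitive: Bool
}

struct LiveStats {
    let words: Int
    let characters: Int
    let wordsPerMinute: Double
    let averageConfidence: Double   // percent, 0-100
    let sensitiveCount: Int
}

func computeStats(_ segments: [Segment], duration: TimeInterval) -> LiveStats {
    let fullText = segments.map(\.text).joined(separator: " ")
    // Words are whitespace-separated runs; empty fragments are dropped by split.
    let words = fullText.split(whereSeparator: { $0.isWhitespace }).count
    let minutes = duration / 60
    let wpm = minutes > 0 ? Double(words) / minutes : 0
    let average = segments.isEmpty ? 0 :
        Double(segments.map(\.confidence).reduce(0, +)) / Double(segments.count) * 100
    return LiveStats(words: words,
                     characters: fullText.count,
                     wordsPerMinute: wpm,
                     averageConfidence: average,
                     sensitiveCount: segments.filter(\.isSensitive).count)
}
```

For example, 5 words spoken over 30 seconds gives 5 / 0.5 = 10 words per minute.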
Audio Controls
- Microphone Gain Slider: manual adjustment of input sensitivity (0.5x to 2.0x)
- AGC (Automatic Gain Control): automatically adjusts gain for optimal levels
- Delta SNR Indicator: shows the real-time change in signal-to-noise ratio
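One way the gain slider and AGC could interact, sketched under assumptions: SimpleAGC, its target level and its smoothing factor are hypothetical, and the gain is clamped to the slider's 0.5x to 2.0x range. Each buffer's RMS is measured and the gain moves a fraction of the way toward the value that would hit the target.

```swift
import Foundation

/// Minimal automatic gain control, clamped to the slider's 0.5x to 2.0x range.
struct SimpleAGC {
    var gain: Float = 1.0
    var targetRMS: Float = 0.1   // assumption: desired input level
    var smoothing: Float = 0.05  // assumption: fraction of the correction applied per buffer
    let range: ClosedRange<Float> = 0.5...2.0

    /// Measures the buffer's RMS, moves the gain toward targetRMS / rms, then applies it.
    mutating func process(_ buffer: inout [Float]) {
        guard !buffer.isEmpty else { return }
        let meanSquare = buffer.reduce(Float(0)) { $0 + $1 * $1 } / Float(buffer.count)
        let rms = meanSquare.squareRoot()
        if rms > 1e-6 {   // leave the gain alone during silence
            let desired = targetRMS / rms
            gain += smoothing * (desired - gain)
            gain = min(max(gain, range.lowerBound), range.upperBound)
        }
        for i in buffer.indices { buffer[i] *= gain }
    }
}
```

Smoothing the correction over several buffers avoids audible "pumping" when the speaker pauses or raises their voice briefly.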
2. Meeting Transcription
Services: MeetingMinutesService, TranscriptExportService
Meeting Setup
- Meeting title and location
- Participant list management
- Recording mode selection
- Document sensitivity level (Personal, Public, Internal, Confidential)
- Keep screen on during recording
Recording Modes
| Mode | Best For | Description |
|---|---|---|
| Standard | General use | Default transcription settings |
| Meeting | Group discussions | Optimised for multiple speakers |
| Interview | Two-person conversations | Balanced for dialogue |
| Lecture | Presentations | Single speaker, long duration |
Automatic Meeting Minutes
The app analyses your transcript and generates structured meeting minutes:
- Summary: automatic overview of the meeting content
- Key Points: important topics identified by keywords
- Decisions: statements containing "decided", "agreed", or "approved"
- Action Items: tasks with assignee and priority detection
Action Item Detection
Automatically extracts action items based on keywords like:
- "action item", "todo", "task", "will do", "need to"
- "should", "must", "have to", "assigned to"
- "follow up", "deadline", "responsible for"
Priority is determined by urgency keywords (urgent, critical, ASAP = High; when possible, eventually = Low).
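The keyword rules for decisions, action items and priority can be sketched as plain substring matching over sentences. The keyword lists come from this page; the sentence splitter, the Medium default priority and the "assigned to <name>" assignee heuristic are assumptions, and all type and function names are hypothetical.

```swift
import Foundation

enum Priority: String { case low = "Low", medium = "Medium", high = "High" }

struct ActionItem: Equatable {
    let text: String
    let assignee: String?
    let priority: Priority
}

// Keyword lists taken from the documentation above.
let decisionKeywords = ["decided", "agreed", "approved"]
let actionKeywords = ["action item", "todo", "task", "will do", "need to",
                      "should", "must", "have to", "assigned to",
                      "follow up", "deadline", "responsible for"]
let highPriorityKeywords = ["urgent", "critical", "asap"]
let lowPriorityKeywords = ["when possible", "eventually"]

// Naive sentence splitter on ., ! and ?
func sentences(in text: String) -> [String] {
    text.split(whereSeparator: { ".!?".contains($0) })
        .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
        .filter { !$0.isEmpty }
}

func extractDecisions(from transcript: String) -> [String] {
    sentences(in: transcript).filter { sentence in
        let lower = sentence.lowercased()
        return decisionKeywords.contains(where: { lower.contains($0) })
    }
}

func priority(of sentence: String) -> Priority {
    let lower = sentence.lowercased()
    if highPriorityKeywords.contains(where: { lower.contains($0) }) { return .high }
    if lowPriorityKeywords.contains(where: { lower.contains($0) }) { return .low }
    return .medium   // assumption: no urgency keyword means Medium
}

// Hypothetical assignee heuristic: the word right after "assigned to".
func assignee(in sentence: String) -> String? {
    let words = sentence.split(separator: " ").map(String.init)
    for i in words.indices.dropFirst() where i + 1 < words.count {
        if words[i - 1].lowercased() == "assigned" && words[i].lowercased() == "to" {
            return words[i + 1].trimmingCharacters(in: .punctuationCharacters)
        }
    }
    return nil
}

func extractActionItems(from transcript: String) -> [ActionItem] {
    sentences(in: transcript).compactMap { (sentence: String) -> ActionItem? in
        let lower = sentence.lowercased()
        guard actionKeywords.contains(where: { lower.contains($0) }) else { return nil }
        return ActionItem(text: sentence, assignee: assignee(in: sentence),
                          priority: priority(of: sentence))
    }
}
```

Plain substring matching is deliberately crude: "task" also matches "multitask" and "should" flags many non-actions, so a production detector would likely add word-boundary checks.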
3. Advanced Audio Processing
Services: AdaptiveLMSFilter, AudioEngineHandler
Two audio enhancement modes, usable individually or together, improve transcription quality:
Denoise (LMS Filter)
Adaptive least mean squares (LMS) filter that learns and removes background noise in real time. Processing is vDSP-accelerated for minimal CPU impact.
- 64-tap FIR filter (~2.9ms at 22kHz)
- Conservative learning rate (β = 0.003) to preserve speech
- Leakage factor and coefficient clamping for stability
- Adapts to your environment using white noise reference
- Removes constant noise (fans, AC, traffic)
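The LMS parameters above (64 taps, β = 0.003, leakage, coefficient clamping) can be illustrated with a scalar sketch. It uses plain loops instead of vDSP so it runs anywhere. The leakage value and clamp limit are assumptions, and because this page does not spell out how the white-noise reference is derived, the sketch simply takes a reference sample as input in the classic two-input noise-cancelling arrangement. LeakyLMSFilter is a hypothetical name, not the app's AdaptiveLMSFilter.

```swift
import Foundation

/// Leaky LMS adaptive noise canceller (scalar loops, no vDSP).
/// `primary` = microphone (speech + noise); `reference` = a signal correlated with the noise.
struct LeakyLMSFilter {
    private(set) var weights: [Float]
    private var history: [Float]   // reference delay line, newest sample first
    let beta: Float                // learning rate (the document's β = 0.003)
    let leakage: Float             // assumption: small pull of every weight toward zero
    let clampLimit: Float          // assumption: bound on |weight| for stability

    init(taps: Int = 64, beta: Float = 0.003, leakage: Float = 1e-4, clampLimit: Float = 1.0) {
        weights = [Float](repeating: 0, count: taps)
        history = [Float](repeating: 0, count: taps)
        self.beta = beta
        self.leakage = leakage
        self.clampLimit = clampLimit
    }

    /// Filters one sample and returns the error e = primary - estimate, i.e. the cleaned output.
    mutating func process(primary: Float, reference: Float) -> Float {
        // Shift the delay line (O(taps); a ring buffer or vDSP would avoid the copy).
        history.removeLast()
        history.insert(reference, at: 0)

        // FIR estimate of the noise component: y = sum(w[i] * x[i])
        var estimate: Float = 0
        for i in weights.indices { estimate += weights[i] * history[i] }
        let error = primary - estimate

        // Leaky LMS update: w = (1 - beta * leakage) * w + beta * e * x, then clamp.
        let decay = 1 - beta * leakage
        for i in weights.indices {
            let updated = decay * weights[i] + beta * error * history[i]
            weights[i] = min(max(updated, -clampLimit), clampLimit)
        }
        return error
    }
}
```

The output is the error signal: whatever the filter cannot predict from the reference (ideally the speech) passes through, while the predictable noise component is subtracted. A small β adapts slowly but is less likely to learn, and cancel, the speech itself.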
Clarity (Voice Enhancement)
6-band parametric EQ optimised for both male and female voice frequencies.
- High-pass at 80Hz (removes sub-bass rumble)
- Boost at 220Hz (voice fundamentals, male + female)
- Boost at 700Hz (first formant, male voices)
- Boost at 1kHz (first formant, female voices)
- Boost at 2.8kHz (articulation and consonants)
- Low-pass at 8kHz (removes high-frequency hiss)
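The six bands above map naturally onto an AVAudioUnitEQ with six filter bands. This page lists only the frequencies, so the boost gains and bandwidths (in octaves) below are assumptions, and clarityBands and makeClarityEQ() are hypothetical names. The AVFoundation part is wrapped in #if canImport so the band table also compiles on platforms without AVFoundation.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation
#endif

enum BandKind { case highPass, peak, lowPass }

struct BandSetting {
    let kind: BandKind
    let frequency: Float   // Hz
    let gain: Float        // dB, used by peak bands only (assumed values)
    let bandwidth: Float   // octaves, used by peak bands only (assumed values)
}

// Frequencies from the documentation; gains and bandwidths are illustrative.
let clarityBands: [BandSetting] = [
    BandSetting(kind: .highPass, frequency: 80,   gain: 0, bandwidth: 0),
    BandSetting(kind: .peak,     frequency: 220,  gain: 3, bandwidth: 1.0),
    BandSetting(kind: .peak,     frequency: 700,  gain: 2, bandwidth: 1.0),
    BandSetting(kind: .peak,     frequency: 1000, gain: 2, bandwidth: 1.0),
    BandSetting(kind: .peak,     frequency: 2800, gain: 4, bandwidth: 1.5),
    BandSetting(kind: .lowPass,  frequency: 8000, gain: 0, bandwidth: 0),
]

#if canImport(AVFoundation)
func makeClarityEQ() -> AVAudioUnitEQ {
    let eq = AVAudioUnitEQ(numberOfBands: clarityBands.count)
    for (band, setting) in zip(eq.bands, clarityBands) {
        switch setting.kind {
        case .highPass: band.filterType = .highPass
        case .lowPass:  band.filterType = .lowPass
        case .peak:
            band.filterType = .parametric
            band.gain = setting.gain
            band.bandwidth = setting.bandwidth
        }
        band.frequency = setting.frequency
        band.bypass = false   // bands must be un-bypassed to take effect
    }
    return eq
}
#endif
```

The resulting unit would be attached to the AVAudioEngine and connected between the input node and the rest of the processing chain.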
Combined Mode: Denoise and Clarity can be used simultaneously for maximum audio quality. When both are active, the signal flows through the EQ first (voice shaping), then through the LMS filter (noise removal). This combined processing chain delivers the best results in noisy environments.
Signal-to-Noise Ratio (SNR) Indicator
Live SNR display shows audio quality during recording:
| SNR Level | Quality | Colour |
|---|---|---|
| < 10 dB | Poor (noisy environment) | Red |
| 10-20 dB | Acceptable | Orange |
| 20-30 dB | Good | Yellow |
| > 30 dB | Excellent | Green |
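The table maps directly onto a small classifier. SNR in decibels from RMS amplitudes is 20·log10(signal / noise). How the app estimates the noise floor isn't described here, so in this sketch the caller supplies it; SNRQuality, snrDB(signalRMS:noiseRMS:) and SNRMeter are hypothetical names, and exact boundary values are assigned to the higher band by assumption.

```swift
import Foundation

enum SNRQuality: String {
    case poor = "Poor", acceptable = "Acceptable", good = "Good", excellent = "Excellent"

    /// Indicator colour from the table above.
    var colour: String {
        switch self {
        case .poor: return "Red"
        case .acceptable: return "Orange"
        case .good: return "Yellow"
        case .excellent: return "Green"
        }
    }

    /// Exact boundary values (10, 20, 30 dB) fall into the higher band.
    init(snrDB: Double) {
        switch snrDB {
        case ..<10: self = .poor
        case ..<20: self = .acceptable
        case ..<30: self = .good
        default:    self = .excellent
        }
    }
}

/// SNR from RMS amplitudes, with a floor to avoid log10(0).
func snrDB(signalRMS: Double, noiseRMS: Double) -> Double {
    20 * log10(max(signalRMS, 1e-12) / max(noiseRMS, 1e-12))
}

/// Change between successive readings, as a Delta SNR indicator would display.
struct SNRMeter {
    private(set) var last: Double?
    mutating func update(_ db: Double) -> Double {
        defer { last = db }
        return last.map { db - $0 } ?? 0
    }
}
```

For example, a signal ten times louder than the noise (in RMS terms) gives 20 dB, the bottom of the "Good" band.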
4. Export & Sharing
Service: TranscriptExportService
Export Formats
| Format | Extension | Best For |
|---|---|---|
| Plain Text | .txt | Simple sharing, compatibility |
| Markdown | .md | Documentation, formatted notes |
| PDF | .pdf | Professional documents, email attachments |
| JSON | .json | Data processing, backups |
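The JSON and plain-text formats fall out of a Codable model almost for free. The Transcript and TranscriptSegment types and their field names are illustrative assumptions, not the app's actual export schema; exportJSON and exportPlainText are hypothetical helpers.

```swift
import Foundation

// Hypothetical transcript model; field names are illustrative.
struct TranscriptSegment: Codable, Equatable {
    let text: String
    let timestamp: TimeInterval   // seconds from the start of the recording
    let confidence: Double        // 0.0 ... 1.0
}

struct Transcript: Codable, Equatable {
    let title: String
    let language: String
    let segments: [TranscriptSegment]
}

/// Pretty-printed JSON with stable key order, convenient for backups and diffs.
func exportJSON(_ transcript: Transcript) throws -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    return try encoder.encode(transcript)
}

/// Title, a blank line, then one segment per line.
func exportPlainText(_ transcript: Transcript) -> String {
    ([transcript.title, ""] + transcript.segments.map(\.text)).joined(separator: "\n")
}
```

Because the JSON round-trips through JSONDecoder, the same file can serve as a backup that is re-imported later.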
Sharing Options
- Share via iOS share sheet (AirDrop, Messages, Mail, etc.)
- Copy to clipboard with one tap
- Export meeting minutes separately from transcript
- PDF generation with professional formatting
Meeting Minutes Export
Meeting minutes can be exported with full formatting:
- Header with meeting details (title, date, duration, participants)
- Summary section
- Key points as bullet list
- Decisions with checkmarks
- Action items table with priority, assignee, and status
- Full transcript appendix
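A Markdown rendering of the minutes layout listed above might look like the following sketch. The MeetingMinutes and ActionItem types, the "✓" checkmark and the table columns are assumptions based on this list; the date is passed in preformatted to keep the example locale-independent.

```swift
import Foundation

enum Priority: String { case low = "Low", medium = "Medium", high = "High" }

struct ActionItem {
    let task: String
    let assignee: String?
    let priority: Priority
    let done: Bool
}

// Hypothetical minutes model mirroring the sections listed above.
struct MeetingMinutes {
    let title: String
    let date: String          // preformatted, e.g. "2024-05-02"
    let durationMinutes: Int
    let participants: [String]
    let summary: String
    let keyPoints: [String]
    let decisions: [String]
    let actionItems: [ActionItem]
    let transcript: String
}

func renderMarkdown(_ m: MeetingMinutes) -> String {
    var lines = ["# \(m.title)", "",
                 "**Date:** \(m.date)  ",
                 "**Duration:** \(m.durationMinutes) min  ",
                 "**Participants:** \(m.participants.joined(separator: ", "))"]
    lines += ["", "## Summary", "", m.summary, "", "## Key Points", ""]
    lines += m.keyPoints.map { "- \($0)" }
    lines += ["", "## Decisions", ""]
    lines += m.decisions.map { "- ✓ \($0)" }
    lines += ["", "## Action Items", "",
              "| Task | Assignee | Priority | Status |", "|---|---|---|---|"]
    lines += m.actionItems.map {
        "| \($0.task) | \($0.assignee ?? "Unassigned") | \($0.priority.rawValue) | \($0.done ? "Done" : "Open") |"
    }
    lines += ["", "## Transcript", "", m.transcript]   // full transcript appendix
    return lines.joined(separator: "\n")
}
```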
Corporate PDF Export
Professional PDF design for business use:
- Dark blue header stripe with meeting title
- Teal accent colours for section headers
- Sensitivity watermark with logo (when set)
- Clean white background
- Proper typography and spacing
5. Search & Analysis
Search Functionality
- Full-text search across the entire transcript
- Case-insensitive matching
- Highlighted search results
- Match count display
- Search works on all segments (growing document)
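Case-insensitive matching with highlight ranges and a match count can be sketched with Foundation's range(of:options:range:). The function name searchRanges(of:in:) is hypothetical; it returns non-overlapping matches, whose count drives the match display and whose ranges can be used for highlighting.

```swift
import Foundation

/// Returns the range of every case-insensitive, non-overlapping occurrence of `query`.
func searchRanges(of query: String, in text: String) -> [Range<String.Index>] {
    guard !query.isEmpty else { return [] }
    var results: [Range<String.Index>] = []
    var searchStart = text.startIndex
    while searchStart < text.endIndex,
          let match = text.range(of: query, options: [.caseInsensitive],
                                 range: searchStart..<text.endIndex) {
        results.append(match)
        searchStart = match.upperBound   // continue after this match
    }
    return results
}
```

Running it over the joined text of all segments covers the whole growing document in one pass.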
Statistics Dashboard
- Duration: total recording time
- Word Count: total words transcribed
- Character Count: total characters
- Words per Minute: speaking rate
- Average Confidence: mean confidence reported by the recogniser (0-100%)
- Sensitive Segments: count of flagged content
- Language: selected transcription language
Document Behaviour
Growing Document: The transcript is a continuous document that grows over time. Text never disappears; it only accumulates. When recognition restarts (due to Apple's limit of roughly one minute per recognition task), all previous text is preserved and new text is appended.
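The append-only behaviour can be modelled as committed text plus a pending buffer for the live partial result. GrowingTranscript and its methods are hypothetical names. The sketch assumes, as with SFSpeechRecognizer's partial results, that each partial result contains the whole utterance recognised so far in the current task, so it replaces the pending text rather than appending to it.

```swift
import Foundation

/// Append-only transcript: committed text never shrinks; the live partial result sits in `pending`.
struct GrowingTranscript {
    private(set) var committed: [String] = []
    private(set) var pending: String = ""

    /// Each partial result covers the whole current task, so it replaces the pending text.
    mutating func updatePartial(_ text: String) {
        pending = text
    }

    /// Called when a result is final or just before the recognition task restarts.
    mutating func commitPending() {
        let trimmed = pending.trimmingCharacters(in: .whitespacesAndNewlines)
        if !trimmed.isEmpty { committed.append(trimmed) }
        pending = ""
    }

    /// What the UI shows: everything committed plus the live partial.
    var displayText: String {
        (committed + (pending.isEmpty ? [] : [pending])).joined(separator: " ")
    }
}
```

Committing before each restart is what keeps earlier text from vanishing when a fresh recognition task starts from an empty transcription.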
6. Technical Implementation
Core Technologies
- AVAudioEngine: real-time audio capture and processing
- SFSpeechRecognizer: Apple's speech recognition, run on-device when supported
- AVAudioUnitEQ: 6-band parametric equaliser
- Custom 64-tap LMS adaptive filter with vDSP acceleration (Accelerate framework)
Swift 6 Concurrency
- @MainActor for UI-related service classes
- @unchecked Sendable with NSLock for thread-safe audio processing
- nonisolated methods for cross-actor operations
- Async/await for permission requests and recognition tasks
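The "@unchecked Sendable with NSLock" pattern looks roughly like this. AudioMeterState and its members are hypothetical. The @unchecked annotation tells the compiler that thread safety is guaranteed manually, here by routing every access to the stored properties through a single lock.

```swift
import Foundation
import Dispatch

/// Shared audio state written from the audio thread and read from the UI.
final class AudioMeterState: @unchecked Sendable {
    private let lock = NSLock()
    private var _gain: Float = 1.0
    private var _lastRMS: Float = 0
    private var _buffersProcessed = 0

    var gain: Float {
        get { lock.lock(); defer { lock.unlock() }; return _gain }
        set { lock.lock(); defer { lock.unlock() }; _gain = newValue }
    }

    /// Called once per audio buffer, typically from a tap callback.
    func record(rms: Float) {
        lock.lock()
        defer { lock.unlock() }
        _lastRMS = rms
        _buffersProcessed += 1
    }

    /// Consistent snapshot of both values under one lock acquisition.
    func snapshot() -> (lastRMS: Float, buffers: Int) {
        lock.lock()
        defer { lock.unlock() }
        return (_lastRMS, _buffersProcessed)
    }
}
```

Keeping the critical sections tiny matters on audio threads, where blocking for long risks dropped buffers.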
Recognition Handling
- Automatic restart when Apple's recognition times out (~1 minute)
- Pending text buffer to prevent loss during restarts
- Partial results shown in real-time, converted to final on completion
- Duplicate detection to prevent repeated segments
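Duplicate detection can be sketched by normalising text (lowercase, punctuation stripped, whitespace collapsed) and comparing a candidate against the last few committed segments. The function names and the three-segment window are assumptions, not the service's actual rule.

```swift
import Foundation

/// Lowercases, drops punctuation and collapses whitespace so trivial differences don't matter.
func normalized(_ s: String) -> String {
    s.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty }
        .joined(separator: " ")
}

/// True if `candidate` repeats one of the last `window` committed segments (or is empty).
func isDuplicate(_ candidate: String, of committed: [String], window: Int = 3) -> Bool {
    let key = normalized(candidate)
    guard !key.isEmpty else { return true }   // nothing new to add
    return committed.suffix(window).contains { normalized($0) == key }
}
```

A check like this would run just before committing pending text, which is when a restarted recognition task is most likely to re-emit the last utterance.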