Speech Transcription
Real-time speech-to-text transcription with advanced audio processing, meeting minutes generation, and comprehensive export options.
Overview
The Speech Transcription feature provides two powerful tools for converting speech to text:
Speech Transcription
General-purpose real-time transcription with sensitive data detection. Perfect for notes, dictation, and quick recordings.
Meeting Transcription
Specialised for meetings with automatic minutes generation, action items extraction, and participant tracking.
Privacy Reminder: This feature demonstrates what's possible when an app has microphone access. It's a practical tool, but also a reminder to be mindful of which apps you grant this permission to. All processing happens on-device.
1. Speech Transcription
Services: SpeechTranscriptionService, SensitiveInfoDetector, TranscriptStorageService
Features
- Real-time speech-to-text using Apple's Speech framework
- On-device recognition for privacy (when available)
- Support for 50+ languages with automatic detection
- Continuous transcription that grows as you speak
- Auto-save every 30 seconds to prevent data loss
- Pause and resume without losing progress
Sensitive Data Detection
Automatically detects and flags potentially sensitive information:
- Credit Card Numbers: 16-digit card patterns
- National ID Numbers: Norwegian fødselsnummer (11 digits)
- Phone Numbers: 8+ digit sequences
- Email Addresses: Standard email patterns
- Passwords: Text following "password:", "PIN:", etc.
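The categories above could be matched with regular expressions roughly as follows. This is a minimal sketch, not the app's SensitiveInfoDetector: the `SensitiveKind` enum, the `detectSensitive(in:)` function, and every pattern are assumptions chosen to mirror the list.

```swift
import Foundation

// Hypothetical categories mirroring the list above.
enum SensitiveKind: CaseIterable {
    case creditCard, nationalID, phone, email, password

    // Illustrative patterns only; the app's real rules are not documented here.
    var pattern: String {
        switch self {
        case .creditCard: return #"\b(?:\d[ \-]?){15}\d\b"#   // 16 digits, optional separators
        case .nationalID: return #"\b\d{11}\b"#                // 11-digit fødselsnummer shape
        case .phone:      return #"\b\d{8,}\b"#                // 8 or more consecutive digits
        case .email:      return #"[A-Z0-9._%+\-]+@[A-Z0-9.\-]+\.[A-Z]{2,}"#
        case .password:   return #"\b(?:password|pin)\s*:\s*\S+"#
        }
    }
}

// Returns every category that matches at least once in the text.
func detectSensitive(in text: String) -> Set<SensitiveKind> {
    let range = NSRange(text.startIndex..., in: text)
    var found = Set<SensitiveKind>()
    for kind in SensitiveKind.allCases {
        guard let regex = try? NSRegularExpression(pattern: kind.pattern,
                                                   options: [.caseInsensitive]) else { continue }
        if regex.firstMatch(in: text, options: [], range: range) != nil {
            found.insert(kind)
        }
    }
    return found
}
```

Note that the patterns overlap: an unbroken 11-digit number matches both the national ID and the phone pattern, so a real detector would need a precedence rule or checksum validation.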
Live Statistics
- Word count and character count
- Words per minute (speaking rate)
- Session duration
- Average confidence score
- Sensitive segment count
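As an illustration of how the first three statistics relate, the sketch below derives word count, character count, and words per minute from the transcript text and the session duration. The `TranscriptStats` type and `computeStats` function are hypothetical names, and splitting on whitespace is an assumed definition of a word.

```swift
import Foundation

// Hypothetical container for a subset of the live statistics.
struct TranscriptStats {
    let wordCount: Int
    let characterCount: Int
    let wordsPerMinute: Double
}

// Words are assumed to be whitespace-separated runs of characters.
func computeStats(text: String, duration: TimeInterval) -> TranscriptStats {
    let words = text.split(whereSeparator: { $0.isWhitespace })
    let minutes = duration / 60
    // Avoid dividing by zero at the very start of a session.
    let wpm = minutes > 0 ? Double(words.count) / minutes : 0
    return TranscriptStats(wordCount: words.count,
                           characterCount: text.count,
                           wordsPerMinute: wpm)
}
```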
Audio Controls
- Microphone Gain Slider: Manual adjustment of input sensitivity (0.5x to 2.0x)
- AGC (Automatic Gain Control): Automatically adjusts gain for optimal levels
- Delta SNR Indicator: Shows real-time change in signal-to-noise ratio
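The gain slider and AGC can be pictured as follows. This is a sketch under stated assumptions: only the 0.5x to 2.0x range comes from the list above, while the RMS target, the smoothing factor, and the function names are hypothetical and not taken from the app.

```swift
import Foundation

// The slider range documented above.
let gainRange: ClosedRange<Float> = 0.5...2.0

// Manual gain: clamp the multiplier to the slider range, then scale and hard-limit samples.
func applyGain(_ samples: [Float], gain: Float) -> [Float] {
    let g = min(max(gain, gainRange.lowerBound), gainRange.upperBound)
    return samples.map { max(-1, min(1, $0 * g)) }
}

// Root-mean-square level of a buffer.
func rms(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumSquares / Float(samples.count)).squareRoot()
}

// One AGC step: move the current gain a fraction of the way toward the gain
// that would bring this buffer to targetRMS (both defaults are assumptions).
func agcStep(currentGain: Float, buffer: [Float],
             targetRMS: Float = 0.1, smoothing: Float = 0.1) -> Float {
    let level = rms(buffer)
    guard level > 0 else { return currentGain }
    let desired = targetRMS / level
    let next = currentGain + smoothing * (desired - currentGain)
    return min(max(next, gainRange.lowerBound), gainRange.upperBound)
}
```

Smoothing the gain change over several buffers avoids audible pumping; a quiet buffer nudges the gain up and a loud one nudges it down.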
2. Meeting Transcription
Services: MeetingMinutesService, TranscriptExportService
Meeting Setup
- Meeting title and location
- Participant list management
- Recording mode selection
- Document sensitivity level (Personal, Public, Internal, Confidential)
- Keep screen on during recording
Recording Modes
| Mode | Best For | Description |
| --- | --- | --- |
| Standard | General use | Default transcription settings |
| Meeting | Group discussions | Optimised for multiple speakers |
| Interview | Two-person conversations | Balanced for dialogue |
| Lecture | Presentations | Single speaker, long duration |
Automatic Meeting Minutes
The app analyses your transcript and generates structured meeting minutes:
- Summary: Automatic overview of the meeting content
- Key Points: Important topics identified by keywords
- Decisions: Statements containing "decided", "agreed", "approved"
- Action Items: Tasks with assignee and priority detection
Action Item Detection
Automatically extracts action items based on keywords like:
- "action item", "todo", "task", "will do", "need to"
- "should", "must", "have to", "assigned to"
- "follow up", "deadline", "responsible for"
Priority is determined by urgency keywords (urgent, critical, ASAP = High; when possible, eventually = Low).
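A keyword-driven extractor along these lines might look like the sketch below. The keyword lists are copied from this section; the `ActionItem` type, the sentence splitting on `.`, `!`, and `?`, and the `.medium` fallback for sentences without an urgency keyword are assumptions.

```swift
import Foundation

enum Priority { case high, medium, low }

// Hypothetical result type; the app also tracks assignee and status.
struct ActionItem {
    let text: String
    let priority: Priority
}

let actionKeywords = ["action item", "todo", "task", "will do", "need to",
                      "should", "must", "have to", "assigned to",
                      "follow up", "deadline", "responsible for"]
let highKeywords = ["urgent", "critical", "asap"]
let lowKeywords = ["when possible", "eventually"]

func priority(of sentence: String) -> Priority {
    let lowered = sentence.lowercased()
    if highKeywords.contains(where: { lowered.contains($0) }) { return .high }
    if lowKeywords.contains(where: { lowered.contains($0) }) { return .low }
    return .medium // Assumption: the source only defines High and Low triggers.
}

func extractActionItems(from transcript: String) -> [ActionItem] {
    var items: [ActionItem] = []
    // Naive sentence split on terminal punctuation.
    for piece in transcript.split(whereSeparator: { ".!?".contains($0) }) {
        let sentence = piece.trimmingCharacters(in: .whitespacesAndNewlines)
        let lowered = sentence.lowercased()
        if actionKeywords.contains(where: { lowered.contains($0) }) {
            items.append(ActionItem(text: sentence, priority: priority(of: sentence)))
        }
    }
    return items
}
```

Plain substring matching is deliberately simple and will over-match (for example, "task" inside "multitasking"); word-boundary matching would be a natural refinement.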
3. Advanced Audio Processing
Services: AdaptiveLMSFilter, AudioEngineHandler
Two mutually exclusive audio enhancement modes to improve transcription quality:
Denoise (LMS Filter)
Adaptive Least Mean Squares filter that learns and removes background noise in real-time.
- 32-tap FIR filter
- Adapts to your environment
- Removes constant noise (fans, AC, traffic)
- Preserves speech frequencies
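The core LMS update behind a filter like this can be sketched in a few lines. This is not the app's AdaptiveLMSFilter: the step size, the `reference`/`primary` interface, and the plain Swift loops (rather than vDSP) are assumptions made to keep the example self-contained. Only the 32-tap length comes from the list above.

```swift
import Foundation

// Minimal LMS adaptive filter: estimates the noise in `primary` from a
// noise-correlated `reference` signal and returns the residual (error).
struct LMSFilter {
    private(set) var weights: [Float]
    private var history: [Float]      // most recent reference sample first
    let stepSize: Float               // assumed value; must be small enough for stability

    init(taps: Int = 32, stepSize: Float = 0.01) {
        weights = [Float](repeating: 0, count: taps)
        history = [Float](repeating: 0, count: taps)
        self.stepSize = stepSize
    }

    mutating func process(reference: Float, primary: Float) -> Float {
        // Shift the delay line and insert the newest reference sample.
        history.removeLast()
        history.insert(reference, at: 0)

        // FIR output: current estimate of the noise component.
        var estimate: Float = 0
        for i in weights.indices { estimate += weights[i] * history[i] }

        // Error is the cleaned sample; it also drives the weight update.
        let error = primary - estimate
        for i in weights.indices { weights[i] += stepSize * error * history[i] }
        return error
    }
}
```

In a single-microphone setting there is no separate noise reference, so an implementation has to derive one (for example from a delayed copy of the input or from non-speech frames); how the app does this is not stated here.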
Clarity (Voice Enhancement)
5-band EQ filter that emphasises voice frequencies for clearer speech recognition.
- High-pass at 200 Hz (removes rumble)
- Boost at 400 Hz (voice fundamentals)
- Boost at 1.5 kHz (clarity)
- Boost at 3 kHz (articulation)
- Low-pass at 6 kHz (removes hiss)
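The five bands above map naturally onto AVAudioUnitEQ, as in the sketch below. The band frequencies come from the list; the gain values, the bandwidth, and the `VoiceBand`, `clarityBands`, and `makeClarityEQ` names are assumptions, since the app's actual settings are not given.

```swift
#if canImport(AVFoundation)
import AVFoundation
#endif

enum BandKind { case highPass, boost, lowPass }

// Hypothetical description of one EQ band.
struct VoiceBand {
    let kind: BandKind
    let frequency: Float   // Hz, from the list above
    let gain: Float        // dB, assumed values (ignored by pass filters)
}

let clarityBands: [VoiceBand] = [
    VoiceBand(kind: .highPass, frequency: 200,  gain: 0),
    VoiceBand(kind: .boost,    frequency: 400,  gain: 2),
    VoiceBand(kind: .boost,    frequency: 1500, gain: 3),
    VoiceBand(kind: .boost,    frequency: 3000, gain: 3),
    VoiceBand(kind: .lowPass,  frequency: 6000, gain: 0),
]

#if canImport(AVFoundation)
// Builds an EQ node configured from the band table.
func makeClarityEQ() -> AVAudioUnitEQ {
    let eq = AVAudioUnitEQ(numberOfBands: clarityBands.count)
    for (band, spec) in zip(eq.bands, clarityBands) {
        switch spec.kind {
        case .highPass: band.filterType = .highPass
        case .boost:    band.filterType = .parametric
        case .lowPass:  band.filterType = .lowPass
        }
        band.frequency = spec.frequency
        band.gain = spec.gain
        band.bandwidth = 1.0   // octaves, assumed
        band.bypass = false    // bands are bypassed by default
    }
    return eq
}
#endif
```

The resulting node would be attached to the AVAudioEngine and connected between the input node and whatever consumes the processed audio.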
Note: Denoise and Clarity are mutually exclusive: only one can be active at a time. This saves CPU resources and prevents conflicting audio processing.
Signal-to-Noise Ratio (SNR) Indicator
Live SNR display shows audio quality during recording:
| SNR Level | Quality | Colour |
| --- | --- | --- |
| < 10 dB | Poor (noisy environment) | Red |
| 10-20 dB | Acceptable | Orange |
| 20-30 dB | Good | Yellow |
| > 30 dB | Excellent | Green |
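SNR in decibels is conventionally 10 · log10(signal power / noise power). The sketch below computes it and maps the result onto the table's bands. How the app estimates signal and noise power is not stated, so both are plain inputs here, and assigning the boundary values 10, 20, and 30 dB to particular bands is an assumption because the table leaves them ambiguous.

```swift
import Foundation

// Raw values carry the indicator colour from the table.
enum SNRQuality: String {
    case poor = "Red"
    case acceptable = "Orange"
    case good = "Yellow"
    case excellent = "Green"
}

// SNR in dB from mean signal and noise power.
func snrDecibels(signalPower: Double, noisePower: Double) -> Double {
    guard noisePower > 0 else { return .infinity }
    return 10 * log10(signalPower / noisePower)
}

// Boundary handling (10 -> acceptable, 20 -> good, 30 -> good) is assumed.
func quality(forSNR db: Double) -> SNRQuality {
    switch db {
    case ..<10: return .poor
    case ..<20: return .acceptable
    case ...30: return .good
    default:    return .excellent
    }
}
```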
4. Export & Sharing
Service: TranscriptExportService
Export Formats
| Format | Extension | Best For |
| --- | --- | --- |
| Plain Text | .txt | Simple sharing, compatibility |
| Markdown | .md | Documentation, formatted notes |
| PDF | .pdf | Professional documents, email attachments |
| JSON | .json | Data processing, backups |
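For the JSON and plain-text formats, an export along these lines is enough to round-trip a transcript. The `TranscriptSegment` and `TranscriptDocument` types and their fields are hypothetical; the schema used by TranscriptExportService is not documented here.

```swift
import Foundation

// Hypothetical data model for an exported transcript.
struct TranscriptSegment: Codable, Equatable {
    let text: String
    let confidence: Double
    let isSensitive: Bool
}

struct TranscriptDocument: Codable, Equatable {
    let title: String
    let language: String
    let segments: [TranscriptSegment]
}

// JSON export: pretty-printed with sorted keys so output is stable across runs.
func exportJSON(_ document: TranscriptDocument) throws -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    return try encoder.encode(document)
}

// Plain-text export: one segment per line.
func exportPlainText(_ document: TranscriptDocument) -> String {
    document.segments.map { $0.text }.joined(separator: "\n")
}
```

Because the model is Codable, a JSON backup can be restored with `JSONDecoder().decode(TranscriptDocument.self, from: data)`.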
Sharing Options
- Share via iOS share sheet (AirDrop, Messages, Mail, etc.)
- Copy to clipboard with one tap
- Export meeting minutes separately from transcript
- PDF generation with professional formatting
Meeting Minutes Export
Meeting minutes can be exported with full formatting:
- Header with meeting details (title, date, duration, participants)
- Summary section
- Key points as bullet list
- Decisions with checkmarks
- Action items table with priority, assignee, and status
- Full transcript appendix
Corporate PDF Export
Professional PDF design for business use:
- Dark blue header stripe with meeting title
- Teal accent colours for section headers
- Sensitivity watermark with logo (when set)
- Clean white background
- Proper typography and spacing
5. Search & Analysis
Search Functionality
- Full-text search across entire transcript
- Case-insensitive matching
- Highlighted search results
- Match count display
- Search works on all segments (growing document)
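Case-insensitive search with a match count can be sketched with Foundation's `range(of:options:range:)`. The `searchMatches` function is a hypothetical helper; the returned ranges are what a UI would use for highlighting, and their count is the match count.

```swift
import Foundation

// Returns the ranges of all non-overlapping, case-insensitive matches.
func searchMatches(of query: String, in transcript: String) -> [Range<String.Index>] {
    guard !query.isEmpty else { return [] }
    var matches: [Range<String.Index>] = []
    var searchStart = transcript.startIndex
    while searchStart < transcript.endIndex,
          let found = transcript.range(of: query,
                                       options: .caseInsensitive,
                                       range: searchStart..<transcript.endIndex) {
        matches.append(found)
        searchStart = found.upperBound   // continue after this match
    }
    return matches
}
```

Since the transcript keeps growing, the search would need to be re-run (or extended over the newly appended text) whenever new segments arrive.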
Statistics Dashboard
- Duration: Total recording time
- Word Count: Total words transcribed
- Character Count: Total characters
- Words per Minute: Speaking rate
- Average Confidence: Recognition accuracy (0-100%)
- Sensitive Segments: Count of flagged content
- Language: Selected transcription language
Document Behaviour
Growing Document: The transcript is a continuous document that grows over time. Text never disappears; it only accumulates. When recognition restarts (due to Apple's 1-minute limit), all previous text is preserved and new text is appended.
6. Technical Implementation
Core Technologies
- AVAudioEngine: Real-time audio capture and processing
- SFSpeechRecognizer: Apple's on-device speech recognition
- AVAudioUnitEQ: 5-band parametric equaliser
- Custom LMS adaptive filter with vDSP optimisation
Swift 6 Concurrency
- @MainActor for UI-related service classes
- @unchecked Sendable with NSLock for thread-safe audio processing
- nonisolated methods for cross-actor operations
- Async/await for permission requests and recognition tasks
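The first three points can be illustrated together. In this sketch, `GainState` and `TranscriptionViewModel` are hypothetical types, not the app's services: a lock-protected class is marked `@unchecked Sendable` so the audio thread and the UI can share it, and a `nonisolated` method lets non-main-actor code read it without hopping to the main actor.

```swift
import Foundation

// Shared between the audio thread and the UI. @unchecked Sendable tells the
// compiler that thread safety is enforced manually, here with NSLock.
final class GainState: @unchecked Sendable {
    private let lock = NSLock()
    private var gain: Float = 1.0

    func setGain(_ newValue: Float) {
        lock.lock()
        defer { lock.unlock() }
        gain = newValue
    }

    func currentGain() -> Float {
        lock.lock()
        defer { lock.unlock() }
        return gain
    }
}

// UI-facing state lives on the main actor.
@MainActor
final class TranscriptionViewModel {
    private(set) var transcript = ""
    let gainState = GainState()

    func append(_ text: String) { transcript += text }

    // Callable from any thread: it only touches the Sendable, lock-protected state.
    nonisolated func audioGain() -> Float { gainState.currentGain() }
}
```

The lock is held only for a single read or write, which keeps the time spent blocking inside the real-time audio callback short.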
Recognition Handling
- Automatic restart when Apple's recognition times out (~1 minute)
- Pending text buffer to prevent loss during restarts
- Partial results shown in real-time, converted to final on completion
- Duplicate detection to prevent repeated segments
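The restart, pending-buffer, and duplicate-detection behaviour can be modelled as a small accumulator. This is a sketch under assumptions: `TranscriptAccumulator` and its methods are hypothetical, and comparing only against the most recent segment is one simple way to detect duplicates, not necessarily the app's.

```swift
import Foundation

struct TranscriptAccumulator {
    private(set) var segments: [String] = []   // finalised text; only ever grows
    private(set) var pending = ""              // latest partial result

    // Partial results replace the pending text; nothing is committed yet.
    mutating func receivePartial(_ text: String) {
        pending = text
    }

    // A final result supersedes the pending text and is committed.
    mutating func receiveFinal(_ text: String) {
        pending = ""
        commit(text)
    }

    // Called just before recognition restarts, so in-flight text is not lost.
    mutating func prepareForRestart() {
        let carried = pending
        pending = ""
        commit(carried)
    }

    // Appends non-empty text unless it repeats the previous segment.
    private mutating func commit(_ text: String) {
        let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty, trimmed != segments.last else { return }
        segments.append(trimmed)
    }

    // What the UI shows: committed segments plus the live partial.
    var fullText: String {
        (segments + (pending.isEmpty ? [] : [pending])).joined(separator: " ")
    }
}
```

For example, if a restart commits "hello world" from the pending buffer and the recogniser then delivers the same text as a late final result, the second copy is dropped and the document continues with the next partial.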