⚠️ Development Phase: This project is currently in active development and is not yet ready for production use. Features may be incomplete, unstable, or subject to change.
OpenVoice Flow is a cross-platform, open-source, real-time voice transcription app designed to deliver instant, context-aware speech-to-text functionality. Inspired by Whisper Flow, OpenVoice Flow aims to be developer-friendly, privacy-first, and extensible—with support for local and cloud-based providers under a bring-your-own-key model.
- 🔊 Real-time voice transcription with low latency
- 🧠 Multi-provider support (Soniox, Whisper/Ollama, future providers)
- 💻 Cross-platform by design: Windows, macOS, Linux (Tauri powered; currently built and tested only on macOS)
- 🧩 Modular provider architecture, BYO-API-key
- 🔐 Privacy-first: data stays on device, local key storage (unencrypted in MVP)
- 📝 Clipboard or text field insertion
- 🔁 History view with audio and re-transcription
- ⚙️ Personal dictionary, instructions, formatting rules
- 📊 Network diagnostics per provider (latency, stats)
- 🔥 Floating overlay for live feedback
Screenshots:
- Main views: Dashboard, History, Stats
- Settings: General, Audio, Output, Key Bindings
- Tray and overlay: macOS Tray, Overlay Ready, Overlay Listening, Overlay Captured
Built with:
- Tauri for native cross-platform shell (Rust backend, Webview UI)
- React + TypeScript frontend
- Rust modules for audio capture, streaming, and provider interfacing
Component Layers:
- Audio Engine: mic access, noise filtering, VAD, audio chunking
- Provider Abstraction: clean API interface to any STT backend
- Transcription Overlay: animated, floating UI with real-time updates
- Storage Layer: SQLite + file storage for transcript and optional audio
- Settings & Stats: global hotkeys, language, clipboard, API keys, diagnostics
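As a rough illustration of the Provider Abstraction layer, the sketch below shows one way a uniform provider interface could look. The real implementations live in the Rust backend under src-tauri/src/providers/; every name here (`SttProvider`, `TranscriptSegment`, `FakeProvider`, `createProvider`) is hypothetical, and the assumption that Ollama needs no API key is not confirmed by this README.

```typescript
// Hypothetical provider abstraction; names and shapes are illustrative only.
type ProviderId = "soniox" | "openai" | "ollama";

interface TranscriptSegment {
  text: string;
  isFinal: boolean; // false for interim results that may still change
}

interface SttProvider {
  readonly id: ProviderId;
  // Accepts one chunk of mono PCM samples and returns segments produced so far.
  pushAudio(chunk: Float32Array): TranscriptSegment[];
  // Signals the end of a recording and returns any remaining final segments.
  finish(): TranscriptSegment[];
}

// Stand-in provider for pipeline tests: it reports sample counts instead of text.
class FakeProvider implements SttProvider {
  readonly id: ProviderId;
  private samples = 0;

  constructor(id: ProviderId) {
    this.id = id;
  }

  pushAudio(chunk: Float32Array): TranscriptSegment[] {
    this.samples += chunk.length;
    return [{ text: `[${this.samples} samples]`, isFinal: false }];
  }

  finish(): TranscriptSegment[] {
    const total = this.samples;
    this.samples = 0;
    return [{ text: `[done: ${total} samples]`, isFinal: true }];
  }
}

// Bring-your-own-key lookup: refuse to build a provider whose key is missing.
function createProvider(
  id: ProviderId,
  apiKeys: Partial<Record<ProviderId, string>>,
): SttProvider {
  // Assumption: only Soniox and OpenAI need a key; Ollama is treated as keyless.
  const needsKey = id === "soniox" || id === "openai";
  if (needsKey && !apiKeys[id]) {
    throw new Error(`Missing API key for provider "${id}"`);
  }
  return new FakeProvider(id); // A real factory would return the matching backend.
}
```

The point of the shape is that the overlay and storage layers only ever see `TranscriptSegment` values, so swapping one backend for another does not touch the rest of the app.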
- Launch app (tray menu by default)
- Press your configured global hotkey (e.g., Ctrl+Shift+V)
- Speak—watch the floating overlay update in real time
- Stop—text is inserted into your cursor location or copied to clipboard
- Access saved transcriptions from History panel
openvoiceflow/
├── src-tauri/ # Tauri (Rust) backend
│   ├── src/
│   │   ├── main.rs # Entry point
│   │   ├── commands.rs # Frontend ↔ backend bridge (Tauri)
│   │   ├── state.rs # Application state management
│   │   ├── settings.rs # Settings handling
│   │   ├── credentials.rs # API key storage
│   │   ├── hotkeys.rs # Global hotkey handling
│   │   ├── tray.rs # System tray menu
│   │   ├── types.rs # Shared types
│   │   ├── constants.rs # Constants
│   │   ├── audio/ # Audio capture and processing
│   │   │   ├── engine.rs # Audio engine
│   │   │   ├── worker.rs # Audio worker
│   │   │   └── mod.rs
│   │   ├── providers/ # STT provider implementations
│   │   │   ├── mod.rs
│   │   │   ├── openai.rs # OpenAI Whisper
│   │   │   ├── ollama.rs # Ollama
│   │   │   └── soniox.rs # Soniox
│   │   └── storage/ # Local data storage
│   │       ├── mod.rs
│   │       ├── sqlite.rs # SQLite database
│   │       └── cleanup.rs # Cache cleanup
│   ├── Cargo.toml # Rust dependencies
│   ├── tauri.conf.json # Tauri configuration
│   └── build.rs # Build script
├── src/ # React frontend
│   ├── main.tsx # Entry point
│   ├── App.tsx # Main app component
│   ├── api.ts # API client
│   ├── components/ # React components
│   │   ├── Controls.tsx # Recording controls
│   │   ├── History.tsx # History component
│   │   ├── HistoryPanel.tsx # History panel
│   │   ├── Overlay.tsx # Overlay component
│   │   ├── OverlayWindow.tsx # Overlay window
│   │   ├── SettingsPanel.tsx # Settings panel
│   │   ├── StatsPanel.tsx # Stats panel
│   │   └── Toggle.tsx # Toggle component
│   └── styles/
│       └── app.css # Global styles
├── index.html # HTML template
├── package.json # Node.js dependencies
├── tsconfig.json # TypeScript configuration
├── vite.config.ts # Vite configuration
├── README.md # Overview and developer intro
└── .gitignore # Git ignore rules
- Node.js 18+
- Rust toolchain (stable)
- Tauri prerequisites (platform-specific)
Install, run, and build:
- Install dependencies: npm install
- Run in development mode: npm run tauri dev
- Build the app: npm run tauri build

API keys are stored locally in the app config directory in plain JSON (unencrypted).
Default locations:
- macOS: ~/Library/Application Support/com.openvoice.flow/credentials.json
- Windows: %APPDATA%\com.openvoice.flow\credentials.json
- Linux: ~/.config/com.openvoice.flow/credentials.json
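The default locations above follow a simple per-platform pattern. The sketch below resolves the same three paths from a platform name and a couple of environment values. The function name `credentialsPath` and the `PathEnv` shape are hypothetical; the app itself resolves its config directory in the Rust backend, and this is only meant to make the layout concrete.

```typescript
// Hypothetical helper mirroring the default credential locations listed above.
type Platform = "macos" | "windows" | "linux";

interface PathEnv {
  home: string; // user home directory, e.g. /Users/alice or /home/alice
  appData?: string; // value of %APPDATA%, only meaningful on Windows
}

const APP_ID = "com.openvoice.flow";

function credentialsPath(platform: Platform, env: PathEnv): string {
  switch (platform) {
    case "macos":
      return `${env.home}/Library/Application Support/${APP_ID}/credentials.json`;
    case "windows":
      if (!env.appData) throw new Error("APPDATA is not set");
      return `${env.appData}\\${APP_ID}\\credentials.json`;
    case "linux":
      return `${env.home}/.config/${APP_ID}/credentials.json`;
  }
}
```

Because the file is plain, unencrypted JSON, anything that can read this path can read the keys.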
SQLite history lives in the app data directory as openvoice-flow.sqlite. Audio files (if enabled) are stored under audio/.
Raw request/response payloads (when available) are persisted alongside each transcript. Debug mode controls visibility in the History view.
Audio cache cleanup runs on a configurable interval (default 15 minutes) and removes old or oversized audio files based on:
- Max cache size (MB)
- Max cache age (hours)
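A cleanup pass driven by these two limits might look like the sketch below. It is not the logic in cleanup.rs: the type names, the function `selectFilesToDelete`, and in particular the oldest-first eviction order for oversized caches are assumptions made for illustration.

```typescript
// Hypothetical cleanup policy: age limit first, then oldest-first size eviction.
interface CachedAudio {
  path: string;
  sizeBytes: number;
  modifiedMs: number; // last-modified time, Unix epoch milliseconds
}

interface CacheLimits {
  maxSizeMb: number;
  maxAgeHours: number;
}

function selectFilesToDelete(
  files: CachedAudio[],
  limits: CacheLimits,
  nowMs: number,
): string[] {
  const maxAgeMs = limits.maxAgeHours * 60 * 60 * 1000;
  const maxBytes = limits.maxSizeMb * 1024 * 1024;

  // Pass 1: every file older than the age limit is removed.
  const expired = files.filter((f) => nowMs - f.modifiedMs > maxAgeMs);
  const kept = files.filter((f) => nowMs - f.modifiedMs <= maxAgeMs);

  // Pass 2: evict the oldest remaining files until the cache fits the size limit.
  kept.sort((a, b) => a.modifiedMs - b.modifiedMs);
  let total = kept.reduce((sum, f) => sum + f.sizeBytes, 0);
  const evicted: CachedAudio[] = [];
  for (const f of kept) {
    if (total <= maxBytes) break;
    evicted.push(f);
    total -= f.sizeBytes;
  }

  return [...expired, ...evicted].map((f) => f.path);
}
```

Returning paths instead of deleting inside the function keeps the policy testable without touching the file system; a timer firing on the configured interval would then delete whatever the function returns.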
An append-only debug log is written to app_data/debug/transcripts.jsonl for every transcription.
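JSONL means one JSON object per line, which makes appending cheap and lets tools read the log line by line. The sketch below shows how an entry could be serialized and how a log could be read back. The entry fields (`timestamp`, `provider`, `transcript`) are hypothetical; the README does not document the actual schema of transcripts.jsonl.

```typescript
// Hypothetical entry shape; the real fields in transcripts.jsonl are not documented here.
interface DebugLogEntry {
  timestamp: string; // ISO 8601
  provider: string;
  transcript: string;
}

// Serializes one entry as a single line. JSON.stringify escapes embedded
// newlines, so each entry always occupies exactly one line of the file.
function toJsonlLine(entry: DebugLogEntry): string {
  return JSON.stringify(entry) + "\n";
}

// Parses a whole log, skipping blank lines such as the trailing one.
function parseJsonl(content: string): DebugLogEntry[] {
  return content
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as DebugLogEntry);
}
```

Appending then reduces to writing the output of `toJsonlLine` to the end of the file, without rewriting earlier entries.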
OpenVoice Flow supports two recording modes, Toggle and Push-to-Talk, controlled by global hotkeys. A third setting, Both, enables the two at once:
| Mode | Description |
|---|---|
| Toggle | Press hotkey to start recording, press again to stop |
| Push-to-Talk (PTT) | Hold hotkey to record, release to stop |
| Both | Both hotkeys are active simultaneously |
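The modes in the table can be read as a small state machine over hotkey events. The sketch below is one possible model, not the logic in hotkeys.rs. All names are hypothetical, and two behaviors are assumptions: in Both mode a toggle press stops a recording that Push-to-Talk started, and releasing the PTT key does not stop a recording that the toggle hotkey started.

```typescript
// Hypothetical state machine for the Toggle / Push-to-Talk / Both modes.
type RecordingMode = "toggle" | "ptt" | "both";
type HotkeyEvent = "toggle-press" | "ptt-down" | "ptt-up";

interface RecorderState {
  recording: boolean;
  startedBy: "toggle" | "ptt" | null;
}

const IDLE: RecorderState = { recording: false, startedBy: null };

function nextState(
  mode: RecordingMode,
  state: RecorderState,
  event: HotkeyEvent,
): RecorderState {
  const toggleActive = mode === "toggle" || mode === "both";
  const pttActive = mode === "ptt" || mode === "both";

  if (event === "toggle-press" && toggleActive) {
    // Toggle: the first press starts recording, the next press stops it.
    return state.recording ? IDLE : { recording: true, startedBy: "toggle" };
  }
  if (event === "ptt-down" && pttActive && !state.recording) {
    // Push-to-Talk: recording lasts while the key is held.
    return { recording: true, startedBy: "ptt" };
  }
  if (event === "ptt-up" && pttActive && state.startedBy === "ptt") {
    return IDLE;
  }
  // The hotkey is not active in this mode, or there is nothing to do.
  return state;
}
```

Tracking `startedBy` is what lets Both mode keep a toggle-started recording alive when the PTT key is tapped and released.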
Toggle Shortcut (for Toggle mode):
- Must use a modifier key: Cmd/Ctrl + key, Alt + key, or Shift + key
- Or use a function key alone: F1 through F12
- Examples: CmdOrCtrl+Shift+V (default), Alt+R, F8

Push-to-Talk Key (for PTT mode):
- Function keys: F1 through F12
- Special keys: Space, Enter, Tab, Esc
- Examples: F9 (default), Space

Note: Regular alphanumeric keys (like A, 7) cannot be used alone as they would interfere with normal typing. They must be combined with a modifier key.
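The hotkey rules can be expressed as two small validators. This is a sketch under assumptions: the function names are hypothetical, shortcuts are taken to be `+`-separated strings like the examples above, and the Push-to-Talk check accepts only the single keys listed (the README does not say whether modifier combinations are allowed for PTT).

```typescript
// Hypothetical validators for the hotkey rules described above.
const MODIFIERS = new Set(["Cmd", "Ctrl", "CmdOrCtrl", "Alt", "Shift"]);
const PTT_SPECIAL_KEYS = new Set(["Space", "Enter", "Tab", "Esc"]);
const FUNCTION_KEY = /^F([1-9]|1[0-2])$/; // F1 through F12

// Toggle shortcut: at least one modifier plus a key, or a function key alone.
function isValidToggleShortcut(shortcut: string): boolean {
  const parts = shortcut.split("+").map((p) => p.trim());
  const key = parts[parts.length - 1] ?? "";
  const modifiers = parts.slice(0, -1);

  // The final part must be a real key, not empty and not another modifier.
  if (key === "" || MODIFIERS.has(key)) return false;
  // Everything before the key must be a known modifier.
  if (!modifiers.every((m) => MODIFIERS.has(m))) return false;

  return modifiers.length > 0 || FUNCTION_KEY.test(key);
}

// Push-to-Talk key: a function key or one of the listed special keys.
function isValidPttKey(key: string): boolean {
  return FUNCTION_KEY.test(key) || PTT_SPECIAL_KEYS.has(key);
}
```

With these rules, `isValidToggleShortcut("A")` is rejected while `isValidToggleShortcut("Alt+R")` is accepted, which matches the note about bare alphanumeric keys.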
- Go to Settings > Key Bindings
- Click the Capture button next to the hotkey you want to change
- Press your desired key combination
- The hotkey will be saved automatically
- Microphone access is required for recording.
- macOS may prompt for Input Monitoring (global hotkeys) and Accessibility (auto-paste).
- Windows/Linux auto-paste may require additional system permissions depending on security settings.
- Soniox: implemented and properly tested (requires API key)
- OpenAI Whisper: implemented but not yet tested (requires API key)
- Ollama: experimental (expects an /api/transcribe endpoint)
- Platform Support: The app is currently built and tested only on macOS. Windows and Linux support is planned but not yet verified.
- Provider Testing: Only Soniox has been properly tested. Other providers (OpenAI Whisper, Ollama) are implemented but have not been tested yet.
- macOS Overlay: The floating overlay does not work correctly in full-screen mode on macOS. This is a known limitation due to macOS window management restrictions.
- Credentials: API keys are stored in plain text JSON format without encryption. This is intended for development purposes only.
This project is licensed under the MIT License.
Copyright (c) 2025 OpenVoice Flow Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.










