OpenVoice Flow

⚠️ Development Phase: This project is currently in active development and is not yet ready for production use. Features may be incomplete, unstable, or subject to change.

OpenVoice Flow is a cross-platform, open-source, real-time voice transcription app designed to deliver instant, context-aware speech-to-text. Inspired by Wispr Flow, it aims to be developer-friendly, privacy-first, and extensible, with support for local and cloud-based providers under a bring-your-own-key model.

✨ Features

🔊 Real-time voice transcription with low latency
🧠 Multi-provider support (Soniox, OpenAI Whisper, Ollama; more providers planned)
💻 Cross-platform by design: Windows, macOS, Linux (Tauri powered; currently tested only on macOS)
🧩 Modular provider architecture, BYO-API-key
🔐 Privacy-first: history and audio are stored on device; API keys are stored locally (unencrypted in MVP)
📝 Clipboard or text field insertion
🔁 History view with audio and re-transcription
⚙️ Personal dictionary, instructions, formatting rules
📊 Network diagnostics per provider (latency, stats)
🔥 Floating overlay for live feedback

📸 Screenshots

Main Application

[Screenshots: Dashboard, History, Stats]

Settings

[Screenshots: General, Audio, Output, Key Bindings]

System Tray & Overlay

[Screenshots: macOS Tray, Overlay Ready, Overlay Listening, Overlay Captured]

🔧 Architecture Overview

Built with:

Tauri for native cross-platform shell (Rust backend, Webview UI)
React + TypeScript frontend
Rust modules for audio capture, streaming, and provider interfacing

Component Layers:

Audio Engine: mic access, noise filtering, VAD, audio chunking
Provider Abstraction: clean API interface to any STT backend
Transcription Overlay: animated, floating UI with real-time updates
Storage Layer: SQLite + file storage for transcript and optional audio
Settings & Stats: global hotkeys, language, clipboard, API keys, diagnostics
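
To make the Provider Abstraction layer concrete, here is a minimal TypeScript sketch of the idea: every STT backend implements one small interface and is looked up by id. This is an illustration only. The real providers are Rust modules under src-tauri/src/providers/, and the names `SttProvider`, `TranscriptChunk`, `EchoProvider`, and `ProviderRegistry` are hypothetical.

```typescript
// Hypothetical shapes for illustration; not the project's actual API.
interface TranscriptChunk {
  text: string;
  isFinal: boolean;
}

interface SttProvider {
  readonly id: string;
  // Takes raw audio chunks and resolves to transcript pieces for them.
  transcribe(audio: Uint8Array[]): Promise<TranscriptChunk[]>;
}

// Stand-in provider: reports chunk sizes instead of calling a real backend.
class EchoProvider implements SttProvider {
  readonly id = "echo";

  async transcribe(audio: Uint8Array[]): Promise<TranscriptChunk[]> {
    return audio.map((chunk, i) => ({
      text: `chunk ${i}: ${chunk.length} bytes`,
      isFinal: i === audio.length - 1,
    }));
  }
}

// Callers depend only on SttProvider, so backends can be swapped by id.
class ProviderRegistry {
  private providers = new Map<string, SttProvider>();

  register(provider: SttProvider): void {
    this.providers.set(provider.id, provider);
  }

  get(id: string): SttProvider {
    const provider = this.providers.get(id);
    if (!provider) throw new Error(`unknown provider: ${id}`);
    return provider;
  }
}
```

With this shape, adding a backend means writing one class and registering it; the rest of the app never needs to know which provider is active.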

🧪 Usage Example

Launch the app (it starts in the tray menu by default)
Press your configured global hotkey (e.g., Ctrl+Shift+V)
Speak and watch the floating overlay update in real time
Stop recording; the text is inserted at your cursor location or copied to the clipboard
Access saved transcriptions from the History panel

📁 Project Folder Structure

openvoiceflow/
├── src-tauri/                    # Tauri (Rust) backend
│   ├── src/
│   │   ├── main.rs               # Entry point
│   │   ├── commands.rs           # Frontend ↔ backend bridge (Tauri)
│   │   ├── state.rs              # Application state management
│   │   ├── settings.rs           # Settings handling
│   │   ├── credentials.rs        # API key storage
│   │   ├── hotkeys.rs            # Global hotkey handling
│   │   ├── tray.rs               # System tray menu
│   │   ├── types.rs              # Shared types
│   │   ├── constants.rs          # Constants
│   │   ├── audio/                # Audio capture and processing
│   │   │   ├── engine.rs         # Audio engine
│   │   │   ├── worker.rs         # Audio worker
│   │   │   └── mod.rs
│   │   ├── providers/            # STT provider implementations
│   │   │   ├── mod.rs
│   │   │   ├── openai.rs         # OpenAI Whisper
│   │   │   ├── ollama.rs         # Ollama
│   │   │   └── soniox.rs         # Soniox
│   │   └── storage/              # Local data storage
│   │       ├── mod.rs
│   │       ├── sqlite.rs         # SQLite database
│   │       └── cleanup.rs        # Cache cleanup
│   ├── Cargo.toml                # Rust dependencies
│   ├── tauri.conf.json           # Tauri configuration
│   └── build.rs                  # Build script
├── src/                         # React frontend
│   ├── main.tsx                  # Entry point
│   ├── App.tsx                   # Main app component
│   ├── api.ts                    # API client
│   ├── components/               # React components
│   │   ├── Controls.tsx          # Recording controls
│   │   ├── History.tsx           # History component
│   │   ├── HistoryPanel.tsx      # History panel
│   │   ├── Overlay.tsx           # Overlay component
│   │   ├── OverlayWindow.tsx     # Overlay window
│   │   ├── SettingsPanel.tsx     # Settings panel
│   │   ├── StatsPanel.tsx        # Stats panel
│   │   └── Toggle.tsx            # Toggle component
│   └── styles/
│       └── app.css               # Global styles
├── index.html                   # HTML template
├── package.json                 # Node.js dependencies
├── tsconfig.json                # TypeScript configuration
├── vite.config.ts               # Vite configuration
├── README.md                    # Overview and developer intro
└── .gitignore                   # Git ignore rules

🚀 Development

Prerequisites

Node.js 18+
Rust toolchain (stable)
Tauri prerequisites (platform-specific)

Run (dev)

npm install
npm run tauri dev

Build (unsigned)

npm run tauri build

🔐 Credentials & Storage

API keys are stored locally in the app config directory in plain JSON (unencrypted).

Default locations:

macOS: ~/Library/Application Support/com.openvoice.flow/credentials.json
Windows: %APPDATA%\com.openvoice.flow\credentials.json
Linux: ~/.config/com.openvoice.flow/credentials.json
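
As a rough illustration of how these default locations map to code, the sketch below builds the credentials path from a platform name and base directories. It is not the app's implementation (the backend resolves paths in Rust), and `credentialsPath`, `Platform`, and the parameter names are hypothetical. It also assumes the Linux config directory is `~/.config`, as listed above.

```typescript
type Platform = "macos" | "windows" | "linux";

const APP_ID = "com.openvoice.flow";

// `home` is the user's home directory; `appData` is the value of %APPDATA% (Windows only).
function credentialsPath(platform: Platform, home: string, appData = ""): string {
  switch (platform) {
    case "macos":
      return `${home}/Library/Application Support/${APP_ID}/credentials.json`;
    case "windows":
      return `${appData}\\${APP_ID}\\credentials.json`;
    case "linux":
      return `${home}/.config/${APP_ID}/credentials.json`;
  }
}
```

For example, `credentialsPath("linux", "/home/ana")` yields `/home/ana/.config/com.openvoice.flow/credentials.json`.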

SQLite history lives in the app data directory as openvoice-flow.sqlite. Audio files (if enabled) are stored under audio/.

🧰 Debug Mode & Cache Cleanup

Raw request/response payloads (when available) are persisted alongside each transcript. Debug mode controls whether they are shown in the History view.

Audio cache cleanup runs on a configurable interval (default 15 minutes) and removes old or oversized audio files based on:

Max cache size (MB)
Max cache age (hours)
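
One way to read the two limits is as a two-pass selection: first drop everything past the age limit, then evict the oldest remaining files until the cache fits the size limit. The sketch below follows that reading. The oldest-first eviction order is an assumption, not something the app documents, and `CachedAudio` and `selectForDeletion` are hypothetical names.

```typescript
interface CachedAudio {
  path: string;
  sizeBytes: number;
  modifiedMs: number; // last-modified time in ms since epoch
}

function selectForDeletion(
  files: CachedAudio[],
  nowMs: number,
  maxAgeHours: number,
  maxSizeMb: number,
): string[] {
  const maxAgeMs = maxAgeHours * 60 * 60 * 1000;
  const maxBytes = maxSizeMb * 1024 * 1024;
  const doomed: string[] = [];

  // Pass 1: anything older than the age limit is removed.
  const fresh = files.filter((f) => {
    const expired = nowMs - f.modifiedMs > maxAgeMs;
    if (expired) doomed.push(f.path);
    return !expired;
  });

  // Pass 2 (assumed policy): evict oldest-first until the total fits the size limit.
  fresh.sort((a, b) => a.modifiedMs - b.modifiedMs);
  let total = fresh.reduce((sum, f) => sum + f.sizeBytes, 0);
  for (const f of fresh) {
    if (total <= maxBytes) break;
    doomed.push(f.path);
    total -= f.sizeBytes;
  }
  return doomed;
}
```

Keeping the selection pure (it only returns paths) makes the policy easy to test separately from the code that actually deletes files.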

An append-only debug log is written to app_data/debug/transcripts.jsonl for every transcription.
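
For readers unfamiliar with the .jsonl format: each record is one JSON object on its own line, so appending never requires rewriting earlier entries. The sketch below shows serialization and parsing only. The field names in `DebugRecord` are hypothetical and do not describe the app's actual log schema.

```typescript
// Hypothetical record shape; the real log schema may differ.
interface DebugRecord {
  timestamp: string;
  provider: string;
  text: string;
}

// JSON.stringify escapes newlines inside strings, so a record always fits on one line.
function toJsonlLine(record: DebugRecord): string {
  return JSON.stringify(record) + "\n";
}

// Reads a whole log back, skipping blank lines such as the trailing one.
function parseJsonl(contents: string): DebugRecord[] {
  return contents
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as DebugRecord);
}
```

Appending to the log is then just writing `toJsonlLine(record)` to the end of the file.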

⌨️ Hotkeys

OpenVoice Flow supports two recording modes controlled by global hotkeys, Toggle and Push-to-Talk, which can also be enabled together:

Recording Modes

Mode	Description
Toggle	Press hotkey to start recording, press again to stop
Push-to-Talk (PTT)	Hold hotkey to record, release to stop
Both	Both hotkeys are active simultaneously
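
The three settings can be modeled as a small state transition: given the mode, the current recording state, and a hotkey event, compute the next recording state. The sketch below is a simplified model with hypothetical names (`Mode`, `HotkeyEvent`, `nextRecording`). How the two hotkeys interact in Both mode is an assumption here: releasing the PTT key always stops recording, even if it was started with the toggle hotkey.

```typescript
type Mode = "toggle" | "ptt" | "both";
type HotkeyEvent = "toggle-press" | "ptt-down" | "ptt-up";

function nextRecording(mode: Mode, recording: boolean, event: HotkeyEvent): boolean {
  const toggleActive = mode === "toggle" || mode === "both";
  const pttActive = mode === "ptt" || mode === "both";

  // Toggle hotkey flips the state, but only when Toggle mode is enabled.
  if (event === "toggle-press") return toggleActive ? !recording : recording;

  // PTT events are ignored unless PTT mode is enabled.
  if (!pttActive) return recording;

  // Hold to record, release to stop.
  return event === "ptt-down";
}
```

For example, `nextRecording("ptt", false, "ptt-down")` returns `true`, and `nextRecording("ptt", false, "toggle-press")` leaves the state at `false`.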

Hotkey Requirements

Toggle Shortcut (for Toggle mode):

Must use a modifier key: Cmd/Ctrl + key, Alt + key, or Shift + key
Or use a function key alone: F1 through F12
Examples: CmdOrCtrl+Shift+V (default), Alt+R, F8

Push-to-Talk Key (for PTT mode):

Function keys: F1 through F12
Special keys: Space, Enter, Tab, Esc
Examples: F9 (default), Space

Note: Regular alphanumeric keys (like A, 7) cannot be used alone as they would interfere with normal typing. They must be combined with a modifier key.
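
The rules above can be expressed as two small validators. This is a sketch under assumptions: it treats shortcuts as `+`-separated strings such as `CmdOrCtrl+Shift+V`, and the function names and the exact modifier spellings accepted are hypothetical rather than taken from the app's code.

```typescript
const MODIFIERS = new Set(["CmdOrCtrl", "Cmd", "Ctrl", "Alt", "Shift"]);
const PTT_SPECIAL_KEYS = new Set(["Space", "Enter", "Tab", "Esc"]);

// Matches F1 through F12 only.
function isFunctionKey(key: string): boolean {
  return /^F([1-9]|1[0-2])$/.test(key);
}

// Toggle shortcut: modifier(s) + key, or a function key on its own.
function isValidToggleShortcut(shortcut: string): boolean {
  const parts = shortcut.split("+");
  const key = parts[parts.length - 1];
  const mods = parts.slice(0, -1);
  if (!key || MODIFIERS.has(key)) return false; // must end in a non-modifier key
  if (!mods.every((m) => MODIFIERS.has(m))) return false;
  return mods.length > 0 || isFunctionKey(key);
}

// Push-to-Talk key: a single function key or one of the listed special keys.
function isValidPttKey(key: string): boolean {
  return isFunctionKey(key) || PTT_SPECIAL_KEYS.has(key);
}
```

Under these rules `Alt+R` and `F8` are accepted as toggle shortcuts, while a bare `A` or `7` is rejected because it would interfere with normal typing.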

Capturing Hotkeys

Go to Settings > Key Bindings
Click the Capture button next to the hotkey you want to change
Press your desired key combination
The hotkey will be saved automatically

✅ Permissions Notes

Microphone access is required for recording.
macOS may prompt for Input Monitoring (global hotkeys) and Accessibility (auto-paste).
Windows/Linux auto-paste may require additional system permissions depending on security settings.

📡 Provider Status

Soniox: implemented and properly tested (requires API key)
OpenAI Whisper: implemented but not yet tested (requires API key)
Ollama: experimental (expects /api/transcribe endpoint)

⚠️ Known Limitations

Platform Support: The app is currently built and tested only on macOS. Windows and Linux support is planned but not yet verified.
Provider Testing: Only Soniox has been properly tested. Other providers (OpenAI Whisper, Ollama) are implemented but have not been tested yet.
macOS Overlay: The floating overlay does not work correctly in full-screen mode on macOS. This is a known limitation due to macOS window management restrictions.
Credentials: API keys are stored in plain text JSON format without encryption. This is intended for development purposes only.

📜 License

This project is licensed under the MIT License.

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
