bobalazek/openvoice-flow

OpenVoice Flow

⚠️ Development Phase: This project is currently in active development and is not yet ready for production use. Features may be incomplete, unstable, or subject to change.

OpenVoice Flow is a cross-platform, open-source, real-time voice transcription app that delivers low-latency, context-aware speech-to-text. Inspired by Whisper Flow, it aims to be developer-friendly, privacy-first, and extensible, with support for local and cloud-based providers under a bring-your-own-key model.

✨ Features

  • 🔊 Real-time voice transcription with low latency
  • 🧠 Multi-provider support (Soniox, Whisper/Ollama, future providers)
  • 💻 Cross-platform via Tauri: Windows, macOS, Linux (currently built and tested on macOS only)
  • 🧩 Modular provider architecture, BYO-API-key
  • 🔐 Privacy-first: data stays on device, local key storage (unencrypted in MVP)
  • 📝 Clipboard or text field insertion
  • 🔁 History view with audio and re-transcription
  • ⚙️ Personal dictionary, instructions, formatting rules
  • 📊 Network diagnostics per provider (latency, stats)
  • 🔥 Floating overlay for live feedback

📸 Screenshots

Main Application

[Screenshots: Dashboard, History, Stats]

Settings

[Screenshots: General, Audio, Output, Key Bindings]

System Tray & Overlay

[Screenshots: macOS Tray, Overlay Ready, Overlay Listening, Overlay Captured]

🔧 Architecture Overview

Built with:

  • Tauri for native cross-platform shell (Rust backend, Webview UI)
  • React + TypeScript frontend
  • Rust modules for audio capture, streaming, and provider interfacing

Component Layers:

  • Audio Engine: mic access, noise filtering, VAD, audio chunking
  • Provider Abstraction: clean API interface to any STT backend
  • Transcription Overlay: animated, floating UI with real-time updates
  • Storage Layer: SQLite + file storage for transcript and optional audio
  • Settings & Stats: global hotkeys, language, clipboard, API keys, diagnostics
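The provider abstraction layer can be pictured with a small sketch. The real implementation lives in the Rust modules under `providers/`; the TypeScript below only illustrates the idea, and every name in it (`SttProvider`, `ProviderRegistry`, `SampleCountingProvider`, `feed`, and so on) is hypothetical rather than taken from the codebase.

```typescript
// Hypothetical sketch of a streaming provider abstraction.
// None of these names come from the OpenVoice Flow source.

/** One chunk of captured audio, e.g. 16-bit PCM samples. */
interface AudioChunk {
  samples: Int16Array;
  sampleRate: number;
}

/** Called with partial results while recording and once with the final text. */
type TranscriptListener = (text: string, isFinal: boolean) => void;

/** Minimal contract a speech-to-text backend might satisfy. */
interface SttProvider {
  readonly name: string;
  readonly requiresApiKey: boolean;
  start(onTranscript: TranscriptListener): void;
  feed(chunk: AudioChunk): void;
  stop(): void;
}

/** Lets the rest of the app pick a provider by name. */
class ProviderRegistry {
  private providers: Record<string, SttProvider> = {};

  register(provider: SttProvider): void {
    this.providers[provider.name] = provider;
  }

  get(name: string): SttProvider {
    if (!Object.prototype.hasOwnProperty.call(this.providers, name)) {
      throw new Error("Unknown provider: " + name);
    }
    return this.providers[name];
  }

  names(): string[] {
    return Object.keys(this.providers);
  }
}

/** Stand-in backend: reports how many samples it has seen instead of text. */
class SampleCountingProvider implements SttProvider {
  readonly name = "sample-counter";
  readonly requiresApiKey = false;
  private listener: TranscriptListener | null = null;
  private total = 0;

  start(onTranscript: TranscriptListener): void {
    this.listener = onTranscript;
    this.total = 0;
  }

  feed(chunk: AudioChunk): void {
    this.total += chunk.samples.length;
    if (this.listener) this.listener(this.total + " samples", false);
  }

  stop(): void {
    if (this.listener) this.listener(this.total + " samples", true);
    this.listener = null;
  }
}
```

With an interface of this shape, the overlay would only need to subscribe to the listener callback, regardless of which backend produces the text.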

🧪 Usage Example

  1. Launch the app (it starts in the system tray by default)
  2. Press your configured global hotkey (e.g., Ctrl+Shift+V)
  3. Speak, and watch the floating overlay update in real time
  4. Stop recording; the text is inserted at your cursor position or copied to the clipboard
  5. Open the History panel to access saved transcriptions

📁 Project Folder Structure

openvoiceflow/
├── src-tauri/                    # Tauri (Rust) backend
│   ├── src/
│   │   ├── main.rs               # Entry point
│   │   ├── commands.rs           # Frontend ↔ backend bridge (Tauri)
│   │   ├── state.rs              # Application state management
│   │   ├── settings.rs           # Settings handling
│   │   ├── credentials.rs        # API key storage
│   │   ├── hotkeys.rs            # Global hotkey handling
│   │   ├── tray.rs               # System tray menu
│   │   ├── types.rs              # Shared types
│   │   ├── constants.rs          # Constants
│   │   ├── audio/                # Audio capture and processing
│   │   │   ├── engine.rs         # Audio engine
│   │   │   ├── worker.rs         # Audio worker
│   │   │   └── mod.rs
│   │   ├── providers/            # STT provider implementations
│   │   │   ├── mod.rs
│   │   │   ├── openai.rs         # OpenAI Whisper
│   │   │   ├── ollama.rs         # Ollama
│   │   │   └── soniox.rs         # Soniox
│   │   └── storage/              # Local data storage
│   │       ├── mod.rs
│   │       ├── sqlite.rs         # SQLite database
│   │       └── cleanup.rs        # Cache cleanup
│   ├── Cargo.toml                # Rust dependencies
│   ├── tauri.conf.json           # Tauri configuration
│   └── build.rs                  # Build script
├── src/                          # React frontend
│   ├── main.tsx                  # Entry point
│   ├── App.tsx                   # Main app component
│   ├── api.ts                    # API client
│   ├── components/               # React components
│   │   ├── Controls.tsx          # Recording controls
│   │   ├── History.tsx           # History component
│   │   ├── HistoryPanel.tsx      # History panel
│   │   ├── Overlay.tsx           # Overlay component
│   │   ├── OverlayWindow.tsx     # Overlay window
│   │   ├── SettingsPanel.tsx     # Settings panel
│   │   ├── StatsPanel.tsx        # Stats panel
│   │   └── Toggle.tsx            # Toggle component
│   └── styles/
│       └── app.css               # Global styles
├── index.html                    # HTML template
├── package.json                  # Node.js dependencies
├── tsconfig.json                 # TypeScript configuration
├── vite.config.ts                # Vite configuration
├── README.md                     # Overview and developer intro
└── .gitignore                    # Git ignore rules

🚀 Development

Prerequisites

  • Node.js 18+
  • Rust toolchain (stable)
  • Tauri prerequisites (platform-specific)

Run (dev)

npm install
npm run tauri dev

Build (unsigned)

npm run tauri build

🔐 Credentials & Storage

API keys are stored locally in the app config directory in plain JSON (unencrypted).

Default locations:

  • macOS: ~/Library/Application Support/com.openvoice.flow/credentials.json
  • Windows: %APPDATA%\com.openvoice.flow\credentials.json
  • Linux: ~/.config/com.openvoice.flow/credentials.json
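The default locations above can be expressed as a small lookup. This is only a sketch that mirrors the documented paths; the app itself presumably resolves them through Tauri, and the names `defaultCredentialsPath`, `PathEnv`, and `APP_ID` are hypothetical. It also assumes Linux always uses `~/.config`, ignoring any `XDG_CONFIG_HOME` override.

```typescript
// Hypothetical helper mirroring the documented default credential paths.

type Platform = "macos" | "windows" | "linux";

const APP_ID = "com.openvoice.flow";

interface PathEnv {
  home: string;     // the user's home directory
  appData?: string; // value of %APPDATA% (Windows only)
}

function defaultCredentialsPath(platform: Platform, env: PathEnv): string {
  if (platform === "macos") {
    return env.home + "/Library/Application Support/" + APP_ID + "/credentials.json";
  }
  if (platform === "windows") {
    if (!env.appData) {
      throw new Error("APPDATA is not set");
    }
    return env.appData + "\\" + APP_ID + "\\credentials.json";
  }
  // Assumption: Linux uses ~/.config without consulting XDG_CONFIG_HOME.
  return env.home + "/.config/" + APP_ID + "/credentials.json";
}
```

Because the file is plain JSON, anything that can read this path can read the keys, which is why the location is worth knowing when sharing a machine or backing up a home directory.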

SQLite history lives in the app data directory as openvoice-flow.sqlite. Audio files (if enabled) are stored under audio/.

🧰 Debug Mode & Cache Cleanup

Raw request/response payloads (when available) are persisted alongside each transcript. Debug mode controls whether they are shown in the History view.

Audio cache cleanup runs on a configurable interval (default 15 minutes) and removes old or oversized audio files based on:

  • Max cache size (MB)
  • Max cache age (hours)
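One way to combine the two limits is sketched below. The actual logic is in `storage/cleanup.rs` and may differ; in particular, evicting the oldest files first once the size budget is exceeded is an assumption, and `selectFilesToDelete`, `CachedAudioFile`, and `CleanupPolicy` are hypothetical names.

```typescript
// Hypothetical sketch of an age-then-size cache cleanup policy.

interface CachedAudioFile {
  path: string;
  sizeBytes: number;
  modifiedAtMs: number; // last-modified time, milliseconds since epoch
}

interface CleanupPolicy {
  maxCacheSizeMb: number;
  maxCacheAgeHours: number;
}

/** Returns the paths that should be removed; does not touch the file system. */
function selectFilesToDelete(
  files: CachedAudioFile[],
  policy: CleanupPolicy,
  nowMs: number
): string[] {
  const maxAgeMs = policy.maxCacheAgeHours * 60 * 60 * 1000;
  const maxBytes = policy.maxCacheSizeMb * 1024 * 1024;
  const toDelete: string[] = [];

  // Pass 1: anything older than the age limit goes.
  const fresh = files.filter((file) => {
    const expired = nowMs - file.modifiedAtMs > maxAgeMs;
    if (expired) toDelete.push(file.path);
    return !expired;
  });

  // Pass 2 (assumed policy): evict oldest-first until under the size budget.
  fresh.sort((a, b) => a.modifiedAtMs - b.modifiedAtMs);
  let totalBytes = fresh.reduce((sum, file) => sum + file.sizeBytes, 0);
  for (const file of fresh) {
    if (totalBytes <= maxBytes) break;
    toDelete.push(file.path);
    totalBytes -= file.sizeBytes;
  }

  return toDelete;
}
```

Keeping the selection pure (it returns paths instead of deleting) makes the policy easy to test separately from the timer that runs it.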

An append-only debug log is written to app_data/debug/transcripts.jsonl for every transcription.
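For readers unfamiliar with the JSONL format, the sketch below shows how one record per line can be written and read back. The record fields (`timestamp`, `provider`, `text`) are hypothetical; the actual contents of `transcripts.jsonl` are not specified here.

```typescript
// Hypothetical record shape; the real log fields may differ.
interface DebugRecord {
  timestamp: string;
  provider: string;
  text: string;
}

/** Serializes one record as a single JSONL line. */
function toJsonlLine(record: DebugRecord): string {
  // JSON.stringify escapes embedded newlines, so a record never spans lines.
  return JSON.stringify(record) + "\n";
}

/** Parses the full contents of a JSONL file, skipping blank lines. */
function parseJsonl(contents: string): DebugRecord[] {
  return contents
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as DebugRecord);
}
```

Appending a line never requires rewriting earlier entries, which is what makes the format a natural fit for an append-only log.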

⌨️ Hotkeys

OpenVoice Flow supports two recording modes controlled by global hotkeys, and both can be enabled at once:

Recording Modes

Mode                Description
Toggle              Press the hotkey to start recording, press again to stop
Push-to-Talk (PTT)  Hold the hotkey to record, release to stop
Both                Both hotkeys are active simultaneously
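The modes can be read as a small state transition on hotkey events. The sketch below is illustrative only: `nextRecordingState` and its types are hypothetical names, and the behavior when both hotkeys interact in "both" mode (for example, releasing the PTT key while a toggle recording is running) is an assumption, not documented behavior.

```typescript
// Hypothetical model of how hotkey events change the recording state.

type RecordingMode = "toggle" | "ptt" | "both";

interface HotkeyEvent {
  hotkey: "toggle" | "ptt";
  action: "press" | "release";
}

/** Returns whether the app should be recording after the event. */
function nextRecordingState(
  mode: RecordingMode,
  recording: boolean,
  event: HotkeyEvent
): boolean {
  if (event.hotkey === "toggle") {
    // The toggle hotkey is inactive in PTT-only mode.
    if (mode === "ptt") return recording;
    // Only presses flip the state; releases are ignored.
    return event.action === "press" ? !recording : recording;
  }

  // The PTT key is inactive in toggle-only mode.
  if (mode === "toggle") return recording;
  // Record while the key is held, stop on release.
  return event.action === "press";
}
```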

Hotkey Requirements

Toggle Shortcut (for Toggle mode):

  • Must use a modifier key: Cmd/Ctrl + key, Alt + key, or Shift + key
  • Or use a function key alone: F1 through F12
  • Examples: CmdOrCtrl+Shift+V (default), Alt+R, F8

Push-to-Talk Key (for PTT mode):

  • Function keys: F1 through F12
  • Special keys: Space, Enter, Tab, Esc
  • Examples: F9 (default), Space

Note: Regular alphanumeric keys (like A, 7) cannot be used alone as they would interfere with normal typing. They must be combined with a modifier key.
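The rules above can be checked mechanically. The following is a minimal sketch, assuming shortcuts are written as `+`-separated strings such as `CmdOrCtrl+Shift+V`; the function names and the exact modifier spellings accepted are hypothetical and may not match what the app's capture dialog produces.

```typescript
// Hypothetical validators for the documented hotkey rules.

const MODIFIERS = ["CmdOrCtrl", "Cmd", "Ctrl", "Alt", "Shift"];
const PTT_SPECIAL_KEYS = ["Space", "Enter", "Tab", "Esc"];

/** True for F1 through F12. */
function isFunctionKey(key: string): boolean {
  return /^F([1-9]|1[0-2])$/.test(key);
}

function isModifier(key: string): boolean {
  return MODIFIERS.indexOf(key) !== -1;
}

/** Toggle shortcut: modifier(s) + key, or a function key on its own. */
function isValidToggleShortcut(shortcut: string): boolean {
  const parts = shortcut.split("+");
  const key = parts[parts.length - 1];
  const modifiers = parts.slice(0, -1);

  if (key.length === 0 || isModifier(key)) return false;
  if (!modifiers.every((m) => isModifier(m))) return false;
  return modifiers.length > 0 || isFunctionKey(key);
}

/** Push-to-talk key: a single function key or one of the special keys. */
function isValidPttKey(key: string): boolean {
  return isFunctionKey(key) || PTT_SPECIAL_KEYS.indexOf(key) !== -1;
}
```

Under these rules the documented defaults pass (`CmdOrCtrl+Shift+V` for toggle, `F9` for PTT), while a bare `A` or `7` is rejected for both.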

Capturing Hotkeys

  1. Go to Settings > Key Bindings
  2. Click the Capture button next to the hotkey you want to change
  3. Press your desired key combination
  4. The hotkey will be saved automatically

✅ Permissions Notes

  • Microphone access is required for recording.
  • macOS may prompt for Input Monitoring (global hotkeys) and Accessibility (auto-paste).
  • Windows/Linux auto-paste may require additional system permissions depending on security settings.

📡 Provider Status

  • Soniox: implemented and properly tested (requires API key)
  • OpenAI Whisper: implemented but not yet tested (requires API key)
  • Ollama: experimental (expects /api/transcribe endpoint)

⚠️ Known Limitations

  • Platform Support: The app is currently built and tested only on macOS. Windows and Linux support is planned but not yet verified.
  • Provider Testing: Only Soniox has been properly tested. Other providers (OpenAI Whisper, Ollama) are implemented but have not been tested yet.
  • macOS Overlay: The floating overlay does not work correctly in full-screen mode on macOS. This is a known limitation due to macOS window management restrictions.
  • Credentials: API keys are stored in plain text JSON format without encryption. This is intended for development purposes only.

📜 License

This project is licensed under the MIT License.

MIT License

Copyright (c) 2025 OpenVoice Flow Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
