voice-mode-preview
Offline auditioner for voice-enhancer pitch-shift presets. Apply the four planned VoiceMode presets (man / woman / child / old)
to any WAV file (or a synthesized test tone) so you can A/B them by ear before wiring the
values into the C++ AudioEngine.
Pure NumPy + the stdlib wave module. No PortAudio, no JUCE, no real-time. This is the "is the preset value any good?" tool, not the runtime engine.
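To illustrate what "NumPy + stdlib wave" can look like in practice, here is a minimal sketch of WAV I/O without any audio library. The function names `read_wav_mono` and `write_wav_mono` are hypothetical, and the sketch assumes 16-bit PCM input; the project's actual loader may differ.

```python
import wave

import numpy as np


def read_wav_mono(path):
    """Read a 16-bit PCM WAV; return (float32 samples in [-1, 1), sample rate)."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    pcm = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if channels > 1:
        # Interleaved frames -> (n_frames, channels), then downmix to mono.
        pcm = pcm.reshape(-1, channels).mean(axis=1)
    return pcm, rate


def write_wav_mono(path, samples, rate):
    """Write float samples (clipped to [-1, 1]) as a mono 16-bit PCM WAV."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype("<i2")
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())
```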
Why this project
The voice-enhancer C++ engine shipped pitch-shift v1.0 on Apr 26 (WSOLA with a 2048-sample Hann window,
4× overlap, and a ±128 normalized cross-correlation (NCC) search). The next planned milestone is wiring four VoiceMode presets:
| Preset | Pitch | Target F0 |
|---|---|---|
| 👨 Man | −3 st | ~107 Hz |
| 👩 Woman | +4 st | ~220 Hz |
| 👦 Child | +8 st | ~320 Hz |
| 👴 Old | −2 st | ~115 Hz |
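For reference, a semitone offset maps to a frequency ratio of 2^(st/12) in equal temperament. The sketch below applies that to the table values; the dict names are illustrative and not necessarily what presets.py contains.

```python
# Values copied from the preset table; the dict names are hypothetical.
PRESET_SEMITONES = {"man": -3, "woman": 4, "child": 8, "old": -2}
TARGET_F0_HZ = {"man": 107.0, "woman": 220.0, "child": 320.0, "old": 115.0}


def semitones_to_ratio(semitones):
    """Equal-temperament frequency ratio for a shift given in semitones."""
    return 2.0 ** (semitones / 12.0)


def implied_source_f0(target_hz, semitones):
    """Source pitch that would land on target_hz after the given shift."""
    return target_hz / semitones_to_ratio(semitones)


for name, st in PRESET_SEMITONES.items():
    ratio = semitones_to_ratio(st)
    source = implied_source_f0(TARGET_F0_HZ[name], st)
    print(f"{name:>5}: {st:+d} st -> x{ratio:.3f}, implied source F0 ~{source:.0f} Hz")
```

Run on the table above, the implied source pitches range from roughly 127 Hz to 202 Hz, so the Target F0 column reads best as a per-preset goal rather than the result of shifting one fixed voice. Where a given clip actually lands depends on the speaker.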
Those values were designed but never auditioned — they're guesses until you hear them on a real voice clip. Editing the C++ engine, rebuilding, and rerouting BlackHole every time you want to nudge a preset by ±1 semitone is the wrong feedback loop. This tool closes that loop: drop in a 5-second voice WAV, get four output files, listen, adjust the table, commit.
Once a preset value sounds right, copy the final number into Preset.h / PitchShift.cpp
in voice-enhancer with confidence.
(Triggered by the May 1 / 3 morning-brief threads where VoiceMode was flagged as the next ~60–90 min unblock for the voice-enhancer roadmap.)
Install
git clone https://git.lucataco.dev/Catacolabs/voice-mode-preview.git
cd voice-mode-preview
pip install -e .
Only dependency is numpy>=1.24.
Run
# 1) See the preset table
python -m voice_mode_preview --list
# 2) Synthesize a 1.5 s vocal-ish test tone and render all presets
python -m voice_mode_preview demo --out-dir out/
# 3) Render all presets on your own clip
python -m voice_mode_preview render path/to/voice.wav --out-dir out/
# 4) Render just one preset
python -m voice_mode_preview render path/to/voice.wav --preset child --out-dir out/
Output files are named {stem}__{preset}.wav.
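As a small illustration of that naming pattern, the sketch below builds the output path with pathlib. The helper name `output_path` is hypothetical and not taken from the package.

```python
from pathlib import Path


def output_path(input_wav, preset, out_dir):
    """Build {out_dir}/{stem}__{preset}.wav from the input file name."""
    stem = Path(input_wav).stem  # "path/to/voice.wav" -> "voice"
    return Path(out_dir) / f"{stem}__{preset}.wav"


print(output_path("path/to/voice.wav", "child", "out"))
```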
Tweaking presets
Edit src/voice_mode_preview/presets.py — the PRESETS dict — and re-run. Keep this table
as the source of truth; once you're happy, port the numbers into voice-enhancer's
Preset.h and bump it to v1.1.0.
Algorithm note
The pitch-shift here is a NumPy WSOLA implementation that mirrors the C++ engine's parameters (window, hop, search range). Output won't be byte-identical to the AudioEngine (different overlap-add bookkeeping, no SIMD), but it's audibly close enough to make preset decisions on. If a preset sounds wrong here, it'll sound wrong in the engine too.
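To make the approach concrete, here is a minimal NumPy sketch of WSOLA-based pitch shifting using the parameters quoted earlier (2048-sample Hann window, 4× overlap, ±128 search). It is not the project's code: the function names are hypothetical, the ±128 range is assumed to be in samples, and the "time-stretch, then resample by linear interpolation" structure is an assumption about how such a shifter is typically built.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

WINDOW = 2048          # Hann window length, from the README
HOP = WINDOW // 4      # synthesis hop for 4x overlap
SEARCH = 128           # +/- search range, assumed to be in samples


def wsola_stretch(x, stretch):
    """Time-stretch x by `stretch` (>1 means longer) while keeping pitch."""
    x = np.asarray(x, dtype=np.float64)
    win = np.hanning(WINDOW)
    analysis_hop = HOP / stretch
    # Zero-pad so every candidate frame and continuation stays in range.
    padded = np.concatenate(
        [np.zeros(SEARCH), x, np.zeros(WINDOW + HOP + SEARCH)]
    )
    n_frames = int(np.ceil(len(x) / analysis_hop))
    out = np.zeros(n_frames * HOP + WINDOW)
    norm = np.zeros_like(out)
    prev = SEARCH
    for k in range(n_frames):
        nominal = SEARCH + int(round(k * analysis_hop))
        if k == 0:
            pos = nominal
        else:
            # Natural continuation of the previously chosen frame.
            target = padded[prev + HOP : prev + HOP + WINDOW]
            region = padded[nominal - SEARCH : nominal + SEARCH + WINDOW]
            cands = sliding_window_view(region, WINDOW)
            # Normalized cross-correlation of each candidate with the target.
            denom = np.linalg.norm(cands, axis=1) * np.linalg.norm(target)
            scores = (cands @ target) / (denom + 1e-12)
            pos = nominal - SEARCH + int(np.argmax(scores))
        start = k * HOP
        out[start : start + WINDOW] += padded[pos : pos + WINDOW] * win
        norm[start : start + WINDOW] += win
        prev = pos
    out /= np.maximum(norm, 1e-8)  # undo the summed window gain
    return out[: int(round(len(x) * stretch))]


def pitch_shift(x, semitones):
    """Shift pitch by stretching with WSOLA, then resampling to the old length."""
    ratio = 2.0 ** (semitones / 12.0)
    stretched = wsola_stretch(x, ratio)
    # Reading the stretched signal `ratio` times faster restores the duration
    # and scales every frequency by `ratio`. Linear interpolation is a
    # simplification; there is no anti-aliasing filter here.
    src_idx = np.arange(len(x)) * ratio
    return np.interp(src_idx, np.arange(len(stretched)), stretched)
```

A quick sanity check is to shift a pure sine by +12 semitones and confirm that the spectral peak roughly doubles in frequency. Real voice clips remain the meaningful test, since formants move with this kind of shift.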
License
MIT.