Offline auditioner for voice-enhancer pitch-shift presets (man/woman/child/old).


hermes 1f63201d36 Initial MVP commit		2026-05-03 03:03:54 -04:00
src/voice_mode_preview	Initial MVP commit	2026-05-03 03:03:54 -04:00
.gitignore	Initial MVP commit	2026-05-03 03:03:54 -04:00
LICENSE	Initial MVP commit	2026-05-03 03:03:54 -04:00
pyproject.toml	Initial MVP commit	2026-05-03 03:03:54 -04:00
README.md	Initial MVP commit	2026-05-03 03:03:54 -04:00


voice-mode-preview

Offline auditioner for voice-enhancer pitch-shift presets. Apply the four planned VoiceMode presets — man / woman / child / old — to any WAV file (or a synthesized test tone) so you can A/B them by ear before wiring the values into the C++ AudioEngine.

Pure NumPy + stdlib wave. No PortAudio, no JUCE, no real-time. This is the "is the preset value any good?" tool, not the runtime engine.

Why this project

The voice-enhancer C++ engine shipped pitch-shift v1.0 on Apr 26 (WSOLA with a 2048-sample Hann window, 4× overlap, and a ±128 normalized cross-correlation (NCC) search). The next planned milestone is wiring four VoiceMode presets:

Preset	Pitch shift (st = semitones)	Target F0
👨 Man	−3 st	~107 Hz
👩 Woman	+4 st	~220 Hz
👦 Child	+8 st	~320 Hz
👴 Old	−2 st	~115 Hz
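The semitone offsets in the table translate to frequency ratios through the equal-temperament relation ratio = 2^(st/12). The sketch below shows that conversion; the names `SHIFTS_ST`, `semitones_to_ratio`, and `shifted_f0` are illustrative and are not taken from the package.

```python
# Illustrative only: these names are not taken from voice_mode_preview.
SHIFTS_ST = {"man": -3, "woman": 4, "child": 8, "old": -2}  # from the preset table


def semitones_to_ratio(semitones: float) -> float:
    """Equal-temperament frequency ratio for a shift in semitones."""
    return 2.0 ** (semitones / 12.0)


def shifted_f0(source_f0_hz: float, semitones: float) -> float:
    """F0 after shifting a voice whose fundamental is source_f0_hz."""
    return source_f0_hz * semitones_to_ratio(semitones)


if __name__ == "__main__":
    for name, st in SHIFTS_ST.items():
        print(f"{name:>5}: {st:+d} st -> x{semitones_to_ratio(st):.3f}")
```

Because a fixed semitone shift scales whatever F0 the input clip has, the Target F0 column is probably best read as a design target for a typical source voice rather than a value the shift guarantees.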

Those values were designed but never auditioned — they're guesses until you hear them on a real voice clip. Editing the C++ engine, rebuilding, and rerouting BlackHole every time you want to nudge a preset by ±1 semitone is the wrong feedback loop. This tool closes that loop: drop in a 5-second voice WAV, get four output files, listen, adjust the table, commit.

Once a preset value sounds right, copy the final number into Preset.h / PitchShift.cpp in voice-enhancer with confidence.

(Triggered by the May 1 and May 3 morning-brief threads, where VoiceMode was flagged as the next ~60–90 min unblock for the voice-enhancer roadmap.)

Install

git clone https://git.lucataco.dev/Catacolabs/voice-mode-preview.git
cd voice-mode-preview
pip install -e .

The only dependency is numpy>=1.24.

Run

# 1) See the preset table
python -m voice_mode_preview --list

# 2) Synthesize a 1.5 s vocal-ish test tone and render all four presets
python -m voice_mode_preview demo --out-dir out/

# 3) Render all presets on your own clip
python -m voice_mode_preview render path/to/voice.wav --out-dir out/

# 4) Render just one preset
python -m voice_mode_preview render path/to/voice.wav --preset child --out-dir out/

Output files are named {stem}__{preset}.wav.
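For orientation, here is a minimal sketch of what the demo path might do with only NumPy and the stdlib `wave` module: synthesize a short harmonic tone, write it as 16-bit mono PCM, and build the `{stem}__{preset}.wav` name. The function names, the 140 Hz fundamental, and the 48 kHz sample rate are assumptions made for illustration, not the package's actual code or defaults.

```python
import wave
from pathlib import Path

import numpy as np


def synth_tone(f0_hz: float = 140.0, seconds: float = 1.5, sr: int = 48_000) -> np.ndarray:
    """Harmonic-rich tone: five partials of f0 with 1/k amplitudes and short fades."""
    t = np.arange(int(seconds * sr)) / sr
    tone = sum(np.sin(2 * np.pi * f0_hz * k * t) / k for k in range(1, 6))
    fade = np.minimum(1.0, np.minimum(t, t[-1] - t) / 0.02)  # 20 ms fade in/out
    return (0.3 * tone * fade / np.max(np.abs(tone))).astype(np.float32)


def write_wav(path: Path, samples: np.ndarray, sr: int) -> None:
    """Write mono float samples in [-1, 1] as 16-bit little-endian PCM."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype("<i2")
    path.parent.mkdir(parents=True, exist_ok=True)
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sr)
        wf.writeframes(pcm.tobytes())


def output_path(src: Path, preset: str, out_dir: Path) -> Path:
    """Apply the {stem}__{preset}.wav naming convention."""
    return out_dir / f"{src.stem}__{preset}.wav"
```

Under this convention, `output_path(Path("clips/voice.wav"), "child", Path("out"))` yields `out/voice__child.wav`.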

Tweaking presets

Edit src/voice_mode_preview/presets.py — the PRESETS dict — and re-run. Keep this table as the source of truth; once you're happy, port the numbers into voice-enhancer's Preset.h and bump it to v1.1.0.
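The actual layout of `presets.py` is not reproduced in this README. One plausible shape, shown purely as an assumption, is a dict mapping preset names to a small frozen dataclass. The `Preset` type, its field names, and the `nudge` helper are hypothetical; only the numbers come from the preset table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:  # hypothetical container, not the package's actual type
    semitones: float     # pitch shift in semitones
    target_f0_hz: float  # approximate design target from the preset table


# Values copied from the preset table; the structure itself is an assumption.
PRESETS = {
    "man": Preset(semitones=-3, target_f0_hz=107),
    "woman": Preset(semitones=+4, target_f0_hz=220),
    "child": Preset(semitones=+8, target_f0_hz=320),
    "old": Preset(semitones=-2, target_f0_hz=115),
}


def nudge(name: str, delta_st: float) -> Preset:
    """Return a copy of a preset shifted by delta_st semitones, for A/B trials."""
    base = PRESETS[name]
    return Preset(base.semitones + delta_st, base.target_f0_hz)
```

A helper like `nudge("child", -1)` would make it easy to render a +7 st variant next to the +8 st original before committing a change to the table.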

Algorithm note

The pitch-shift here is a NumPy WSOLA implementation that mirrors the C++ engine's parameters (window, hop, search range). Output won't be byte-identical to the AudioEngine (different overlap-add bookkeeping, no SIMD), but it's audibly close enough to make preset decisions on. If a preset sounds wrong here, it'll sound wrong in the engine too.
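To make the approach concrete, here is a compact sketch of WSOLA-based pitch shifting using the parameters quoted earlier (2048-sample Hann window, 4× overlap, ±128 NCC search). It is not the package's implementation: the function names are hypothetical, and the final resampling step uses plain linear interpolation as a simplification.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def wsola_stretch(x, stretch, win=2048, overlap=4, search=128):
    """Time-stretch mono signal x by `stretch` (>1 means longer) without changing pitch."""
    hop_s = win // overlap            # synthesis hop (output side)
    hop_a = hop_s / stretch           # analysis hop (input side)
    w = np.hanning(win + 1)[:-1]      # periodic Hann window
    n_frames = max(1, int(np.ceil(len(x) / hop_a)))
    # Pad so every candidate and continuation slice stays in bounds.
    xp = np.concatenate([np.zeros(search), x, np.zeros(win + hop_s + 2 * search)])
    out = np.zeros((n_frames - 1) * hop_s + win)
    norm = np.zeros_like(out)
    pos = search                      # read position of frame 0 inside xp
    for k in range(n_frames):
        if k > 0:
            # Natural continuation of the previously chosen frame.
            target = xp[pos + hop_s : pos + hop_s + win]
            nominal = search + int(round(k * hop_a))
            # All candidate frames within +/- search of the nominal position.
            cands = sliding_window_view(xp[nominal - search : nominal + search + win], win)
            ncc = (cands @ target) / (
                np.linalg.norm(cands, axis=1) * np.linalg.norm(target) + 1e-12
            )
            pos = nominal - search + int(np.argmax(ncc))
        sl = slice(k * hop_s, k * hop_s + win)
        out[sl] += w * xp[pos : pos + win]
        norm[sl] += w                 # track window sum for normalization
    return out / np.maximum(norm, 1e-3)


def pitch_shift(x, semitones, **kwargs):
    """Shift pitch by `semitones` while keeping the original length."""
    ratio = 2.0 ** (semitones / 12.0)
    stretched = wsola_stretch(np.asarray(x, dtype=np.float64), ratio, **kwargs)
    # Read the stretched signal `ratio` times faster (linear interpolation).
    idx = np.arange(len(x)) * ratio
    return np.interp(idx, np.arange(len(stretched)), stretched)
```

Stretching first and resampling second keeps the output the same length as the input, which is convenient for A/B listening; a production engine may order or implement these steps differently.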

License

MIT.
