voice-meter
Objective per-clip audio metrics for voice samples — RMS, peak, crest factor,
clipping, silence ratio, true-peak headroom, dynamic range, and a single 0–100
"voice quality" score. Pure-stdlib Python wrapper around the ffmpeg
astats / silencedetect / volumedetect filters.
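As a rough illustration of the wrapper idea, the sketch below runs ffmpeg's volumedetect filter through subprocess and parses the mean_volume and max_volume lines that the filter logs to stderr. The function names are hypothetical, and this is not necessarily how voice-meter itself is implemented.

```python
import re
import subprocess

# Matches log lines such as "[Parsed_volumedetect_0 @ 0x...] mean_volume: -23.4 dB".
_VOLUME_RE = re.compile(r"(mean_volume|max_volume):\s*(-?\d+(?:\.\d+)?)\s*dB")


def parse_volumedetect(stderr_text):
    """Extract mean_volume and max_volume (in dB) from ffmpeg log text."""
    return {name: float(value) for name, value in _VOLUME_RE.findall(stderr_text)}


def run_volumedetect(path):
    """Run the volumedetect filter on one file and parse what it reports.

    Hypothetical helper: requires ffmpeg on PATH and raises
    subprocess.CalledProcessError if ffmpeg exits with an error.
    """
    cmd = ["ffmpeg", "-hide_banner", "-nostats", "-i", path,
           "-af", "volumedetect", "-f", "null", "-"]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_volumedetect(proc.stderr)
```

Keeping the parser separate from the subprocess call means it can be exercised against captured log text without ffmpeg installed.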
Designed as the objective companion to preset-audition: once you've rendered
the 5 voice-mode presets (Dry / Man / Woman / Child / Old) against your mic
sample, run voice-meter compare audition/ to get a side-by-side numeric table
instead of relying on ear alone. This removes the "sounds fine but is it
actually clipping?" guesswork that has been gating the voice-enhancer
Zoom/Teams demo.
Why this project
The voice-DSP consulting flagship — voice-enhancer (WSOLA pitch-shift) →
voice-mode-preview (4 presets) → blackhole-doctor (routing diagnostics) →
preset-audition (A/B HTML renderer) — has been gated on the user actually
auditioning the presets and picking a winner. Subjective audition is slow
and error-prone (clipping in a normalized A/B player is inaudible until it's
on a call). Objective metrics close that loop: render presets, run
voice-meter compare, pick the highest-score preset that doesn't clip, wire
its semitone value into ve_engine_set_pitch_semitones, ship the demo.
It also stands alone as a consulting-pitch deliverable: every voice-AI project needs reproducible audio QA, and this is a 200-LOC drop-in.
Install
pip install -e .
Requires ffmpeg on PATH (Homebrew: brew install ffmpeg).
Usage
Single file:
voice-meter measure path/to/clip.wav
Batch / compare a directory of renders (e.g. preset-audition output):
voice-meter compare audition/
JSON for scripting:
voice-meter measure clip.wav --json
voice-meter compare audition/ --json
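The JSON output lends itself to automating the "highest score that doesn't clip" selection. The sketch below assumes that compare --json emits a list of objects, one per file; that layout and the "file" key are assumptions, while "score" and "clip_pct" are the metric names documented in this README. The numbers are made up for illustration.

```python
import json


def pick_best(results, max_clip_pct=0.0):
    """Return the highest-scoring entry whose clip_pct is within the limit, or None."""
    clean = [r for r in results if r["clip_pct"] <= max_clip_pct]
    return max(clean, key=lambda r: r["score"]) if clean else None


# Hypothetical payload: the "file" key and the list-of-objects layout are
# assumptions about the JSON schema, and the values are illustrative only.
SAMPLE_JSON = """
[
  {"file": "dry.wav",   "score": 81.0, "clip_pct": 0.0},
  {"file": "man.wav",   "score": 88.5, "clip_pct": 0.4},
  {"file": "woman.wav", "score": 84.2, "clip_pct": 0.0}
]
"""

best = pick_best(json.loads(SAMPLE_JSON))
print(best["file"])  # woman.wav: man.wav scores higher but clips
```

With real data, the same function could be fed from the saved output of voice-meter compare audition/ --json, provided the actual schema matches these assumptions.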
Metrics
| Metric | Description |
|---|---|
| peak_db | Max sample amplitude in dBFS |
| rms_db | Mean RMS level in dBFS |
| crest_db | Crest factor (peak − RMS); voice target ≈ 12–18 dB |
| clip_pct | Percent of samples within 0.5 dB of full scale |
| silence_pct | Percent of duration below −40 dBFS |
| headroom_db | Distance from peak to 0 dBFS |
| dynamic_range_db | P90 − P10 of windowed RMS |
| duration_s | Clip duration |
| score | 0–100 composite; penalizes clipping, low headroom, excessive silence, poor crest |
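To make the definitions concrete, here is a minimal pure-Python sketch that computes several of these metrics directly from samples normalized to [-1.0, 1.0]. It is an illustration of the definitions, not the tool's implementation (which relies on ffmpeg filters); the window size, the nearest-rank percentile, the −120 dB floor, and all function names are assumptions. silence_pct and score are omitted because their exact formulas are not specified here.

```python
import math

FLOOR_DB = -120.0  # stand-in value for digital silence, where log10(0) is undefined


def to_db(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    return 20.0 * math.log10(amplitude) if amplitude > 0 else FLOOR_DB


def rms(values):
    """Root-mean-square of a non-empty sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def basic_metrics(samples, window=1024, clip_margin_db=0.5):
    """Compute a subset of the tabulated metrics from floats in [-1.0, 1.0]."""
    peak_db = to_db(max(abs(s) for s in samples))
    rms_db = to_db(rms(samples))
    # A sample counts as clipped when it lies within clip_margin_db of full scale.
    clip_threshold = 10 ** (-clip_margin_db / 20.0)
    clipped = sum(1 for s in samples if abs(s) >= clip_threshold)
    # Windowed RMS levels, sorted so percentiles can be read off directly.
    levels = sorted(
        to_db(rms(samples[i:i + window])) for i in range(0, len(samples), window)
    )
    return {
        "peak_db": peak_db,
        "rms_db": rms_db,
        "crest_db": peak_db - rms_db,
        "clip_pct": 100.0 * clipped / len(samples),
        "headroom_db": 0.0 - peak_db,
        "dynamic_range_db": percentile(levels, 90) - percentile(levels, 10),
    }
```

For example, a half-scale square wave has equal peak and RMS levels, so its crest factor comes out as 0 dB with roughly 6 dB of headroom, far below the 12–18 dB crest target listed for voice.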
License
MIT