Objective per-clip audio metrics for voice samples (RMS, peak, clipping, crest, score) — companion to preset-audition

voice-meter

Objective per-clip audio metrics for voice samples: RMS, peak, crest factor, clipping, silence ratio, true-peak headroom, dynamic range, and a single 0–100 "voice quality" score. Pure-stdlib Python wrapper around the ffmpeg astats / silencedetect / volumedetect filters.
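
To make "wrapper around the ffmpeg filters" concrete, here is a minimal sketch of how one of those filters, volumedetect, can be driven from the standard library. It is an illustration, not the code in src/voice_meter: the function names parse_volumedetect and run_volumedetect are hypothetical. ffmpeg writes filter results to stderr, so the sketch captures stderr and pulls the mean_volume and max_volume lines out with a regular expression.

```python
import re
import subprocess

# volumedetect logs lines such as:
#   [Parsed_volumedetect_0 @ 0x...] max_volume: -4.0 dB
_VOLUME_RE = re.compile(r"(mean_volume|max_volume):\s*(-?\d+(?:\.\d+)?) dB")


def parse_volumedetect(stderr_text):
    """Extract mean_volume / max_volume (in dB) from ffmpeg's stderr log.

    Hypothetical helper: returns only the keys that were found.
    """
    return {name: float(value) for name, value in _VOLUME_RE.findall(stderr_text)}


def run_volumedetect(path):
    """Run ffmpeg's volumedetect filter on one file and parse its report.

    Decodes the audio, discards the output ("-f null -"), and reads the
    filter report from stderr. Requires ffmpeg on PATH.
    """
    cmd = [
        "ffmpeg", "-hide_banner", "-nostats",
        "-i", path,
        "-af", "volumedetect",
        "-vn", "-f", "null", "-",
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_volumedetect(proc.stderr)
```

Keeping the parser separate from the subprocess call means the parsing can be unit-tested on captured log text without ffmpeg installed.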

Designed as the objective companion to preset-audition: once you've rendered the 5 voice-mode presets (Dry / Man / Woman / Child / Old) against your mic sample, run voice-meter compare audition/ to get a side-by-side numeric table instead of relying on ear alone. Eliminates the "sounds fine but is it actually clipping?" guesswork that's been gating the voice-enhancer Zoom/Teams demo.

Why this project

The voice-DSP consulting flagship — voice-enhancer (WSOLA pitch-shift) → voice-mode-preview (4 presets) → blackhole-doctor (routing diagnostics) → preset-audition (A/B HTML renderer) — has been gated on the user actually auditioning the presets and picking a winner. Subjective audition is slow and error-prone (clipping in a normalized A/B player is inaudible until it's on a call). Objective metrics close that loop: render presets, run voice-meter compare, pick the highest-score preset that doesn't clip, wire its semitone value into ve_engine_set_pitch_semitones, ship the demo.

Also stands alone as a consulting-pitch deliverable — every voice-AI project needs reproducible audio QA, and this is a 200-LOC drop-in.

Install

pip install -e .

Requires ffmpeg on PATH (Homebrew: brew install ffmpeg).

Usage

Single file:

voice-meter measure path/to/clip.wav

Batch / compare a directory of renders (e.g. preset-audition output):

voice-meter compare audition/

JSON for scripting:

voice-meter measure clip.wav --json
voice-meter compare audition/ --json
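
The JSON output is what makes the "pick the highest-score preset that doesn't clip" step scriptable. The sketch below assumes, without confirmation from this README, that compare --json prints a JSON list of objects, one per clip, carrying a file key plus the metric names from the Metrics section (clip_pct, score). Check the actual output and adjust the keys before relying on it; load_compare and pick_winner are hypothetical names.

```python
import json
import subprocess


def load_compare(directory):
    """Run `voice-meter compare <directory> --json` and decode stdout.

    Assumption: stdout is a JSON list of per-clip objects.
    """
    proc = subprocess.run(
        ["voice-meter", "compare", directory, "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


def pick_winner(rows, max_clip_pct=0.0):
    """Return the highest-scoring row that does not clip, or None.

    Assumption: each row has numeric "clip_pct" and "score" keys.
    """
    clean = [row for row in rows if row["clip_pct"] <= max_clip_pct]
    return max(clean, key=lambda row: row["score"], default=None)
```

Typical use would be pick_winner(load_compare("audition/")), followed by reading the winning row's file name to see which preset to wire into the demo.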

Metrics

| Metric | Description |
| --- | --- |
| peak_db | Max sample amplitude in dBFS |
| rms_db | Mean RMS level in dBFS |
| crest_db | Crest factor (peak − RMS); voice target ≈ 12–18 dB |
| clip_pct | Percent of samples within 0.5 dB of full scale |
| silence_pct | Percent of duration below −40 dBFS |
| headroom_db | Distance from peak to 0 dBFS |
| dynamic_range_db | P90 − P10 of windowed RMS |
| duration_s | Clip duration in seconds |
| score | 0–100 composite; penalizes clipping, low headroom, excessive silence, poor crest |
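
For readers who want to check the definitions by hand, the following sketch computes several of the metrics above directly from raw samples. It is not how voice-meter works internally (the tool delegates to ffmpeg filters). The sample format (floats normalized to [-1.0, 1.0]), the 1024-sample window, the nearest-rank percentile method, and all function names are assumptions made for illustration.

```python
import math


def to_db(amplitude, floor_db=-120.0):
    """Linear amplitude (1.0 = full scale) to dBFS; silence clamps to floor_db."""
    return 20.0 * math.log10(amplitude) if amplitude > 0 else floor_db


def rms(samples):
    """Root-mean-square of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def basic_metrics(samples, clip_margin_db=0.5):
    """Level metrics for a non-empty list of floats in [-1.0, 1.0]."""
    peak_db = to_db(max(abs(s) for s in samples))
    rms_db = to_db(rms(samples))
    # A sample counts as clipped when it lies within clip_margin_db of full scale.
    threshold = 10.0 ** (-clip_margin_db / 20.0)
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return {
        "peak_db": peak_db,
        "rms_db": rms_db,
        "crest_db": peak_db - rms_db,   # crest factor in dB: peak minus RMS
        "headroom_db": -peak_db,        # distance from peak up to 0 dBFS
        "clip_pct": 100.0 * clipped / len(samples),
    }


def dynamic_range_db(samples, window=1024):
    """P90 minus P10 of per-window RMS levels in dB (nearest-rank percentiles)."""
    levels = sorted(
        to_db(rms(samples[i:i + window]))
        for i in range(0, len(samples), window)
    )

    def pick(pct):
        return levels[round(pct / 100.0 * (len(levels) - 1))]

    return pick(90) - pick(10)
```

As a sanity check, a pure sine wave has a crest factor of about 3 dB, well below the 12–18 dB voice target, so a rendered preset whose crest_db approaches that value has probably been compressed or limited heavily.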

License

MIT