Objective per-clip audio metrics for voice samples (RMS, peak, clipping, crest, score) — companion to preset-audition


hermes abc7fa0f29 Initial MVP commit		2026-05-13 03:03:51 -04:00
src/voice_meter	Initial MVP commit	2026-05-13 03:03:51 -04:00
tests	Initial MVP commit	2026-05-13 03:03:51 -04:00
.gitignore	Initial MVP commit	2026-05-13 03:03:51 -04:00
LICENSE	Initial MVP commit	2026-05-13 03:03:51 -04:00
pyproject.toml	Initial MVP commit	2026-05-13 03:03:51 -04:00
README.md	Initial MVP commit	2026-05-13 03:03:51 -04:00


voice-meter

Objective per-clip audio metrics for voice samples — RMS, peak, crest factor, clipping, silence ratio, true-peak headroom, dynamic range, and a single 0–100 "voice quality" score. Pure-stdlib Python wrapper around the ffmpeg astats / silencedetect / volumedetect filters.
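
As a rough illustration of the wrapping approach, the sketch below runs ffmpeg's `volumedetect` filter and parses the `mean_volume` and `max_volume` lines it logs to stderr. The function names `parse_volumedetect` and `run_volumedetect` are hypothetical and not taken from voice-meter's source; the real tool may invoke or parse the filters differently.

```python
import re
import subprocess

# Matches log lines such as
# "[Parsed_volumedetect_0 @ 0x...] mean_volume: -23.5 dB".
_VOLUME_RE = re.compile(r"(mean_volume|max_volume):\s*(-?\d+(?:\.\d+)?)\s*dB")


def parse_volumedetect(stderr_text):
    """Extract mean_volume / max_volume (in dB) from ffmpeg's stderr log."""
    return {name: float(value) for name, value in _VOLUME_RE.findall(stderr_text)}


def run_volumedetect(path):
    """Decode `path`, discard the audio output, return the volumedetect summary."""
    proc = subprocess.run(
        ["ffmpeg", "-hide_banner", "-nostats", "-i", path,
         "-af", "volumedetect", "-f", "null", "-"],
        capture_output=True, text=True, check=True,
    )
    # ffmpeg writes filter statistics to stderr, not stdout.
    return parse_volumedetect(proc.stderr)
```

Keeping the parser separate from the subprocess call makes it testable against captured log text without ffmpeg installed.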

Designed as the objective companion to preset-audition: once you've rendered the 5 voice-mode presets (Dry / Man / Woman / Child / Old) against your mic sample, run voice-meter compare audition/ to get a side-by-side numeric table instead of relying on ear alone. This removes the "sounds fine but is it actually clipping?" guesswork that has been blocking the voice-enhancer Zoom/Teams demo.

Why this project

The voice-DSP consulting flagship, voice-enhancer (WSOLA pitch-shift) → voice-mode-preview (4 presets) → blackhole-doctor (routing diagnostics) → preset-audition (A/B HTML renderer), has been blocked on the user actually auditioning the presets and picking a winner. Subjective audition is slow and error-prone: clipping can be inaudible in a normalized A/B player and only becomes obvious once the audio is on a call. Objective metrics close that loop: render the presets, run voice-meter compare, pick the highest-scoring preset that doesn't clip, wire its semitone value into ve_engine_set_pitch_semitones, and ship the demo.

It also stands alone as a consulting-pitch deliverable: every voice-AI project needs reproducible audio QA, and this is a 200-LOC drop-in.

Install

pip install -e .

Requires ffmpeg on PATH (Homebrew: brew install ffmpeg).
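
If you script around voice-meter, a quick pre-flight check for the ffmpeg dependency can save a confusing failure later. This is a minimal sketch; `require_tool` is a hypothetical helper, not part of voice-meter.

```python
import shutil


def require_tool(name="ffmpeg"):
    """Return the absolute path of `name` on PATH, or raise if it is missing."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name} not found on PATH (on macOS with Homebrew: brew install {name})"
        )
    return path
```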

Usage

Single file:

voice-meter measure path/to/clip.wav

Batch / compare a directory of renders (e.g. preset-audition output):

voice-meter compare audition/

JSON for scripting:

voice-meter measure clip.wav --json
voice-meter compare audition/ --json
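
The JSON output lends itself to the "highest score that doesn't clip" selection described earlier. The sketch below assumes `compare --json` emits a JSON array of objects keyed by the metric names in the Metrics table, plus a `file` key; that shape, the `file` key, and the `pick_best` helper are assumptions for illustration, and the payload is made-up sample data.

```python
import json


def pick_best(rows, max_clip_pct=0.0):
    """Return the highest-score row whose clip_pct is within the limit, or None."""
    clean = [r for r in rows if r["clip_pct"] <= max_clip_pct]
    return max(clean, key=lambda r: r["score"]) if clean else None


# Illustrative payload only; real output would come from
# `voice-meter compare audition/ --json` and may be shaped differently.
payload = """[
  {"file": "dry.wav", "score": 81.0, "clip_pct": 0.0},
  {"file": "man.wav", "score": 88.5, "clip_pct": 0.4},
  {"file": "old.wav", "score": 84.2, "clip_pct": 0.0}
]"""

best = pick_best(json.loads(payload))
print(best["file"])  # old.wav (man.wav scores higher but clips)
```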

Metrics

| Metric | Description |
|---|---|
| `peak_db` | Maximum sample amplitude, in dBFS |
| `rms_db` | Mean RMS level, in dBFS |
| `crest_db` | Crest factor (peak − RMS), in dB; voice target ≈ 12–18 dB |
| `clip_pct` | Percent of samples within 0.5 dB of full scale |
| `silence_pct` | Percent of duration below −40 dBFS |
| `headroom_db` | Distance from peak to 0 dBFS, in dB |
| `dynamic_range_db` | 90th minus 10th percentile (P90 − P10) of windowed RMS, in dB |
| `duration_s` | Clip duration, in seconds |
| `score` | 0–100 composite; penalizes clipping, low headroom, excessive silence, poor crest |
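
To make the level definitions concrete, here is a minimal sketch that computes several of them directly from float samples in the range [-1, 1]. voice-meter itself gets these numbers from ffmpeg filters, so this is not its implementation: `level_metrics` is a hypothetical helper, it uses whole-clip RMS rather than a windowed mean, and it omits `score` because the composite's weighting is not documented here.

```python
import math


def level_metrics(samples, clip_threshold_db=-0.5):
    """Compute peak, RMS, crest, headroom (dB) and clip_pct for float samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        raise ValueError("an all-zero signal has no finite dBFS level")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak_db = 20 * math.log10(peak)   # dBFS, 0 dB = full scale
    rms_db = 20 * math.log10(rms)
    # Linear amplitude that sits 0.5 dB below full scale.
    clip_level = 10 ** (clip_threshold_db / 20)
    clipped = sum(1 for s in samples if abs(s) >= clip_level)
    return {
        "peak_db": peak_db,
        "rms_db": rms_db,
        "crest_db": peak_db - rms_db,
        "headroom_db": -peak_db,
        "clip_pct": 100.0 * clipped / len(samples),
    }


# Ten full cycles of a half-scale sine; a sine's crest factor is about 3.01 dB.
samples = [0.5 * math.sin(2 * math.pi * 10 * n / 1000) for n in range(1000)]
metrics = level_metrics(samples)
print(round(metrics["peak_db"], 2), round(metrics["crest_db"], 2))  # -6.02 3.01
```

A pure sine lands well below the 12–18 dB voice target for crest factor, which is expected: speech has much sharper peaks relative to its average level.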

License

MIT
