pdf2narration
Turn any PDF into a narrated MP3 in one command — chunk → Replicate Inworld TTS → ffmpeg concat. Built for skim-by-ear research workflows: papers on a walk, reports on the bus.
pdf2narration paper.pdf --pages 1-3 --voice Ashley
# → paper.mp3
Why this project
The May 14 chat thread tried to turn /Users/lucataco/Documents/PDFs/1.pdf into a narrated Manim video via the manim-video skill, but stalled at PDF discovery (iCloud not synced). That request surfaced a smaller, more reusable primitive: PDF → narrated audio. No animations, no Manim, no rendering pipeline — just clean text + Replicate Inworld TTS + ffmpeg.
This is the audio-only narrator that the Manim pipeline already wraps internally — extracted, generalized, and shipped as a standalone CLI so the next time the user wants to "listen to this PDF" it's one command, not a video render.
This ties into the broader research/tooling-leverage thread: the same Inworld Realtime TTS 1.5 Max pipeline (pinned version, ~/.env token lookup, MP3 output, on-disk cache) that the user already uses for Manim explainer videos (BirdCLEF, Nemotron) is now reusable for arXiv, papers, specs, and reports.
Features
- PDF → cleaned text: pypdf extraction with dehyphenation, paragraph preservation, page-number stripping, and common acronym expansion (LLM/RLHF/GPU/etc.) for clearer TTS.
- Smart chunking: paragraph-first, sentence-fallback splitting around an 800-char budget so each TTS call stays in the model's sweet spot.
- Replicate Inworld TTS: pinned to the same inworld/realtime-tts-1.5-max version used by the manim-video skill.
- On-disk cache: identical chunks reuse cached MP3s, so reruns are cheap.
- ffmpeg concat: clean stream-copy join of per-chunk MP3s, no re-encode.
- CLI flags: --pages, --voice, --rate, --temperature, --dry-run, --text-only.
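The paragraph-first, sentence-fallback chunking described above can be sketched roughly as follows. This is a minimal illustration, not the code in extract.py: the names `chunk_text` and `split_sentences` are hypothetical, the sentence splitter is deliberately naive, and keeping an oversized single sentence whole is an assumption about how the edge case might be handled.

```python
import re

CHUNK_BUDGET = 800  # characters, matching the budget stated in the feature list


def split_sentences(paragraph: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def chunk_text(text: str, budget: int = CHUNK_BUDGET) -> list[str]:
    chunks: list[str] = []
    current = ""

    def flush() -> None:
        nonlocal current
        if current:
            chunks.append(current)
            current = ""

    # Paragraphs are assumed to be separated by blank lines.
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = " ".join(paragraph.split())  # collapse internal whitespace
        if not paragraph:
            continue
        # Paragraph-first: keep a paragraph whole when it fits the budget,
        # otherwise fall back to sentence-level pieces.
        pieces = [paragraph] if len(paragraph) <= budget else split_sentences(paragraph)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= budget:
                current = candidate
            else:
                flush()
                current = piece  # assumption: an oversized sentence stays whole
    flush()
    return chunks
```

Packing short paragraphs together keeps the number of TTS calls down, while the sentence fallback avoids cutting a chunk mid-sentence, which tends to sound abrupt when the per-chunk MP3s are joined.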
Install
pip install -e .
Requires Python 3.9+ and ffmpeg on PATH.
Setup
Export your Replicate token, or put it in ~/.env:
export REPLICATE_API_TOKEN=r8_...
# or:
echo 'REPLICATE_API_TOKEN=r8_...' >> ~/.env
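For reference, a lookup that honors both options might look like the sketch below. It assumes the environment variable takes precedence over ~/.env and that the file holds simple KEY=VALUE lines; the function name `load_replicate_token` is hypothetical and the real lookup in tts.py may differ.

```python
import os
from pathlib import Path
from typing import Optional


def load_replicate_token(env_file: Optional[Path] = None) -> Optional[str]:
    # Assumption: an exported variable wins over the file.
    token = os.environ.get("REPLICATE_API_TOKEN")
    if token:
        return token

    env_file = env_file or Path.home() / ".env"
    if not env_file.is_file():
        return None

    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        if key.strip() == "REPLICATE_API_TOKEN":
            # Strip optional surrounding quotes.
            return value.strip().strip("'\"") or None
    return None
```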
Usage
# Narrate the first 3 pages with the default Ashley voice
pdf2narration paper.pdf --pages 1-3
# Different voice + custom output path
pdf2narration paper.pdf --voice Dennis -o ~/Audiobooks/paper.mp3
# Preview chunks without spending TTS credits
pdf2narration paper.pdf --pages 1-3 --dry-run
# Just dump the cleaned text (pipe-friendly)
pdf2narration paper.pdf --text-only > paper.txt
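As an illustration of how a --pages value such as 1-3 could map to page indices, here is a small sketch. Only the 1-3 form appears in this README; support for single pages and comma-separated lists, clamping to the document length, and the name `parse_pages` are all assumptions.

```python
def parse_pages(spec: str, page_count: int) -> list[int]:
    """Turn a 1-based spec like '1-3' into zero-based page indices."""
    indices: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_s, sep, end_s = part.partition("-")
        start = int(start_s)
        end = int(end_s) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Clamp to the document length, then convert to zero-based indices.
        indices.extend(range(start - 1, min(end, page_count)))
    return indices
```

Zero-based indices are convenient because PDF libraries such as pypdf index their page lists from 0.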
Suggested voices
Ashley (default), Dennis (good for tech narration), Alex. See the Replicate model page for the full list.
Project layout
src/pdf2narration/
  cli.py       argparse entrypoint
  extract.py   PDF text + cleanup + chunking
  tts.py       Replicate Inworld client + cache
  audio.py     ffmpeg concat helper
tests/
  test_extract.py
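To make the cache and concat pieces in the layout above more concrete, the sketch below shows one plausible shape for them. It is not the code in tts.py or audio.py: the function names are hypothetical, the cache key fields (text, voice, model version) are an assumption, and the use of ffmpeg's concat demuxer with `-c copy` is one common way to get a stream-copy join without re-encoding.

```python
import hashlib
import json
from pathlib import Path


def cache_path(cache_dir: Path, text: str, voice: str, model_version: str) -> Path:
    # Identical inputs hash to the same file name, so a rerun can reuse the MP3.
    payload = json.dumps(
        {"text": text, "voice": voice, "model": model_version}, sort_keys=True
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.mp3"


def write_concat_list(parts: list[Path], list_file: Path) -> None:
    # The concat demuxer reads one "file '<path>'" line per input, in order.
    lines = []
    for part in parts:
        escaped = str(part.resolve()).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    list_file.write_text("\n".join(lines) + "\n", encoding="utf-8")


def concat_command(list_file: Path, output: Path) -> list[str]:
    # -c copy joins the streams without re-encoding; -safe 0 allows absolute paths.
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", str(list_file),
        "-c", "copy",
        str(output),
    ]
```

The command list can be passed to `subprocess.run(..., check=True)`. If other settings such as rate or temperature affect the audio, they would need to be part of the cache key as well.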
License
MIT