# 🔥 spark
A CLI for interacting with the DGX Spark llama.cpp server.
Quick access to your local LLM from the terminal — ask questions, chat, generate tweets, summarize files, and more.
## Installation

```bash
# Clone and install
git clone https://git.cataco.net/Catacolabs/spark-cli.git
cd spark-cli
./install.sh

# Or just copy the script
cp spark ~/.local/bin/
chmod +x ~/.local/bin/spark
```
## Usage

```bash
# Check if the Spark server is running
spark status

# Ask a question
spark ask "What are the key features of Rust?"

# Interactive chat
spark chat

# Generate tweet ideas
spark tweet "Workers AI announcements"

# Summarize a file
spark summarize paper.pdf
cat notes.txt | spark summarize

# Generate code
spark code "async HTTP server in Python with FastAPI"

# Text completion (non-chat)
spark complete "The future of AI is"
```
## Commands

| Command | Description |
|---|---|
| `status` | Show server health and loaded model |
| `ask <question>` | Ask a single question, get a response |
| `chat` | Interactive chat mode with history |
| `tweet <topic>` | Generate 3 tweet ideas about a topic |
| `summarize <file>` | Summarize text from file or stdin |
| `code <description>` | Generate code from a description |
| `complete <text>` | Raw text completion (non-chat) |
| `raw <endpoint>` | Make raw API calls for debugging (see example below) |
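The exact argument format for `raw` isn't documented above, so the following is a guess based on the table, assuming the endpoint path is passed through verbatim:

```bash
# Hypothetical raw calls against the server's endpoints
spark raw /health
spark raw /v1/models
```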
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `SPARK_URL` | `http://spark:8080` | llama.cpp server URL |
| `SPARK_MODEL` | (auto-detected) | Model to use for completions |
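Both can be overridden per invocation or exported for the session; the model name below is just an example:

```bash
# Point the CLI at a llama.cpp server on another host for one command
SPARK_URL=http://localhost:8080 spark status

# Pin a specific model for the rest of the session
export SPARK_MODEL=Qwen2.5-72B-Instruct
spark ask "hello"
```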
## Requirements

- `curl` - HTTP client
- `jq` - JSON processor
- A running llama.cpp server on your DGX Spark
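A quick way to verify the first two dependencies are on your `PATH`:

```bash
# Report any missing dependency
for dep in curl jq; do
  command -v "$dep" >/dev/null 2>&1 || echo "missing dependency: $dep"
done
```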
## Examples

### Quick Research

```bash
# Get a quick explanation
spark ask "Explain transformer attention mechanisms in simple terms"

# Summarize a paper
curl -s "https://arxiv.org/abs/2301.00000" | spark summarize
```
### Content Creation

```bash
# Tweet ideas for your content pipeline
spark tweet "new ComfyUI workflow for video generation"

# Expand on an idea
spark ask "Write a thread about why local LLMs matter for privacy"
```
### Coding Help

```bash
# Quick code generation
spark code "Python function to parse RSS feeds and extract titles"

# Debug help
echo "TypeError: 'NoneType' object is not subscriptable" | spark ask "What causes this error?"
```
### Interactive Session

```
$ spark chat
🔥 Spark Chat (model: Qwen2.5-72B-Instruct)
Type 'exit' or Ctrl+C to quit, '/clear' to reset

You> What's the difference between async and threading in Python?
Spark> [response...]

You> Give me an example of when to use each
Spark> [response...]

You> /clear
Chat history cleared

You> exit
Goodbye!
```
## How It Works

The CLI communicates with llama.cpp's OpenAI-compatible API (sketched below):

- `/v1/chat/completions` for chat-based commands
- `/v1/completions` for raw text completion
- `/v1/models` to detect the loaded model
- `/health` for status checks
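For reference, the kind of request a chat-based command sends can be reproduced with plain `curl` and `jq`. This is a minimal sketch of the equivalent call, not the script's actual internals; it assumes the default `SPARK_URL` and picks the first model the server reports:

```bash
SPARK_URL="${SPARK_URL:-http://spark:8080}"

# Detect the loaded model, as spark does when SPARK_MODEL is unset
MODEL=$(curl -s "$SPARK_URL/v1/models" | jq -r '.data[0].id')

# Send one chat turn and print the reply
curl -s "$SPARK_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg model "$MODEL" --arg q "What are the key features of Rust?" \
        '{model: $model, messages: [{role: "user", content: $q}]}')" \
  | jq -r '.choices[0].message.content'
```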
## Tips

- Server offline? Make sure your DGX Spark is powered on and llama.cpp is running
- Slow responses? Large models take time; the CLI has a 120s timeout
- Want streaming? Not yet implemented, but PRs welcome!
- Custom model? Set `SPARK_MODEL=your-model-name`
Built during an overnight session by Navi ✨