🔥 spark

A CLI for interacting with the DGX Spark llama.cpp server.

Quick access to your local LLM from the terminal — ask questions, chat, generate tweets, summarize files, and more.

Installation

# Clone and install
git clone https://git.cataco.net/Catacolabs/spark-cli.git
cd spark-cli
./install.sh

# Or just copy the script
cp spark ~/.local/bin/
chmod +x ~/.local/bin/spark
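
If ~/.local/bin is not already on your PATH, add it. The snippet below assumes bash; adjust the profile file for your shell:

# Make ~/.local/bin visible to your shell (bash assumed)
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc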

Usage

# Check if Spark server is running
spark status

# Ask a question
spark ask "What are the key features of Rust?"

# Interactive chat
spark chat

# Generate tweet ideas
spark tweet "Workers AI announcements"

# Summarize a file
spark summarize paper.pdf
cat notes.txt | spark summarize

# Generate code
spark code "async HTTP server in Python with FastAPI"

# Text completion (non-chat)
spark complete "The future of AI is"

Commands

Command             Description
------------------  -------------------------------------
status              Show server health and loaded model
ask <question>      Ask a single question, get a response
chat                Interactive chat mode with history
tweet <topic>       Generate 3 tweet ideas about a topic
summarize <file>    Summarize text from file or stdin
code <description>  Generate code from a description
complete <text>     Raw text completion (non-chat)
raw <endpoint>      Make raw API calls (debugging)

Environment Variables

Variable     Default            Description
-----------  -----------------  ----------------------------
SPARK_URL    http://spark:8080  llama.cpp server URL
SPARK_MODEL  (auto-detected)    Model to use for completions
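
For example, to target a different server or pin a model for a single invocation (the hostname and model name below are placeholders):

SPARK_URL=http://192.168.1.50:8080 spark status
SPARK_MODEL=my-model-name spark ask "What is llama.cpp?"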

Requirements

  • curl - HTTP client
  • jq - JSON processor
  • A running llama.cpp server on your DGX Spark
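
A quick sanity check for the client-side dependencies:

# Report any missing tools
for cmd in curl jq; do
  command -v "$cmd" >/dev/null || echo "missing: $cmd"
done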

Examples

Quick Research

# Get a quick explanation
spark ask "Explain transformer attention mechanisms in simple terms"

# Summarize a paper
curl -s "https://arxiv.org/abs/2301.00000" | spark summarize

Content Creation

# Tweet ideas for your content pipeline
spark tweet "new ComfyUI workflow for video generation"

# Expand on an idea
spark ask "Write a thread about why local LLMs matter for privacy"

Coding Help

# Quick code generation
spark code "Python function to parse RSS feeds and extract titles"

# Debug help
echo "TypeError: 'NoneType' object is not subscriptable" | spark ask "What causes this error?"

Interactive Session

$ spark chat
🔥 Spark Chat (model: Qwen2.5-72B-Instruct)
Type 'exit' or Ctrl+C to quit, '/clear' to reset

You> What's the difference between async and threading in Python?
Spark> [response...]

You> Give me an example of when to use each
Spark> [response...]

You> /clear
Chat history cleared

You> exit
Goodbye!

How It Works

The CLI communicates with llama.cpp's OpenAI-compatible API:

  • /v1/chat/completions for chat-based commands
  • /v1/completions for raw text completion
  • /v1/models to detect the loaded model
  • /health for status checks
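
To see roughly what a chat command sends, you can hit the same endpoint with curl. The payload below is a minimal sketch, not necessarily the exact request the CLI builds (llama.cpp's OpenAI-compatible server typically accepts requests without a model field):

# Minimal chat completion request against the same server
curl -s "${SPARK_URL:-http://spark:8080}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}' \
  | jq -r '.choices[0].message.content'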

Tips

  1. Server offline? Make sure your DGX Spark is powered on and llama.cpp is running
  2. Slow responses? Large models take time — the CLI has a 120s timeout
  3. Want streaming? Not yet implemented, but PRs welcome!
  4. Custom model? Set SPARK_MODEL=your-model-name
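
To find a value for SPARK_MODEL, you can list what the server reports as loaded (assuming the raw command passes the JSON response through unmodified):

# List model IDs reported by the server
spark raw /v1/models | jq -r '.data[].id'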

Built during an overnight session by Navi