# 🔥 spark
A CLI for interacting with the DGX Spark llama.cpp server.
Quick access to your local LLM from the terminal — ask questions, chat, generate tweets, summarize files, and more.
## Installation

```bash
# Clone and install
git clone https://git.cataco.net/Catacolabs/spark-cli.git
cd spark-cli
./install.sh

# Or just copy the script
cp spark ~/.local/bin/
chmod +x ~/.local/bin/spark
```
## Usage

```bash
# Check if the Spark server is running
spark status

# Ask a question
spark ask "What are the key features of Rust?"

# Interactive chat
spark chat

# Generate tweet ideas
spark tweet "Workers AI announcements"

# Summarize a file
spark summarize paper.pdf
cat notes.txt | spark summarize

# Generate code
spark code "async HTTP server in Python with FastAPI"

# Text completion (non-chat)
spark complete "The future of AI is"
```
## Commands

| Command | Description |
|---|---|
| `status` | Show server health and loaded model |
| `ask <question>` | Ask a single question, get a response |
| `chat` | Interactive chat mode with history |
| `tweet <topic>` | Generate 3 tweet ideas about a topic |
| `summarize <file>` | Summarize text from file or stdin |
| `code <description>` | Generate code from a description |
| `complete <text>` | Raw text completion (non-chat) |
| `raw <endpoint>` | Make raw API calls for debugging (see example below) |
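The exact argument format for `raw` isn't documented above, so the following is a guess based on the table, assuming the endpoint path is passed through verbatim:

```bash
# Hypothetical raw calls against the server's endpoints
spark raw /health
spark raw /v1/models
```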
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `SPARK_URL` | `http://spark:8080` | llama.cpp server URL |
| `SPARK_MODEL` | (auto-detected) | Model to use for completions |
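Both can be overridden per invocation or exported for the session; the model name below is just an example:

```bash
# Point the CLI at a llama.cpp server on another host for one command
SPARK_URL=http://localhost:8080 spark status

# Pin a specific model for the rest of the session
export SPARK_MODEL=Qwen2.5-72B-Instruct
spark ask "hello"
```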
## Requirements

- `curl` - HTTP client
- `jq` - JSON processor
- A running llama.cpp server on your DGX Spark
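A quick way to verify the first two dependencies are on your `PATH`:

```bash
# Report any missing dependency
for dep in curl jq; do
  command -v "$dep" >/dev/null 2>&1 || echo "missing dependency: $dep"
done
```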
## Examples

### Quick Research

```bash
# Get a quick explanation
spark ask "Explain transformer attention mechanisms in simple terms"

# Summarize a paper
curl -s "https://arxiv.org/abs/2301.00000" | spark summarize
```
### Content Creation

```bash
# Tweet ideas for your content pipeline
spark tweet "new ComfyUI workflow for video generation"

# Expand on an idea
spark ask "Write a thread about why local LLMs matter for privacy"
```
### Coding Help

```bash
# Quick code generation
spark code "Python function to parse RSS feeds and extract titles"

# Debug help
echo "TypeError: 'NoneType' object is not subscriptable" | spark ask "What causes this error?"
```
### Interactive Session

```
$ spark chat
🔥 Spark Chat (model: Qwen2.5-72B-Instruct)
Type 'exit' or Ctrl+C to quit, '/clear' to reset

You> What's the difference between async and threading in Python?
Spark> [response...]

You> Give me an example of when to use each
Spark> [response...]

You> /clear
Chat history cleared

You> exit
Goodbye!
```
## How It Works

The CLI communicates with llama.cpp's OpenAI-compatible API (sketched below):

- `/v1/chat/completions` for chat-based commands
- `/v1/completions` for raw text completion
- `/v1/models` to detect the loaded model
- `/health` for status checks
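For reference, the kind of request a chat-based command sends can be reproduced with plain `curl` and `jq`. This is a minimal sketch of the equivalent call, not the script's actual internals; it assumes the default `SPARK_URL` and picks the first model the server reports:

```bash
SPARK_URL="${SPARK_URL:-http://spark:8080}"

# Detect the loaded model, as spark does when SPARK_MODEL is unset
MODEL=$(curl -s "$SPARK_URL/v1/models" | jq -r '.data[0].id')

# Send one chat turn and print the reply
curl -s "$SPARK_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg model "$MODEL" --arg q "What are the key features of Rust?" \
        '{model: $model, messages: [{role: "user", content: $q}]}')" \
  | jq -r '.choices[0].message.content'
```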
## Tips

- Server offline? Make sure your DGX Spark is powered on and llama.cpp is running
- Slow responses? Large models take time; the CLI has a 120s timeout
- Want streaming? Not yet implemented, but PRs welcome!
- Custom model? Set `SPARK_MODEL=your-model-name`
Built during an overnight session by Navi ✨