Documentation
URAI is a command-line and API toolkit for working with local quantized models in GGUF format. It lets you download models, chat with them, and serve them as an API, all on your own machine.
Easy Model Management
Download and manage GGUF models from HuggingFace with simple commands
urai pull mistral
Interactive Chat
Chat interactively in your terminal with token-by-token streaming responses
urai run mistral
OpenAI-Compatible API
Drop-in replacement for OpenAI's API with local models
urai serve
GPU Acceleration
Leverage CUDA or Metal for faster inference
urai serve --gpu-layers 35
Browse Models
Discover thousands of models from TheBloke
urai list-all
Request Logging
Track all API requests with detailed logs
tail -f ~/.urai/api_requests.log
AI Agents
Create RAG-powered agents with custom knowledge bases
urai agent create myagent
Cloud Models
Use GPT-4, Claude, and other cloud models alongside local models
urai run openai:gpt-4
Installation
Download the installer for your platform from the home page and follow the installation instructions.
Basic Commands
usage: urai.py [-h] {pull,list,list-all,add,rm,run,serve,agent} ...
| Command | Description |
|---|---|
| pull | Download a model |
| list | List downloaded and recommended models |
| list-all | Browse all GGUF models from TheBloke |
| add | Add a custom model |
| rm | Remove a model |
| run | Start interactive chat |
| serve | Start local API server |
| agent | Manage AI agents with custom knowledge |
AI Agents (RAG)
Create custom AI agents with Retrieval-Augmented Generation (RAG). Each agent has its own knowledge base stored in a vector database, allowing for context-aware responses based on your documents.
Key Features
- Separate vector DB per agent - Isolated knowledge bases
- No separate serve command - Agents auto-load with urai serve
- Interactive testing - Test agents before deploying
- Streaming responses - Token-by-token in CLI
- Document management - Add/list/delete documents
- API endpoints - Query any agent via REST API
- Persistent storage - All data saved locally
- Chunking & embeddings - Smart text processing
1. Create a New Agent
urai agent create myagent --description "Technical documentation assistant"
2. Add Knowledge to Agents
Add files or text directly to your agent's knowledge base:
# Add files
urai agent add myagent --file documentation.txt
urai agent add myagent --file manual.pdf
# Add text directly
urai agent add myagent --text "Important information here"
3. Test Agents Interactively
Chat with your agent locally before deploying:
# Chat with your agent (uses RAG)
urai agent run myagent
# With specific model
urai agent run myagent --model tinyllama
4. Manage Agents
# List all agents
urai agent list
# List documents in an agent
urai agent docs myagent
# Delete a document
urai agent delete-doc myagent abc12345
# Delete entire agent
urai agent delete myagent
5. API Access (Automatic!)
Agents are automatically loaded when you start the server:
# Start server (agents are automatically loaded)
urai serve
Query your agent via API:
curl http://localhost:11434/v1/agents/myagent/query \
-H "Content-Type: application/json" \
-d '{"question": "What does the documentation say about X?"}'
# List all agents
curl http://localhost:11434/agents
Technical Architecture
- Vector Database: ChromaDB (persistent storage)
- Embeddings: SentenceTransformers (all-MiniLM-L6-v2)
- Storage Structure:
~/.urai/agents/
├── myagent/
│   ├── vectordb/        # ChromaDB storage
│   ├── metadata.json    # Agent info
│   └── documents.json   # Document registry
└── anotheragent/
    └── ...
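To make these pieces concrete, here is a minimal Python sketch of the same ingestion-and-retrieval flow using that stack directly. It illustrates the technique, not URAI's internal code; the collection name, chunk size, and file paths are assumptions:
# Minimal RAG sketch with the documented stack (ChromaDB + SentenceTransformers).
# Collection name, chunk size, and paths are illustrative, not URAI internals.
from pathlib import Path

import chromadb
from sentence_transformers import SentenceTransformer

db_path = Path.home() / ".urai" / "agents" / "myagent" / "vectordb"
embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path=str(db_path))
collection = client.get_or_create_collection("knowledge")

# Ingest: naive fixed-size chunking, then embed and store each chunk.
text = Path("documentation.txt").read_text()
chunks = [text[i:i + 500] for i in range(0, len(text), 500)]
collection.add(
    ids=[f"doc-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
)

# Query: embed the question and retrieve the closest chunks as context.
results = collection.query(
    query_embeddings=embedder.encode(["What is the refund policy?"]).tolist(),
    n_results=3,
)
context = "\n".join(results["documents"][0])
print(context)  # this context is what gets prepended to the LLM prompt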
Complete Workflow Example
# 1. Create agent
urai agent create support --description "Customer support KB"
# 2. Add knowledge
urai agent add support --file faq.txt
urai agent add support --file policies.txt
# 3. Test locally
urai agent run support
# You: What is the refund policy?
# support: Based on the documentation... [RAG answer]
# 4. Deploy via API
urai serve
# All agents automatically available at /v1/agents/{name}/query
Cloud Models
URAI supports cloud-based language models from major providers alongside local models. Use powerful cloud models such as GPT-4, Claude, or Perplexity's online models when you need cutting-edge performance.
Supported Providers
- OpenAI - GPT-4, GPT-4 Turbo, GPT-3.5 Turbo
- Anthropic - Claude 3 Opus, Sonnet, Haiku
- Perplexity - Models with real-time web search
- Mistral - Mistral Large, Medium, Small
- Groq - Ultra-fast inference with Llama models
Quick Start
1. View Available Providers
urai cloud providers
Output:
Available Cloud Providers:
openai - OpenAI
Models: gpt-4, gpt-4-turbo, gpt-3.5-turbo...
Env: OPENAI_API_KEY
anthropic - Anthropic (Claude)
Models: claude-3-opus-20240229, claude-3-sonnet...
Env: ANTHROPIC_API_KEY
perplexity - Perplexity AI
Models: llama-3.1-sonar-large-128k-online...
Env: PERPLEXITY_API_KEY
mistral - Mistral AI
Models: mistral-large-latest...
Env: MISTRAL_API_KEY
groq - Groq
Models: llama-3.1-70b-versatile...
2. Set API Keys
Option A: Via Command
# OpenAI
urai cloud set-key openai sk-proj-xxxxx
# Claude
urai cloud set-key anthropic sk-ant-xxxxx
# Perplexity
urai cloud set-key perplexity pplx-xxxxx
# Mistral
urai cloud set-key mistral xxxxx
# Groq
urai cloud set-key groq gsk_xxxxx
Option B: Via Environment Variables
# Windows
set OPENAI_API_KEY=sk-proj-xxxxx
set ANTHROPIC_API_KEY=sk-ant-xxxxx
# Linux/Mac
export OPENAI_API_KEY=sk-proj-xxxxx
export ANTHROPIC_API_KEY=sk-ant-xxxxx
3. Check Configuration
urai cloud list-keys
Output:
API Key Configuration:
openai - OpenAI ✓ Configured
anthropic - Anthropic (Claude) ✓ Configured
perplexity - Perplexity AI ✗ Not configured
Set with: urai cloud set-key perplexity <key>
4. List Models for a Provider
urai cloud models openai
Output:
Models for OpenAI:
• gpt-4
Use with: urai run openai:gpt-4
• gpt-4-turbo
Use with: urai run openai:gpt-4-turbo
• gpt-3.5-turbo
Use with: urai run openai:gpt-3.5-turbo
5. Test Connection
urai cloud test openai
Output:
Testing openai with gpt-3.5-turbo...
✓ Success! Response: OK
Using Cloud Models
Interactive Chat
# OpenAI GPT-4
urai run openai:gpt-4
# Claude 3.5 Sonnet
urai run anthropic:claude-3-5-sonnet-20241022
# Perplexity (with online search!)
urai run perplexity:llama-3.1-sonar-large-128k-online
# Groq (ultra fast!)
urai run groq:llama-3.1-70b-versatile
# Mistral
urai run mistral:mistral-large-latest
API Server with Cloud Models
# Start server with GPT-4
urai serve openai:gpt-4 --port 11434
# Start server with Claude
urai serve anthropic:claude-3-5-sonnet-20241022
# Then use via API
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 512
}'
AI Agents with Cloud Models
# Create agent
urai agent create support --description "Customer support"
# Add documents
urai agent add support --file faq.txt
# Use cloud model with agent
urai agent run support --model anthropic:claude-3-5-sonnet-20241022
# Or via API (agents inherit the server's model)
urai serve openai:gpt-4
curl http://localhost:11434/v1/agents/support/query \
-H "Content-Type: application/json" \
-d '{"question": "What is the refund policy?"}'
Advanced Usage
Mix Local and Cloud Models
# Set cloud model as default
urai run anthropic:claude-3-5-sonnet-20241022
# But still use local models when needed
urai run tinyllama
Remove API Keys
urai cloud remove-key openai
Model Format
Always use: provider:model
urai run openai:gpt-4 # ✓ Correct
urai run gpt-4 # ✗ Won't work (looks for local model)
Complete Command Reference
# Cloud Provider Management
urai cloud providers # List all providers
urai cloud set-key <p> <key> # Set API key
urai cloud list-keys # Show configured keys
urai cloud remove-key <p> # Remove API key
urai cloud models <p> # List provider models
urai cloud test <p> # Test connection
# Using Cloud Models
urai run <provider>:<model> # Interactive chat
urai serve <provider>:<model> # Start API server
urai agent run <agent> --model <provider>:<model> # Agent with cloud model
# Examples
urai run openai:gpt-4
urai serve anthropic:claude-3-5-sonnet-20241022
urai agent run support --model groq:llama-3.1-70b-versatile
Use Cases
Development (Fast & Cheap)
urai run openai:gpt-3.5-turbo # Fast, affordable
urai run groq:llama-3.1-8b-instant # Ultra fast, free tier
Production (High Quality)
urai run anthropic:claude-3-opus-20240229 # Best reasoning
urai run openai:gpt-4 # Reliable
Research (With Web Access)
urai run perplexity:llama-3.1-sonar-large-128k-online # Real-time info
Cost-Effective Deployment
# Use Groq for speed, fallback to local
urai serve groq:llama-3.1-70b-versatile
Security Notes
- API keys are stored in ~/.urai/cloud_api_keys.json
- Keys are also read from environment variables
- Never commit API keys to version control
- Use .env files for team projects (see the sketch below)
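Since keys are read from environment variables, a .env file plus a loader works well for teams. A minimal sketch, assuming the python-dotenv package (not a documented URAI dependency); the variable names match the providers above:
# Keep provider keys in a git-ignored .env file and load them at runtime.
# Assumes python-dotenv (pip install python-dotenv); not a URAI dependency.
#
# .env (add this file to .gitignore):
#   OPENAI_API_KEY=sk-proj-xxxxx
#   ANTHROPIC_API_KEY=sk-ant-xxxxx
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

if not os.environ.get("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY missing: add it to .env or export it")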
GPU Acceleration
URAI supports GPU acceleration through CUDA (NVIDIA) and Metal (Apple Silicon) for significantly faster inference.
Enable GPU Layers
Offload model layers to GPU for faster processing:
# Use GPU for inference (auto-detects CUDA/Metal)
urai run mistral --gpu-layers 35
# Serve with GPU acceleration
urai serve --gpu-layers 35
GPU Layer Guidelines
- Small models (1-3B): --gpu-layers 20-25
- Medium models (7B): --gpu-layers 30-35
- Large models (13B+): --gpu-layers 40-50
- Full GPU offload: Use --gpu-layers -1
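GGUF inference engines built on llama.cpp expose this setting as a layer-offload count. A minimal sketch with llama-cpp-python, shown as an illustration of the concept rather than URAI's confirmed backend (the model path is a placeholder):
# Layer offloading as exposed by llama-cpp-python; URAI's --gpu-layers flag
# maps to this llama.cpp concept (backend assumption; path is a placeholder).
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=35,  # ~30-35 for a 7B model per the guidelines above; -1 = all
    n_ctx=4096,       # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])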
Platform Support
| Platform | GPU Support | Requirements |
|---|---|---|
| Windows | CUDA (NVIDIA) | CUDA Toolkit + compatible GPU |
| macOS | Metal (Apple Silicon) | M1/M2/M3 chip |
| Linux | CUDA (NVIDIA) | CUDA Toolkit + compatible GPU |
Browse Models
Discover and download thousands of GGUF models from TheBloke's collection on Hugging Face.
List Available Models
# List all available GGUF models from TheBloke
urai list-all
# Search for specific models
urai list-all | grep -i "llama"
urai list-all | grep -i "mistral"
urai list-all | grep -i "code"
Browse via Web Interface
Use the built-in web interface to search and explore models:
Visit the Models page to:
- Search thousands of GGUF models in real-time
- Filter by category (Code, Chat, Llama, etc.)
- View model sizes and descriptions
- Get instant CLI commands for download
Download from Browser
# After finding a model on the Models page:
urai add codellama-13b
# Or pull directly:
urai pull codellama-13b
Popular Model Categories
Chat Models
Zephyr, Mistral, Llama-2
Code Models
CodeLlama, WizardCoder
Instruction Models
Mistral-Instruct, Llama-2-Chat
Small Models
TinyLlama, Phi-2
Request Logging
URAI automatically logs all API requests for monitoring, debugging, and analytics.
Log Location
# View logs on Unix/Linux/macOS
cat ~/.urai/api_requests.log
# View logs on Windows
type %USERPROFILE%\.urai\api_requests.log
# Tail logs in real-time
tail -f ~/.urai/api_requests.log
Log Format
Each request is logged with detailed information:
[2025-01-15 14:32:45] POST /v1/chat/completions
Model: mistral
Temperature: 0.7
Max Tokens: 100
User Message: "Tell me a joke"
Response Time: 2.34s
Tokens Generated: 87
---
Log Analysis
# Count total requests
grep -c "POST /v1/chat/completions" ~/.urai/api_requests.log
# Find slow requests (5 seconds or more)
grep -E "Response Time: ([5-9]|[1-9][0-9]+)\." ~/.urai/api_requests.log
# List all models used
grep "Model:" ~/.urai/api_requests.log | sort | uniq -c
# View the last 100 log lines
tail -n 100 ~/.urai/api_requests.log
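For anything beyond one-liners, a short parsing script can summarize the log. A minimal sketch, assuming the entry format shown above (the field labels are taken from that example):
# Summarize ~/.urai/api_requests.log: request count, mean latency, models used.
# Assumes the entry format shown above ("Model: ...", "Response Time: N.NNs").
import re
from collections import Counter
from pathlib import Path

log = (Path.home() / ".urai" / "api_requests.log").read_text()

models = Counter(re.findall(r"^Model: (.+)$", log, flags=re.MULTILINE))
times = [float(t) for t in re.findall(r"Response Time: ([\d.]+)s", log)]

print(f"requests: {len(times)}")
if times:
    print(f"mean latency: {sum(times) / len(times):.2f}s")
for name, count in models.most_common():
    print(f"{name}: {count}")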
Logged Information
- ✓ Timestamp of each request
- ✓ Model name used
- ✓ Request parameters (temperature, max_tokens, etc.)
- ✓ User messages and prompts
- ✓ Response time (seconds)
- ✓ Tokens generated
- ✓ Endpoint accessed
Examples
List Models
urai list
Output:
Downloaded models:
• mistral-7b-v0.1
• mistral
• phi-2
• tinyllama-1.1b
• tinyllama
Available models in registry:
[✓] tinyllama - TinyLlama 1.1B - Small and fast (Testing model) (669MB)
[ ] phi2 - Microsoft Phi-2 - 2.7B parameters (1.6GB)
[✓] mistral - Mistral 7B Instruct - High quality (4.4GB)
Download a Model
urai pull mistral
Run Chat Interface
urai run mistral
Serve as API
urai serve --port 8080
Add a Custom Model
urai add phi-2-GGUF
Or via URL:
urai add mymodel --url https://hf.co/TheBloke/mymodel.Q4_K_M.gguf
API Reference
Chat Completions
Endpoint
POST /v1/chat/completions
curl Example
curl --location 'http://localhost:11434/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "tinyllama",
"messages": [
{ "role": "user", "content": "Tell me a joke" }
],
"temperature": 0.7,
"top_p": 0.9,
"max_tokens": 100
}'
PowerShell Example
$body = @{
    model       = "tinyllama"
    messages    = @(
        @{ role = "user"; content = "Tell me a joke" }
    )
    temperature = 0.7
    top_p       = 0.9
    max_tokens  = 100
} | ConvertTo-Json -Depth 5

$response = Invoke-RestMethod 'http://localhost:11434/v1/chat/completions' -Method Post -ContentType 'application/json' -Body $body
$response | ConvertTo-Json
Python Example
import requests

url = "http://localhost:11434/v1/chat/completions"
payload = {
    "model": "tinyllama",
    "messages": [
        {"role": "user", "content": "Tell me a joke"}
    ],
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 100,
}

response = requests.post(url, json=payload)  # json= sets the Content-Type header
print(response.text)
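Because the endpoint is OpenAI-compatible, the official OpenAI Python SDK can also be pointed at the local server. A minimal sketch, assuming the openai package (v1+); the api_key value is a placeholder, since a local server typically does not validate it:
# Use the official OpenAI SDK against the local URAI server.
# api_key is a placeholder, not a real credential.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="tinyllama",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    temperature=0.7,
    max_tokens=100,
)
print(response.choices[0].message.content)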
Agent Query
Endpoint
POST /v1/agents/{agent_name}/query
Query an AI agent with RAG-powered context retrieval.
curl Example
curl --location 'http://localhost:11434/v1/agents/myagent/query' \
--header 'Content-Type: application/json' \
--data '{
"question": "who are you?"
}'
PowerShell Example
$body = @{ question = "who are you?" } | ConvertTo-Json

$response = Invoke-RestMethod 'http://localhost:11434/v1/agents/myagent/query' -Method Post -ContentType 'application/json' -Body $body
$response | ConvertTo-Json
Python Example
import requests

url = "http://localhost:11434/v1/agents/myagent/query"
payload = {"question": "who are you?"}

response = requests.post(url, json=payload)
print(response.text)
Advanced Options
urai serve --ctx-size 8192 --gpu-layers 35 --port 8080
Logs
cat ~/.urai/api_requests.log
Troubleshooting
Common issues and their solutions to help you get URAI running smoothly.
Server Won't Start
Problem: Server fails to start or crashes immediately
Solutions:
Check Python version (Need 3.8+):
python --version
Try different port:
python urai.py serve --port 8080
Check if port is already in use:
# Windows
netstat -ano | findstr :11434
# Linux/Mac
lsof -i :11434
Windows: Missing Build Tools
Problem: Installation fails with build errors on Windows
Solution: Install Visual Studio Build Tools (prerequisite for Windows)
Streaming Not Working
Problem: Chat responses appear all at once instead of streaming
Solutions:
Clear browser cache (Ctrl+Shift+Delete)
Try incognito/private browsing mode
Check browser console for errors (Press F12 → Console tab)
Disable browser extensions (AdBlock, Privacy Badger, etc.)
Try a different browser (Chrome, Firefox, Edge)
Model Not Found
Problem: Model dropdown is empty or "model not found" error
Solutions:
Download a model first:
# Download a model
python urai.py pull tinyllama
# Verify it downloaded
python urai.py list
Refresh the browser page (F5 or Ctrl+R)
Check server logs for errors
GPU Acceleration Issues
Problem: GPU not being utilized or poor performance
Solutions:
NVIDIA: Ensure CUDA Toolkit is installed
Apple Silicon: Metal is automatic on M1/M2/M3
Try adjusting --gpu-layers value (start with 20, increase gradually)
Check GPU memory usage with nvidia-smi (NVIDIA) or Activity Monitor (Mac)
Agent Not Responding
Problem: Agent queries return empty or error responses
Solutions:
Verify agent has documents added (urai agent docs {name})
Check if a model is loaded (urai list)
Test agent in CLI mode first (urai agent run {name})
Ensure ChromaDB is installed (pip install chromadb)
Cloud Model API Key Issues
Problem: "API key not configured" or authentication errors
Solutions:
Verify key is set:
python urai.py cloud list-keys
Re-set the API key:
python urai.py cloud set-key openai sk-proj-xxxxx
Test the connection: python urai.py cloud test openai
Check if key has expired or has insufficient credits
Still Having Issues?
- Check the logs: cat ~/.urai/api_requests.log
- Join our Discord community for support
- Open an issue on GitHub
- Enable debug mode: python urai.py serve --debug
Available Models (Registry)
| Model | Parameters | Size | Description |
|---|---|---|---|
| tinyllama | 1.1B | 669MB | Small and fast - Testing model |
| phi2 | 2.7B | 1.6GB | Microsoft Phi-2 |
| mistral | 7B | 4.4GB | High quality instruct model |
UI Pages
URAI includes a web-based UI for managing models, agents, and chat interactions. Access it at http://localhost:11434 after starting the server.
1. Chat
Interactive chat interface with streaming responses
- Model selector (local + cloud models)
- Message history with streaming responses
- Input field with Send/Clear buttons
- Real-time token-by-token streaming
- Conversation context management
2. AI Agents
Manage RAG-powered agents with custom knowledge bases
- Grid view of all agent cards
- Create new agent button with form
- Test agent modal for interactive testing
- Add document modal (text/file upload)
- View and manage agent documents
- Delete agents and documents
3. Models
Browse and manage local and available models
- Downloaded models section with status
- Available models from registry
- Search GGUF models from TheBloke
- One-click download instructions
- Model size and parameter information
4. Cloud Config
Configure cloud model providers
- Provider cards with configuration status
- Set/Update/Remove API key buttons
- Configuration status badges (✓ Configured / ✗ Not configured)
- Model list for each provider
- Test connection functionality
API Endpoints
Complete API reference for integrating with URAI programmatically.
Chat Endpoints
POST /v1/chat/completions
Chat with streaming responses (OpenAI-compatible)
{
"model": "tinyllama",
"messages": [{"role": "user", "content": "Hello"}],
"temperature": 0.7,
"max_tokens": 100,
"stream": true
}
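With "stream": true, OpenAI-compatible endpoints conventionally return Server-Sent Events. A consumption sketch, assuming URAI follows the usual data: {json} framing terminated by data: [DONE]:
# Consume a streaming chat completion; assumes OpenAI-style SSE framing
# ("data: {json}" lines, ending with "data: [DONE]").
import json

import requests

resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "tinyllama",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    if not line.startswith(b"data: "):
        continue  # skip blank keep-alive lines
    data = line[len(b"data: "):]
    if data == b"[DONE]":
        break
    chunk = json.loads(data)
    print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
print()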
Models Endpoints
GET /api/models
List all downloaded and available models
Agents Endpoints
GET /agents
List all agents with their metadata
POST /api/agent/create
Create a new agent
{
"name": "myagent",
"description": "Technical documentation assistant"
}
POST /api/agent/{name}/add-doc
Add document to agent's knowledge base
{
"text": "Your document content here...",
"filename": "documentation.txt"
}
POST /v1/agents/{name}/query
Query agent with RAG-powered retrieval
{
"question": "What does the documentation say about X?"
}
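Putting the agent endpoints together, here is an end-to-end sketch. The paths come from the reference above; the response schema is not specified here, so the sketch just prints the raw JSON:
# Create an agent, add a document, and query it via the endpoints above.
import requests

BASE = "http://localhost:11434"

requests.post(f"{BASE}/api/agent/create", json={
    "name": "myagent",
    "description": "Technical documentation assistant",
}).raise_for_status()

requests.post(f"{BASE}/api/agent/myagent/add-doc", json={
    "text": "Refunds are available within 30 days of purchase.",  # sample content
    "filename": "policy.txt",
}).raise_for_status()

answer = requests.post(f"{BASE}/v1/agents/myagent/query", json={
    "question": "What is the refund policy?",
})
print(answer.json())  # schema is server-defined; print the raw response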
Cloud Endpoints
GET /api/cloud/providers
List all available cloud providers and their models
GET /api/cloud/keys
List configured API keys (status only, not the keys themselves)
POST /api/cloud/keys
Set or update an API key for a provider
{
"provider": "openai",
"api_key": "sk-proj-xxxxx"
}
DELETE /api/cloud/keys/{provider}
Remove an API key for a provider
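A short sketch exercising the key-management endpoints above (the key value is a placeholder; never commit real keys):
# Set, inspect, and remove a provider key via the cloud endpoints above.
import requests

BASE = "http://localhost:11434"

# Set or update a key (placeholder value).
requests.post(f"{BASE}/api/cloud/keys", json={
    "provider": "openai",
    "api_key": "sk-proj-xxxxx",
}).raise_for_status()

# Check configuration status (the keys themselves are not returned).
print(requests.get(f"{BASE}/api/cloud/keys").json())

# Remove the key again.
requests.delete(f"{BASE}/api/cloud/keys/openai").raise_for_status()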
Usage Examples
Real-world examples to get you started quickly with URAI.
Example 1: Quick Chat
Start the server with a model:
python urai.py serve tinyllama
Open http://localhost:11434 in your browser
Select "tinyllama" from the model dropdown
Type: "Hello, explain Python decorators"
Watch the response stream in real time!
Example 2: Create Knowledge Agent
Open browser → Navigate to AI Agents page
Click "Create New Agent"
Name: "docs", Description: "Product documentation"
Click "Add Doc" โ Paste your documentation text
Click "Test" โ Ask "What is the return policy?"
See RAG-powered answer based on your docs! ๐
Example 3: Use Cloud Model
Open browser → Navigate to Cloud Config page
Find the "OpenAI" provider card
Click "Set API Key" and enter your key:
sk-proj-xxxxx...
Return to Chat page
Select "openai:gpt-4" from model dropdown
Start chatting with GPT-4!