Documentation
URAI is a command-line and API toolkit for working with local quantized models in GGUF format. It lets you download models, chat with them, and serve them as an API, all on your own machine.
Easy Model Management
Download and manage GGUF models from HuggingFace with simple commands
urai pull mistral
Interactive Chat
Chat interactively in your terminal with token-by-token streaming responses
urai run mistral
OpenAI-Compatible API
Drop-in replacement for OpenAI's API with local models
urai serve
GPU Acceleration
Leverage CUDA or Metal for faster inference
urai serve --gpu-layers 35
Browse Models
Discover thousands of models from TheBloke
urai list-all
Request Logging
Track all API requests with detailed logs
tail -f ~/.urai/api_requests.log
AI Agents
Create RAG-powered agents with custom knowledge bases
urai agent create myagent
Cloud Models
Use GPT-4, Claude, and other cloud models alongside local models
urai run openai:gpt-4
Installation
Download the installer for your platform from the home page and follow the installation instructions.
Basic Commands
usage: urai.py [-h] {pull,list,list-all,add,rm,run,serve,agent} ...
| Command | Description |
|---|---|
| pull | Download a model |
| list | List downloaded and recommended models |
| list-all | Browse all GGUF models from TheBloke |
| add | Add a custom model |
| rm | Remove a model |
| run | Start interactive chat |
| serve | Start local API server |
| agent | Manage AI agents with custom knowledge |
AI Agents (RAG)
Create custom AI agents with Retrieval-Augmented Generation (RAG). Each agent has its own knowledge base stored in a vector database, allowing for context-aware responses based on your documents.
Key Features
- Separate vector DB per agent - Isolated knowledge bases
- No separate serve command - Agents auto-load with urai serve
- Interactive testing - Test agents before deploying
- Streaming responses - Token-by-token in CLI
- Document management - Add/list/delete documents
- API endpoints - Query any agent via REST API
- Persistent storage - All data saved locally
- Chunking & embeddings - Smart text processing
1. Create a New Agent
urai agent create myagent --description "Technical documentation assistant"
2. Add Knowledge to Agents
Add files or text directly to your agent's knowledge base:
# Add files
urai agent add myagent --file documentation.txt
urai agent add myagent --file manual.pdf
# Add text directly
urai agent add myagent --text "Important information here"
3. Test Agents Interactively
Chat with your agent locally before deploying:
# Chat with your agent (uses RAG)
urai agent run myagent
# With specific model
urai agent run myagent --model tinyllama
4. Manage Agents
# List all agents
urai agent list
# List documents in an agent
urai agent docs myagent
# Delete a document
urai agent delete-doc myagent abc12345
# Delete entire agent
urai agent delete myagent
5. API Access (Automatic!)
Agents are automatically loaded when you start the server:
# Start server (agents are automatically loaded)
urai serve
Query your agent via API:
curl http://localhost:11434/v1/agents/myagent/query \
-H "Content-Type: application/json" \
-d '{"question": "What does the documentation say about X?"}'
# List all agents
curl http://localhost:11434/agents
Technical Architecture
- Vector Database: ChromaDB (persistent storage)
- Embeddings: SentenceTransformers (all-MiniLM-L6-v2)
- Storage Structure:
~/.urai/agents/
├── myagent/
│   ├── vectordb/        # ChromaDB storage
│   ├── metadata.json    # Agent info
│   └── documents.json   # Document registry
└── anotheragent/
    └── ...
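To make these pieces concrete, here is a minimal Python sketch of the same ingestion-and-retrieval flow using that stack directly. It illustrates the technique, not URAI's internal code; the collection name, chunk size, and file paths are assumptions:
# Minimal RAG sketch with the documented stack (ChromaDB + SentenceTransformers).
# Collection name, chunk size, and paths are illustrative, not URAI internals.
from pathlib import Path

import chromadb
from sentence_transformers import SentenceTransformer

db_path = Path.home() / ".urai" / "agents" / "myagent" / "vectordb"
embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path=str(db_path))
collection = client.get_or_create_collection("knowledge")

# Ingest: naive fixed-size chunking, then embed and store each chunk.
text = Path("documentation.txt").read_text()
chunks = [text[i:i + 500] for i in range(0, len(text), 500)]
collection.add(
    ids=[f"doc-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
)

# Query: embed the question and retrieve the closest chunks as context.
results = collection.query(
    query_embeddings=embedder.encode(["What is the refund policy?"]).tolist(),
    n_results=3,
)
context = "\n".join(results["documents"][0])
print(context)  # this context is what gets prepended to the LLM prompt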
Complete Workflow Example
# 1. Create agent
urai agent create support --description "Customer support KB"
# 2. Add knowledge
urai agent add support --file faq.txt
urai agent add support --file policies.txt
# 3. Test locally
urai agent run support
# You: What is the refund policy?
# support: Based on the documentation... [RAG answer]
# 4. Deploy via API
urai serve
# All agents automatically available at /v1/agents/{name}/query
Cloud Models
URAI supports cloud-based language models from major providers alongside local models. Use powerful cloud models such as GPT-4, Claude, or Perplexity's online models when you need cutting-edge performance.
Supported Providers
- OpenAI - GPT-4, GPT-4 Turbo, GPT-3.5 Turbo
- Anthropic - Claude 3 Opus, Sonnet, Haiku
- Perplexity - Models with real-time web search
- Mistral - Mistral Large, Medium, Small
- Groq - Ultra-fast inference with Llama models
Quick Start
1. View Available Providers
urai cloud providers
Output:
Available Cloud Providers:
openai - OpenAI
Models: gpt-4, gpt-4-turbo, gpt-3.5-turbo...
Env: OPENAI_API_KEY
anthropic - Anthropic (Claude)
Models: claude-3-opus-20240229, claude-3-sonnet...
Env: ANTHROPIC_API_KEY
perplexity - Perplexity AI
Models: llama-3.1-sonar-large-128k-online...
Env: PERPLEXITY_API_KEY
mistral - Mistral AI
Models: mistral-large-latest...
Env: MISTRAL_API_KEY
groq - Groq
Models: llama-3.1-70b-versatile...
2. Set API Keys
Option A: Via Command
# OpenAI
urai cloud set-key openai sk-proj-xxxxx
# Claude
urai cloud set-key anthropic sk-ant-xxxxx
# Perplexity
urai cloud set-key perplexity pplx-xxxxx
# Mistral
urai cloud set-key mistral xxxxx
# Groq
urai cloud set-key groq gsk_xxxxx
Option B: Via Environment Variables
# Windows
set OPENAI_API_KEY=sk-proj-xxxxx
set ANTHROPIC_API_KEY=sk-ant-xxxxx
# Linux/Mac
export OPENAI_API_KEY=sk-proj-xxxxx
export ANTHROPIC_API_KEY=sk-ant-xxxxx
3. Check Configuration
urai cloud list-keys
Output:
API Key Configuration:
openai - OpenAI ✓ Configured
anthropic - Anthropic (Claude) ✓ Configured
perplexity - Perplexity AI ✗ Not configured
Set with: urai cloud set-key perplexity <key>
4. List Models for a Provider
urai cloud models openai
Output:
Models for OpenAI:
• gpt-4
Use with: urai run openai:gpt-4
• gpt-4-turbo
Use with: urai run openai:gpt-4-turbo
• gpt-3.5-turbo
Use with: urai run openai:gpt-3.5-turbo
5. Test Connection
urai cloud test openai
Output:
Testing openai with gpt-3.5-turbo...
✓ Success! Response: OK
Using Cloud Models
Interactive Chat
# OpenAI GPT-4
urai run openai:gpt-4
# Claude 3.5 Sonnet
urai run anthropic:claude-3-5-sonnet-20241022
# Perplexity (with online search!)
urai run perplexity:llama-3.1-sonar-large-128k-online
# Groq (ultra fast!)
urai run groq:llama-3.1-70b-versatile
# Mistral
urai run mistral:mistral-large-latest
API Server with Cloud Models
# Start server with GPT-4
urai serve openai:gpt-4 --port 11434
# Start server with Claude
urai serve anthropic:claude-3-5-sonnet-20241022
# Then use via API
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 512
}'
AI Agents with Cloud Models
# Create agent
urai agent create support --description "Customer support"
# Add documents
urai agent add support --file faq.txt
# Use cloud model with agent
urai agent run support --model anthropic:claude-3-5-sonnet-20241022
# Or via API (agents inherit the server's model)
urai serve openai:gpt-4
curl http://localhost:11434/v1/agents/support/query \
-H "Content-Type: application/json" \
-d '{"question": "What is the refund policy?"}'
Advanced Usage
Mix Local and Cloud Models
# Set cloud model as default
urai run anthropic:claude-3-5-sonnet-20241022
# But still use local models when needed
urai run tinyllama
Remove API Keys
urai cloud remove-key openai
Model Format
Always use: provider:model
urai run openai:gpt-4 # ✓ Correct
urai run gpt-4 # ✗ Won't work (looks for local model)
Complete Command Reference
# Cloud Provider Management
urai cloud providers # List all providers
urai cloud set-key <p> <key> # Set API key
urai cloud list-keys # Show configured keys
urai cloud remove-key <p> # Remove API key
urai cloud models <p> # List provider models
urai cloud test <p> # Test connection
# Using Cloud Models
urai run <provider>:<model> # Interactive chat
urai serve <provider>:<model> # Start API server
urai agent run <agent> --model <provider>:<model> # Agent with cloud model
# Examples
urai run openai:gpt-4
urai serve anthropic:claude-3-5-sonnet-20241022
urai agent run support --model groq:llama-3.1-70b-versatile
Use Cases
Development (Fast & Cheap)
urai run openai:gpt-3.5-turbo # Fast, affordable
urai run groq:llama-3.1-8b-instant # Ultra fast, free tier
Production (High Quality)
urai run anthropic:claude-3-opus-20240229 # Best reasoning
urai run openai:gpt-4 # Reliable
Research (With Web Access)
urai run perplexity:llama-3.1-sonar-large-128k-online # Real-time info
Cost-Effective Deployment
# Use Groq for speed, fallback to local
urai serve groq:llama-3.1-70b-versatile
Security Notes
- API keys are stored in ~/.urai/cloud_api_keys.json
- Keys are also read from environment variables
- Never commit API keys to version control
- Use .env files for team projects (see the sketch below)
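Since keys are read from environment variables, a .env file plus a loader works well for teams. A minimal sketch, assuming the python-dotenv package (not a documented URAI dependency); the variable names match the providers above:
# Keep provider keys in a git-ignored .env file and load them at runtime.
# Assumes python-dotenv (pip install python-dotenv); not a URAI dependency.
#
# .env (add this file to .gitignore):
#   OPENAI_API_KEY=sk-proj-xxxxx
#   ANTHROPIC_API_KEY=sk-ant-xxxxx
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

if not os.environ.get("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY missing: add it to .env or export it")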
GPU Acceleration
URAI supports GPU acceleration through CUDA (NVIDIA) and Metal (Apple Silicon) for significantly faster inference.
Enable GPU Layers
Offload model layers to GPU for faster processing:
# Use GPU for inference (auto-detects CUDA/Metal)
urai run mistral --gpu-layers 35
# Serve with GPU acceleration
urai serve --gpu-layers 35
GPU Layer Guidelines
- Small models (1-3B): --gpu-layers 20-25
- Medium models (7B): --gpu-layers 30-35
- Large models (13B+): --gpu-layers 40-50
- Full GPU offload: Use --gpu-layers -1
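GGUF inference engines built on llama.cpp expose this setting as a layer-offload count. A minimal sketch with llama-cpp-python, shown as an illustration of the concept rather than URAI's confirmed backend (the model path is a placeholder):
# Layer offloading as exposed by llama-cpp-python; URAI's --gpu-layers flag
# maps to this llama.cpp concept (backend assumption; path is a placeholder).
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=35,  # ~30-35 for a 7B model per the guidelines above; -1 = all
    n_ctx=4096,       # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])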
Platform Support
| Platform | GPU Support | Requirements |
|---|---|---|
| Windows | CUDA (NVIDIA) | CUDA Toolkit + compatible GPU |
| macOS | Metal (Apple Silicon) | M1/M2/M3 chip |
| Linux | CUDA (NVIDIA) | CUDA Toolkit + compatible GPU |
Browse Models
Discover and download thousands of GGUF models from TheBloke's collection on Hugging Face.
List Available Models
# List all available GGUF models from TheBloke
urai list-all
# Search for specific models
urai list-all | grep -i "llama"
urai list-all | grep -i "mistral"
urai list-all | grep -i "code"
Browse via Web Interface
Use the built-in web interface to search and explore models:
Visit the Models page to:
- Search thousands of GGUF models in real-time
- Filter by category (Code, Chat, Llama, etc.)
- View model sizes and descriptions
- Get instant CLI commands for download
Download from Browser
# After finding a model on the Models page:
urai add codellama-13b
# Or pull directly:
urai pull codellama-13b
Popular Model Categories
Chat Models
Zephyr, Mistral, Llama-2
Code Models
CodeLlama, WizardCoder
Instruction Models
Mistral-Instruct, Llama-2-Chat
Small Models
TinyLlama, Phi-2
Request Logging
URAI automatically logs all API requests for monitoring, debugging, and analytics.
Log Location
# View logs on Unix/Linux/macOS
cat ~/.urai/api_requests.log
# View logs on Windows
type %USERPROFILE%\.urai\api_requests.log
# Tail logs in real-time
tail -f ~/.urai/api_requests.log
Log Format
Each request is logged with detailed information:
[2025-01-15 14:32:45] POST /v1/chat/completions
Model: mistral
Temperature: 0.7
Max Tokens: 100
User Message: "Tell me a joke"
Response Time: 2.34s
Tokens Generated: 87
---
Log Analysis
# Count total requests
grep -c "POST /v1/chat/completions" ~/.urai/api_requests.log
# Find slow requests (5 seconds or more)
grep -E "Response Time: ([5-9]|[1-9][0-9]+)\." ~/.urai/api_requests.log
# List all models used
grep "Model:" ~/.urai/api_requests.log | sort | uniq -c
# View the last 100 log lines
tail -n 100 ~/.urai/api_requests.log
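For anything beyond one-liners, a short parsing script can summarize the log. A minimal sketch, assuming the entry format shown above (the field labels are taken from that example):
# Summarize ~/.urai/api_requests.log: request count, mean latency, models used.
# Assumes the entry format shown above ("Model: ...", "Response Time: N.NNs").
import re
from collections import Counter
from pathlib import Path

log = (Path.home() / ".urai" / "api_requests.log").read_text()

models = Counter(re.findall(r"^Model: (.+)$", log, flags=re.MULTILINE))
times = [float(t) for t in re.findall(r"Response Time: ([\d.]+)s", log)]

print(f"requests: {len(times)}")
if times:
    print(f"mean latency: {sum(times) / len(times):.2f}s")
for name, count in models.most_common():
    print(f"{name}: {count}")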
Logged Information
- ✓ Timestamp of each request
- ✓ Model name used
- ✓ Request parameters (temperature, max_tokens, etc.)
- ✓ User messages and prompts
- ✓ Response time (seconds)
- ✓ Tokens generated
- ✓ Endpoint accessed
Examples
List Models
urai list
Output:
Downloaded models:
• mistral-7b-v0.1
• mistral
• phi-2
• tinyllama-1.1b
• tinyllama
Available models in registry:
[✓] tinyllama - TinyLlama 1.1B - Small and fast (Testing model) (669MB)
[ ] phi2 - Microsoft Phi-2 - 2.7B parameters (1.6GB)
[✓] mistral - Mistral 7B Instruct - High quality (4.4GB)
Download a Model
urai pull mistral
Run Chat Interface
urai run mistral
Serve as API
urai serve --port 8080
Add a Custom Model
urai add phi-2-GGUF
Or via URL:
urai add mymodel --url https://hf.co/TheBloke/mymodel.Q4_K_M.gguf
API Reference
Chat Completions
Endpoint
POST /v1/chat/completions
curl Example
curl --location 'http://localhost:11434/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "tinyllama",
"messages": [
{ "role": "user", "content": "Tell me a joke" }
],
"temperature": 0.7,
"top_p": 0.9,
"max_tokens": 100
}'
PowerShell Example
$body = @{
    model       = "tinyllama"
    messages    = @(
        @{ role = "user"; content = "Tell me a joke" }
    )
    temperature = 0.7
    top_p       = 0.9
    max_tokens  = 100
} | ConvertTo-Json -Depth 5

$response = Invoke-RestMethod 'http://localhost:11434/v1/chat/completions' -Method Post -ContentType 'application/json' -Body $body
$response | ConvertTo-Json
Python Example
import requests

url = "http://localhost:11434/v1/chat/completions"
payload = {
    "model": "tinyllama",
    "messages": [
        {"role": "user", "content": "Tell me a joke"}
    ],
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 100,
}

response = requests.post(url, json=payload)  # json= sets the Content-Type header
print(response.text)
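Because the endpoint is OpenAI-compatible, the official OpenAI Python SDK can also be pointed at the local server. A minimal sketch, assuming the openai package (v1+); the api_key value is a placeholder, since a local server typically does not validate it:
# Use the official OpenAI SDK against the local URAI server.
# api_key is a placeholder, not a real credential.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="tinyllama",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    temperature=0.7,
    max_tokens=100,
)
print(response.choices[0].message.content)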
Agent Query
Endpoint
POST /v1/agents/{agent_name}/query
Query an AI agent with RAG-powered context retrieval.
curl Example
curl --location 'http://localhost:11434/v1/agents/myagent/query' \
--header 'Content-Type: application/json' \
--data '{
"question": "who are you?"
}'
PowerShell Example
$body = @{ question = "who are you?" } | ConvertTo-Json

$response = Invoke-RestMethod 'http://localhost:11434/v1/agents/myagent/query' -Method Post -ContentType 'application/json' -Body $body
$response | ConvertTo-Json
Python Example
import requests

url = "http://localhost:11434/v1/agents/myagent/query"
payload = {"question": "who are you?"}

response = requests.post(url, json=payload)
print(response.text)
Advanced Options
urai serve --ctx-size 8192 --gpu-layers 35 --port 8080
Logs
cat ~/.urai/api_requests.log
Troubleshooting
Common issues and their solutions to help you get URAI running smoothly.
Server Won't Start
Problem: Server fails to start or crashes immediately
Solutions:
Check Python version (Need 3.8+):
python --version
Try different port:
python urai.py serve --port 8080
Check if port is already in use:
# Windows
netstat -ano | findstr :11434
# Linux/Mac
lsof -i :11434
Windows: Missing Build Tools
Problem: Installation fails with build errors on Windows
Solution: Install Visual Studio Build Tools (prerequisite for Windows)
Streaming Not Working
Problem: Chat responses appear all at once instead of streaming
Solutions:
Clear browser cache (Ctrl+Shift+Delete)
Try incognito/private browsing mode
Check browser console for errors (Press F12 → Console tab)
Disable browser extensions (AdBlock, Privacy Badger, etc.)
Try a different browser (Chrome, Firefox, Edge)
Model Not Found
Problem: Model dropdown is empty or "model not found" error
Solutions:
Download a model first:
# Download a model
python urai.py pull tinyllama
# Verify it downloaded
python urai.py list
Refresh the browser page (F5 or Ctrl+R)
Check server logs for errors
GPU Acceleration Issues
Problem: GPU not being utilized or poor performance
Solutions:
NVIDIA: Ensure CUDA Toolkit is installed
Apple Silicon: Metal is automatic on M1/M2/M3
Try adjusting --gpu-layers value (start with 20, increase gradually)
Check GPU memory usage with nvidia-smi (NVIDIA) or Activity Monitor (Mac)
Agent Not Responding
Problem: Agent queries return empty or error responses
Solutions:
Verify agent has documents added (urai agent docs {name})
Check if a model is loaded (urai list)
Test agent in CLI mode first (urai agent run {name})
Ensure ChromaDB is installed (pip install chromadb)
Cloud Model API Key Issues
Problem: "API key not configured" or authentication errors
Solutions:
Verify key is set:
python urai.py cloud list-keys
Re-set the API key:
python urai.py cloud set-key openai sk-proj-xxxxx
Test the connection: python urai.py cloud test openai
Check if key has expired or has insufficient credits
Still Having Issues?
- Check the logs: cat ~/.urai/api_requests.log
- Join our Discord community for support
- Open an issue on GitHub
- Enable debug mode: python urai.py serve --debug
Available Models (Registry)
| Model | Parameters | Size | Description |
|---|---|---|---|
| tinyllama | 1.1B | 669MB | Small and fast - Testing model |
| phi2 | 2.7B | 1.6GB | Microsoft Phi-2 |
| mistral | 7B | 4.4GB | High quality instruct model |
UI Pages
URAI includes a web-based UI for managing models, agents, and chat interactions. Access it at http://localhost:11434 after starting the server.
1. Chat
Interactive chat interface with streaming responses
- Model selector (local + cloud models)
- Message history with streaming responses
- Input field with Send/Clear buttons
- Real-time token-by-token streaming
- Conversation context management
2. AI Agents
Manage RAG-powered agents with custom knowledge bases
- Grid view of all agent cards
- Create new agent button with form
- Test agent modal for interactive testing
- Add document modal (text/file upload)
- View and manage agent documents
- Delete agents and documents
3. Models
Browse and manage local and available models
- Downloaded models section with status
- Available models from registry
- Search GGUF models from TheBloke
- One-click download instructions
- Model size and parameter information
4. Cloud Config
Configure cloud model providers
- Provider cards with configuration status
- Set/Update/Remove API key buttons
- Configuration status badges (✓ Configured / ✗ Not configured)
- Model list for each provider
- Test connection functionality
API Endpoints
Complete API reference for integrating with URAI programmatically.
Chat Endpoints
POST /v1/chat/completions
Chat with streaming responses (OpenAI-compatible)
{
"model": "tinyllama",
"messages": [{"role": "user", "content": "Hello"}],
"temperature": 0.7,
"max_tokens": 100,
"stream": true
}
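With "stream": true, OpenAI-compatible endpoints conventionally return Server-Sent Events. A consumption sketch, assuming URAI follows the usual data: {json} framing terminated by data: [DONE]:
# Consume a streaming chat completion; assumes OpenAI-style SSE framing
# ("data: {json}" lines, ending with "data: [DONE]").
import json

import requests

resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "tinyllama",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    if not line.startswith(b"data: "):
        continue  # skip blank keep-alive lines
    data = line[len(b"data: "):]
    if data == b"[DONE]":
        break
    chunk = json.loads(data)
    print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
print()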
Models Endpoints
GET /api/models
List all downloaded and available models
Agents Endpoints
GET /agents
List all agents with their metadata
POST /api/agent/create
Create a new agent
{
"name": "myagent",
"description": "Technical documentation assistant"
}
POST /api/agent/{name}/add-doc
Add document to agent's knowledge base
{
"text": "Your document content here...",
"filename": "documentation.txt"
}
POST /v1/agents/{name}/query
Query agent with RAG-powered retrieval
{
"question": "What does the documentation say about X?"
}
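Putting the agent endpoints together, here is an end-to-end sketch. The paths come from the reference above; the response schema is not specified here, so the sketch just prints the raw JSON:
# Create an agent, add a document, and query it via the endpoints above.
import requests

BASE = "http://localhost:11434"

requests.post(f"{BASE}/api/agent/create", json={
    "name": "myagent",
    "description": "Technical documentation assistant",
}).raise_for_status()

requests.post(f"{BASE}/api/agent/myagent/add-doc", json={
    "text": "Refunds are available within 30 days of purchase.",  # sample content
    "filename": "policy.txt",
}).raise_for_status()

answer = requests.post(f"{BASE}/v1/agents/myagent/query", json={
    "question": "What is the refund policy?",
})
print(answer.json())  # schema is server-defined; print the raw response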
Cloud Endpoints
GET /api/cloud/providers
List all available cloud providers and their models
GET /api/cloud/keys
List configured API keys (status only, not the keys themselves)
POST /api/cloud/keys
Set or update an API key for a provider
{
"provider": "openai",
"api_key": "sk-proj-xxxxx"
}
DELETE /api/cloud/keys/{provider}
Remove an API key for a provider
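A short sketch exercising the key-management endpoints above (the key value is a placeholder; never commit real keys):
# Set, inspect, and remove a provider key via the cloud endpoints above.
import requests

BASE = "http://localhost:11434"

# Set or update a key (placeholder value).
requests.post(f"{BASE}/api/cloud/keys", json={
    "provider": "openai",
    "api_key": "sk-proj-xxxxx",
}).raise_for_status()

# Check configuration status (the keys themselves are not returned).
print(requests.get(f"{BASE}/api/cloud/keys").json())

# Remove the key again.
requests.delete(f"{BASE}/api/cloud/keys/openai").raise_for_status()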
Usage Examples
Real-world examples to get you started quickly with URAI.
Example 1: Quick Chat
Start the server with a model:
python urai.py serve tinyllama
Open http://localhost:11434 in your browser
Select "tinyllama" from the model dropdown
Type: "Hello, explain Python decorators"
Watch the response stream in real time!
Example 2: Create Knowledge Agent
Open browser → Navigate to AI Agents page
Click "Create New Agent"
Name: "docs", Description: "Product documentation"
Click "Add Doc" โ Paste your documentation text
Click "Test" โ Ask "What is the return policy?"
See RAG-powered answer based on your docs! ๐
Example 3: Use Cloud Model
Open browser → Navigate to Cloud Config page
Find the "OpenAI" provider card
Click "Set API Key" and enter your key:
sk-proj-xxxxx...
Return to Chat page
Select "openai:gpt-4" from model dropdown
Start chatting with GPT-4!