Advanced Provider Features
This guide covers advanced features and optimizations for Routstr providers who want to maximize the performance, reliability, and profitability of their AI model hosting.
Automated Pricing
Dynamic Pricing
Configure your provider to automatically adjust pricing based on demand:
"provider": {
"dynamicPricing": {
"enabled": true,
"metrics": ["utilization", "demand"],
"maxMultiplier": 3.0,
"minMultiplier": 0.5,
"updateInterval": 300, // seconds
"algorithm": "linear" // or "exponential"
}
}
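The configuration does not say how the metrics become a multiplier. The sketch below assumes utilization is reported as a fraction from 0 to 1 and maps it onto the range from `minMultiplier` to `maxMultiplier`. The function name and both curves are hypothetical and are not Routstr's actual algorithm. A provider would recompute the value every `updateInterval` seconds and apply it to its base price.

```python
def price_multiplier(utilization: float,
                     min_multiplier: float = 0.5,
                     max_multiplier: float = 3.0,
                     algorithm: str = "linear") -> float:
    """Map utilization (0.0-1.0) to a price multiplier (hypothetical formula).

    "linear" interpolates in a straight line between the bounds.
    "exponential" interpolates geometrically, so it stays below the
    linear curve at moderate load and meets it at both ends.
    """
    u = min(max(utilization, 0.0), 1.0)  # clamp out-of-range readings
    if algorithm == "linear":
        m = min_multiplier + (max_multiplier - min_multiplier) * u
    elif algorithm == "exponential":
        # Geometric interpolation: min * (max / min) ** u (requires min > 0)
        m = min_multiplier * (max_multiplier / min_multiplier) ** u
    else:
        raise ValueError(f"unknown algorithm: {algorithm}")
    return min(max(m, min_multiplier), max_multiplier)
```

With the defaults, `price_multiplier(0.5)` returns 1.75 on the linear curve and roughly 1.22 on the exponential one.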
Cost-Based Pricing
Set pricing based on your actual costs:
"provider": {
"costBasedPricing": {
"enabled": true,
"computeCostPerHour": 0.50, // USD
"targetMargin": 30, // percentage
"includeElectricity": true,
"includeOperational": true
}
}
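Converting an hourly cost into a per-token price also requires your sustained throughput, and that value is not part of this configuration. The sketch below makes three assumptions: it treats `targetMargin` as a markup on cost, it models electricity and operational overhead as extra hourly costs, and it adds a hypothetical `tokens_per_hour` parameter.

```python
def price_per_million_tokens(compute_cost_per_hour: float,
                             tokens_per_hour: float,
                             target_margin_pct: float = 30.0,
                             electricity_per_hour: float = 0.0,
                             operational_per_hour: float = 0.0) -> float:
    """USD price per 1M tokens that covers hourly costs plus a markup.

    Assumptions (not from the Routstr docs): the margin is a markup on cost,
    and tokens_per_hour is your measured sustained throughput.
    """
    if tokens_per_hour <= 0:
        raise ValueError("tokens_per_hour must be positive")
    hourly_cost = compute_cost_per_hour + electricity_per_hour + operational_per_hour
    cost_per_million = hourly_cost / tokens_per_hour * 1_000_000
    return cost_per_million * (1 + target_margin_pct / 100)
```

For example, $0.50/hour at 1,000,000 tokens/hour costs $0.50 per million tokens, so a 30% markup gives about $0.65 per million. If you define margin as a share of the price instead, use `cost_per_million / (1 - margin)`, which gives about $0.71.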
Load Balancing
Auto-Scaling
Configure your provider to scale resources based on demand:
"provider": {
"autoScaling": {
"enabled": true,
"minInstances": 1,
"maxInstances": 5,
"scaleUpThreshold": 80, // percentage utilization
"scaleDownThreshold": 30,
"cooldownPeriod": 300 // seconds
}
}
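A threshold autoscaler with a cooldown can be sketched as follows. It assumes utilization is a single percentage sampled periodically and that each decision adds or removes one instance. Both are simplifying assumptions, and the class name is hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class AutoScaler:
    """Threshold-based scaler with a cooldown (illustrative sketch)."""
    min_instances: int = 1
    max_instances: int = 5
    scale_up_threshold: float = 80.0    # percent utilization
    scale_down_threshold: float = 30.0  # percent utilization
    cooldown_period: float = 300.0      # seconds
    instances: int = 1
    last_change: float = float("-inf")

    def decide(self, utilization_pct: float, now: Optional[float] = None) -> int:
        """Return the instance count to run for the observed utilization."""
        now = time.monotonic() if now is None else now
        if now - self.last_change < self.cooldown_period:
            return self.instances  # still cooling down: hold steady
        target = self.instances
        if utilization_pct > self.scale_up_threshold:
            target = min(self.instances + 1, self.max_instances)
        elif utilization_pct < self.scale_down_threshold:
            target = max(self.instances - 1, self.min_instances)
        if target != self.instances:
            self.instances = target
            self.last_change = now
        return self.instances
```

After a scale event, the cooldown ignores threshold crossings for `cooldownPeriod` seconds. The wide gap between the 80% and 30% thresholds also helps prevent flapping.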
Request Queue Management
Optimize request handling during peak loads:
"provider": {
"queueManagement": {
"enabled": true,
"maxQueueLength": 100,
"timeoutSeconds": 30,
"priorityLevels": 3,
"fairnessPolicy": "round-robin" // or "fifo", "priority"
}
}
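The following is a sketch of a bounded queue with a `"round-robin"` fairness policy. It assumes round-robin means serving one request per client per turn, so that one heavy client cannot starve the others. It does not model `priorityLevels` or the other policies, and the class is hypothetical.

```python
import time
from collections import OrderedDict, deque
from typing import Any, Optional

class RoundRobinQueue:
    """Bounded queue that serves clients in turn (illustrative sketch).

    Each client has its own FIFO. dequeue() takes one request from the client
    at the front of the rotation, then moves that client to the back.
    Requests older than timeout_seconds are discarded when reached.
    """

    def __init__(self, max_queue_length: int = 100, timeout_seconds: float = 30.0):
        self.max_queue_length = max_queue_length
        self.timeout_seconds = timeout_seconds
        self._queues: OrderedDict = OrderedDict()  # client_id -> deque of (enqueued_at, request)
        self._size = 0

    def enqueue(self, client_id: str, request: Any, now: Optional[float] = None) -> bool:
        """Add a request; return False when full so the caller can reject it."""
        if self._size >= self.max_queue_length:
            return False
        now = time.monotonic() if now is None else now
        self._queues.setdefault(client_id, deque()).append((now, request))
        self._size += 1
        return True

    def dequeue(self, now: Optional[float] = None) -> Optional[Any]:
        """Return the next live request in round-robin order, or None."""
        now = time.monotonic() if now is None else now
        while self._queues:
            client_id, pending = next(iter(self._queues.items()))
            enqueued_at, request = pending.popleft()
            self._size -= 1
            if pending:
                self._queues.move_to_end(client_id)  # rotate client to the back
            else:
                del self._queues[client_id]
            if now - enqueued_at <= self.timeout_seconds:
                return request
            # Timed out: drop it and keep looking.
        return None
```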
Model Management
Model Switching
Configure automatic model switching based on load:
"provider": {
"modelSwitching": {
"enabled": true,
"primaryModel": "llama-3-70b",
"fallbackModels": ["llama-3-8b", "mistral-7b"],
"switchCriteria": {
"queueLength": 10,
"utilizationThreshold": 90,
"latencyThreshold": 5000 // ms
}
}
}
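The `switchCriteria` block can be read as a set of thresholds. The sketch below falls back when any of them is exceeded. Two parts are hypothetical: the escalation rule, where each breached criterion moves one step further down `fallbackModels`, and the choice of latency metric.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class Load:
    queue_length: int        # requests waiting
    utilization_pct: float   # 0-100
    latency_ms: float        # e.g. recent p95 latency (assumed metric)

def select_model(load: Load,
                 primary: str,
                 fallbacks: Sequence[str],
                 max_queue: int = 10,
                 max_util_pct: float = 90.0,
                 max_latency_ms: float = 5000.0) -> str:
    """Use the primary model unless switch criteria are breached.

    Hypothetical policy: each breached criterion moves one step further
    down the fallback list, in the order the models are listed.
    """
    breached = sum([
        load.queue_length > max_queue,
        load.utilization_pct > max_util_pct,
        load.latency_ms > max_latency_ms,
    ])
    if breached == 0 or not fallbacks:
        return primary
    return fallbacks[min(breached, len(fallbacks)) - 1]
```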
Model Caching
Cache responses to common and near-duplicate requests:
"provider": {
"caching": {
"enabled": true,
"storageLimit": "1GB",
"ttl": 3600, // seconds
"similarityThreshold": 0.95,
"excludedPrompts": ["pattern1", "pattern2"]
}
}
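A `similarityThreshold` implies matching prompts by meaning rather than by exact text. The sketch below assumes each prompt arrives with an embedding vector from a model of your choice. It compares vectors by cosine similarity, expires entries after `ttl` seconds, and skips prompts that match `excludedPrompts` regexes. It does not model `storageLimit` eviction, and the class is hypothetical.

```python
import math
import re
import time
from typing import List, Optional, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """TTL cache that also matches near-duplicate prompts (illustrative sketch)."""

    def __init__(self, ttl: float = 3600, similarity_threshold: float = 0.95,
                 excluded_patterns: Sequence[str] = ()):
        self.ttl = ttl
        self.threshold = similarity_threshold
        self.excluded = [re.compile(p) for p in excluded_patterns]
        # Each entry: (expires_at, embedding, response)
        self._entries: List[Tuple[float, List[float], str]] = []

    def _is_excluded(self, prompt: str) -> bool:
        return any(p.search(prompt) for p in self.excluded)

    def get(self, prompt: str, embedding: Sequence[float],
            now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        if self._is_excluded(prompt):
            return None
        self._entries = [e for e in self._entries if e[0] > now]  # drop expired
        best = max(self._entries, key=lambda e: cosine(e[1], embedding), default=None)
        if best is not None and cosine(best[1], embedding) >= self.threshold:
            return best[2]
        return None

    def put(self, prompt: str, embedding: Sequence[float], response: str,
            now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if not self._is_excluded(prompt):
            self._entries.append((now + self.ttl, list(embedding), response))
```

The linear scan keeps the sketch short. A production cache would typically use a vector index instead.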
Network Optimization
Request Batching
Batch similar requests for improved efficiency:
"provider": {
"requestBatching": {
"enabled": true,
"maxBatchSize": 5,
"maxWaitTime": 100, // ms
"similarityThreshold": 0.8
}
}
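A batch closes when it reaches `maxBatchSize` or when `maxWaitTime` has passed since its first request, whichever comes first. The offline sketch below applies that rule to a list of timestamped arrivals. It assumes a live server would also flush a batch when the wait timer fires, not only when the next request arrives. It does not model `similarityThreshold`, and the function is hypothetical.

```python
from typing import List, Sequence, Tuple

def form_batches(arrivals: Sequence[Tuple[float, str]],
                 max_batch_size: int = 5,
                 max_wait_ms: float = 100.0) -> List[List[str]]:
    """Group (arrival_ms, request) pairs, sorted by time, into batches.

    A batch closes when it is full, or when the next request arrives more
    than max_wait_ms after the batch's first request.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    opened_at = 0.0
    for t, req in arrivals:
        if current and (len(current) >= max_batch_size or t - opened_at > max_wait_ms):
            batches.append(current)
            current = []
        if not current:
            opened_at = t  # start the wait timer at the batch's first request
        current.append(req)
    if current:
        batches.append(current)
    return batches
```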
Inference Optimization
Configure inference parameters for optimal performance:
"provider": {
"inference": {
"precision": "fp16", // or "fp32", "int8", "int4"
"batchSize": 4,
"prefillChunkSize": 2048,
"beamSize": 1,
"cudaMemoryFraction": 0.9
}
}
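Precision largely determines how much memory the weights need: about 4 bytes per parameter for fp32, 2 for fp16, 1 for int8 and 0.5 for int4. The back-of-the-envelope check below combines those figures with `cudaMemoryFraction`. It ignores the KV cache and activations, which grow with `batchSize` and `prefillChunkSize`, so treat the result as a lower bound. The function names are illustrative.

```python
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for the model weights alone, in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

def fits_on_gpus(num_params: float, precision: str, gpu_memory_gb: float,
                 num_gpus: int = 1, cuda_memory_fraction: float = 0.9) -> bool:
    """True if the weights fit in the share of GPU memory the runtime may use."""
    usable = gpu_memory_gb * num_gpus * cuda_memory_fraction
    return weight_memory_gb(num_params, precision) <= usable
```

For a 70B-parameter model such as `llama-3-70b`, fp16 weights need about 140 GB. That exceeds 90% of a single 80 GB GPU (72 GB) but fits across two such GPUs. At int4, the weights shrink to about 35 GB.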
Security Features
Rate Limiting
Protect your service from abuse:
"provider": {
"rateLimiting": {
"enabled": true,
"perIp": {
"requestsPerMinute": 30,
"tokensPerHour": 100000
},
"perUser": {
"requestsPerMinute": 60,
"tokensPerDay": 1000000
},
"global": {
"maxConcurrentRequests": 100
}
}
}
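Limits such as `requestsPerMinute` are commonly enforced with a sliding window. The sketch below keeps one window per key, where the key is an IP for `perIp` and a user ID for `perUser`, and a request must pass every limiter that applies to it. The class is hypothetical, not part of Routstr.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

class SlidingWindowLimiter:
    """Allow at most `limit` requests per key in any `window`-second span."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # forget requests that have left the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Token budgets (`tokensPerHour`, `tokensPerDay`) can reuse the same idea by recording token counts instead of single hits. `maxConcurrentRequests` maps naturally onto a semaphore.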
Content Filtering
Implement content filtering for safety:
"provider": {
"contentFiltering": {
"enabled": true,
"level": "medium", // "low", "medium", "high"
"filterPII": true,
"blockedCategories": ["illegal", "harmful"],
"customPatterns": [
"regex1", "regex2"
]
}
}
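Regex-based filtering is the simplest part to illustrate. The sketch below blocks text that matches any `customPatterns` entry and redacts two crude PII shapes when `filterPII` is on. The patterns are deliberately minimal examples. `level` and `blockedCategories` would typically need a classifier, which is not shown, and all names are hypothetical.

```python
import re
from typing import Iterable, Tuple

# Crude illustrative PII patterns; real PII detection needs much more.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def filter_text(text: str, custom_patterns: Iterable[str] = (),
                filter_pii: bool = True) -> Tuple[bool, str]:
    """Return (blocked, text): custom patterns block, PII is redacted."""
    for pattern in custom_patterns:
        if re.search(pattern, text):
            return True, ""
    if filter_pii:
        for label, rx in PII_PATTERNS.items():
            text = rx.sub(f"[{label}]", text)
    return False, text
```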
Analytics and Monitoring
Prometheus Integration
Export metrics to Prometheus:
"monitoring": {
"prometheus": {
"enabled": true,
"port": 9090,
"path": "/metrics",
"labels": {
"environment": "production",
"provider": "your-provider-name"
}
}
}
Grafana Dashboard
Set up a Grafana dashboard with alerting:
"monitoring": {
"grafana": {
"enabled": true,
"dashboardTemplate": "default", // or "custom"
"customDashboardPath": "/path/to/dashboard.json",
"alerting": {
"enabled": true,
"contactPoints": ["email:admin@example.com"]
}
}
}
Reliability Features
Circuit Breakers
Implement circuit breakers to prevent cascading failures:
"provider": {
"circuitBreakers": {
"enabled": true,
"failureThreshold": 5,
"resetTimeout": 30, // seconds
"halfOpenRequests": 3
}
}
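The following is a sketch of the standard closed, open and half-open circuit breaker state machine. It assumes the configuration keys have these meanings: `failureThreshold` consecutive failures open the circuit; after `resetTimeout` seconds it turns half-open and admits `halfOpenRequests` trial requests; if all of them succeed the circuit closes, and any failure reopens it. The class is hypothetical.

```python
import time
from typing import Optional

class CircuitBreaker:
    """Closed -> open -> half-open state machine (illustrative sketch)."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 half_open_requests: int = 3):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.half_open_requests = half_open_requests
        self.state = "closed"
        self.failures = 0          # consecutive failures while closed
        self.opened_at = 0.0
        self.trials_allowed = 0
        self.trial_successes = 0

    def allow_request(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.state == "open" and now - self.opened_at >= self.reset_timeout:
            self.state, self.trials_allowed, self.trial_successes = "half_open", 0, 0
        if self.state == "closed":
            return True
        if self.state == "half_open" and self.trials_allowed < self.half_open_requests:
            self.trials_allowed += 1
            return True
        return False  # open, or half-open with every trial slot in use

    def record_success(self) -> None:
        if self.state == "half_open":
            self.trial_successes += 1
            if self.trial_successes >= self.half_open_requests:
                self.state, self.failures = "closed", 0
        elif self.state == "closed":
            self.failures = 0  # a success breaks the failure streak

    def record_failure(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.failures += 1
        if self.state == "half_open" or (
            self.state == "closed" and self.failures >= self.failure_threshold
        ):
            self.state, self.opened_at = "open", now  # trip (or re-trip) the breaker
```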
Health Checks
Configure comprehensive health checks:
"provider": {
"healthChecks": {
"enabled": true,
"interval": 60, // seconds
"timeout": 5, // seconds
"checks": [
{
"type": "model",
"endpoint": "endpoint1",
"testPrompt": "Hello, world"
},
{
"type": "system",
"metric": "memory",
"threshold": 90 // percentage
}
]
}
}
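A health check runner needs to enforce the per-check `timeout` and treat errors as failures. In the sketch below, each check is a zero-argument callable that returns True when healthy. How you send `testPrompt` to a model or read memory usage is up to your setup, so those parts are assumptions.

```python
import concurrent.futures
from typing import Callable, Dict

def run_check(fn: Callable[[], bool], timeout: float) -> bool:
    """Run one check; a timeout or an exception counts as unhealthy."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return bool(pool.submit(fn).result(timeout=timeout))
    except Exception:
        return False
    finally:
        pool.shutdown(wait=False)  # don't block on a hung check

def memory_ok(used_pct: float, threshold: float = 90.0) -> bool:
    """System check: healthy while memory use is below the threshold."""
    return used_pct < threshold

def evaluate(checks: Dict[str, Callable[[], bool]], timeout: float = 5.0) -> Dict[str, bool]:
    return {name: run_check(fn, timeout) for name, fn in checks.items()}
```

A scheduler would call `evaluate` every `interval` seconds, for example with `{"model": lambda: bool(send_test_prompt("Hello, world")), "memory": lambda: memory_ok(read_memory_pct())}`. Here `send_test_prompt` and `read_memory_pct` are hypothetical stand-ins for your own code. A timed-out check's thread keeps running in the background until it returns.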
Multi-Region Deployment
Deploy your provider across multiple regions:
"provider": {
"regions": {
"enabled": true,
"primary": "us-east",
"secondary": ["eu-west", "ap-east"],
"routingStrategy": "latency", // or "geo", "load"
"syncSettings": {
"enabled": true,
"interval": 300 // seconds
}
}
}
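The following sketch shows the `"latency"` and `"load"` routing strategies. It picks the best healthy region, prefers the primary region on ties, and falls back to the primary when no region is healthy. The region status fields and the fallback rule are assumptions. The sketch does not cover `"geo"` routing.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RegionStatus:
    name: str
    healthy: bool
    latency_ms: float   # measured toward the client (assumed to be available)
    load_pct: float

def pick_region(regions: List[RegionStatus], strategy: str = "latency",
                primary: str = "us-east") -> str:
    """Choose a region by strategy; ties go to the primary region."""
    keys = {
        # (metric, is_not_primary): False sorts before True, so primary wins ties
        "latency": lambda r: (r.latency_ms, r.name != primary),
        "load": lambda r: (r.load_pct, r.name != primary),
    }
    if strategy not in keys:
        raise ValueError(f"strategy {strategy!r} not covered by this sketch")
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        return primary
    return min(healthy, key=keys[strategy]).name
```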
Administration Features
Command-Line Management
Advanced CLI commands for provider management:
# Performance tuning
routstr-provider tune --optimize-for throughput
# Model management
routstr-provider models add /path/to/local/model --name "your-namespace/model-name"
# Backup and restore
routstr-provider backup --output backup.zip
routstr-provider restore --input backup.zip
# Network testing
routstr-provider network test --target user123 --simulate-load
API Management
Configure the admin API for your provider:
"admin": {
"api": {
"enabled": true,
"port": 8080,
"authToken": "your-secure-token",
"allowedIPs": ["127.0.0.1", "192.168.1.0/24"],
"tlsEnabled": true
}
}
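`allowedIPs` mixes single addresses and CIDR ranges, and Python's `ipaddress` module handles both. The sketch below first checks the caller's IP, then compares the token in constant time. The `Bearer` header scheme and the function names are assumptions, not Routstr's documented admin API.

```python
import hmac
import ipaddress
from typing import Iterable

def ip_allowed(client_ip: str, allowed: Iterable[str]) -> bool:
    """True if client_ip equals an allowed address or falls in an allowed range."""
    addr = ipaddress.ip_address(client_ip)
    for entry in allowed:
        # A bare address such as "127.0.0.1" parses as a /32 network.
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    return False

def token_valid(presented: str, expected: str) -> bool:
    """Constant-time comparison avoids leaking the token through timing."""
    return hmac.compare_digest(presented.encode(), expected.encode())

def authorize(client_ip: str, auth_header: str, allowed: Iterable[str], token: str) -> bool:
    """Check the source IP first, then an assumed 'Bearer <token>' header."""
    if not ip_allowed(client_ip, allowed):
        return False
    scheme, _, value = auth_header.partition(" ")
    return scheme == "Bearer" and token_valid(value, token)
```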
Custom Templates
Create custom templates for provider configurations:
# Save current configuration as template
routstr-provider template save --name "high-performance"
# Apply template
routstr-provider template apply --name "high-performance"
# Share template (exports a shareable JSON file)
routstr-provider template export --name "high-performance" --output template.json
Best Practices
- Start Small: Begin with basic settings and gradually enable advanced features
- Monitor Performance: Keep track of key metrics before and after changes
- Test Thoroughly: Test advanced features in a staging environment first
- Incremental Optimization: Make one change at a time and measure impact
- Regular Updates: Keep your models and software updated
Next Steps