Advanced Provider Features

This guide covers advanced features and optimizations for Routstr providers who want to maximize the performance, reliability, and profitability of their AI model hosting.

Automated Pricing

Dynamic Pricing

Configure your provider to adjust pricing automatically based on utilization and demand:

"provider": {
  "dynamicPricing": {
    "enabled": true,
    "metrics": ["utilization", "demand"],
    "maxMultiplier": 3.0,
    "minMultiplier": 0.5,
    "updateInterval": 300,  // seconds
    "algorithm": "linear"  // or "exponential"
  }
}
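
To make the multiplier settings concrete, here is a minimal sketch of how a utilization figure could be mapped onto the range between minMultiplier and maxMultiplier. The formulas and the function name price_multiplier are assumptions for illustration; they are not taken from the Routstr implementation.

```python
def price_multiplier(utilization: float,
                     min_multiplier: float = 0.5,
                     max_multiplier: float = 3.0,
                     algorithm: str = "linear") -> float:
    """Map utilization in [0, 1] to a price multiplier (hypothetical formula)."""
    u = min(max(utilization, 0.0), 1.0)  # clamp out-of-range readings
    if algorithm == "linear":
        # Straight-line interpolation between the two bounds.
        return min_multiplier + (max_multiplier - min_multiplier) * u
    if algorithm == "exponential":
        # Geometric interpolation: stays low longer, then rises faster.
        return min_multiplier * (max_multiplier / min_multiplier) ** u
    raise ValueError(f"unknown algorithm: {algorithm}")
```

With the values above, the linear variant yields 0.5 at idle, 1.75 at 50% utilization, and 3.0 at full load.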

Cost-Based Pricing

Set pricing based on your actual costs:

"provider": {
  "costBasedPricing": {
    "enabled": true,
    "computeCostPerHour": 0.50,  // USD
    "targetMargin": 30,  // percentage
    "includeElectricity": true,
    "includeOperational": true
  }
}
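
The arithmetic behind a cost-based price can be sketched as follows. Whether targetMargin is applied as a markup on cost or as a margin on the final price is not stated here, so the sketch shows both readings; the function name and the extra_costs_per_hour parameter are hypothetical.

```python
def hourly_price(compute_cost_per_hour: float,
                 target_margin_pct: float,
                 extra_costs_per_hour: float = 0.0,
                 margin_on_price: bool = False) -> float:
    """Return an hourly price in USD that covers costs plus the target margin."""
    cost = compute_cost_per_hour + extra_costs_per_hour
    margin = target_margin_pct / 100.0
    if margin_on_price:
        # Margin as a share of the selling price: price = cost / (1 - margin).
        if margin >= 1.0:
            raise ValueError("margin on price must be below 100%")
        return cost / (1.0 - margin)
    # Margin as a markup on cost: price = cost * (1 + margin).
    return cost * (1.0 + margin)
```

For a compute cost of 0.50 USD per hour and a 30% target, the markup reading gives 0.65 USD per hour and the margin-on-price reading gives roughly 0.71 USD per hour.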

Load Balancing

Auto-Scaling

Configure your provider to scale resources based on demand:

"provider": {
  "autoScaling": {
    "enabled": true,
    "minInstances": 1,
    "maxInstances": 5,
    "scaleUpThreshold": 80,  // percentage utilization
    "scaleDownThreshold": 30,
    "cooldownPeriod": 300  // seconds
  }
}
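
The interaction between the thresholds and the cooldown period can be sketched as a single decision function. This is an illustrative assumption about how such settings are commonly interpreted (one instance added or removed per decision), not a description of the actual scaler.

```python
def desired_instances(current: int,
                      utilization_pct: float,
                      seconds_since_last_change: float,
                      *,
                      min_instances: int = 1,
                      max_instances: int = 5,
                      scale_up_threshold: float = 80,
                      scale_down_threshold: float = 30,
                      cooldown_period: float = 300) -> int:
    """Return the instance count to run next (hypothetical policy)."""
    # During cooldown, hold steady so the last change can take effect.
    if seconds_since_last_change < cooldown_period:
        return current
    if utilization_pct >= scale_up_threshold:
        return min(current + 1, max_instances)
    if utilization_pct <= scale_down_threshold:
        return max(current - 1, min_instances)
    return current
```

Keeping a wide gap between the two thresholds, as in the configuration above, reduces the chance of instances being added and removed in quick succession.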

Request Queue Management

Optimize request handling during peak loads:

"provider": {
  "queueManagement": {
    "enabled": true,
    "maxQueueLength": 100,
    "timeoutSeconds": 30,
    "priorityLevels": 3,
    "fairnessPolicy": "round-robin"  // or "fifo", "priority"
  }
}
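
As an illustration of the "round-robin" fairness policy combined with a bounded queue, the sketch below serves one request per client in turn and rejects new work once maxQueueLength is reached. The class name and its methods are hypothetical, and timeouts and priority levels are left out for brevity.

```python
from collections import OrderedDict, deque


class RoundRobinQueue:
    """Bounded queue that alternates between clients (illustrative only)."""

    def __init__(self, max_queue_length: int = 100):
        self.max_queue_length = max_queue_length
        self._per_client = OrderedDict()  # client_id -> deque of requests
        self._size = 0

    def enqueue(self, client_id, request) -> bool:
        """Add a request; return False if the queue is full."""
        if self._size >= self.max_queue_length:
            return False
        self._per_client.setdefault(client_id, deque()).append(request)
        self._size += 1
        return True

    def dequeue(self):
        """Return the next request, or None if the queue is empty."""
        if not self._per_client:
            return None
        # Take the client at the front of the rotation.
        client_id, pending = self._per_client.popitem(last=False)
        request = pending.popleft()
        self._size -= 1
        if pending:
            # Clients with remaining work go to the back of the rotation.
            self._per_client[client_id] = pending
        return request
```

Under this policy a client that submits many requests at once cannot starve a client that submits a single request.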

Model Management

Model Switching

Configure automatic switching from the primary model to a fallback model when load crosses the thresholds in switchCriteria:

"provider": {
  "modelSwitching": {
    "enabled": true,
    "primaryModel": "llama-3-70b",
    "fallbackModels": ["llama-3-8b", "mistral-7b"],
    "switchCriteria": {
      "queueLength": 10,
      "utilizationThreshold": 90,
      "latencyThreshold": 5000  // ms
    }
  }
}
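
One plausible reading of switchCriteria is that exceeding any single threshold triggers a switch to the first fallback model. The sketch below encodes that reading; both the "any criterion" rule and the choice of the first fallback are assumptions, and select_model is a hypothetical name.

```python
def select_model(queue_length: int,
                 utilization_pct: float,
                 latency_ms: float,
                 *,
                 primary: str = "llama-3-70b",
                 fallbacks: tuple = ("llama-3-8b", "mistral-7b"),
                 max_queue_length: int = 10,
                 utilization_threshold: float = 90,
                 latency_threshold_ms: float = 5000) -> str:
    """Pick the model to serve the next request (hypothetical rule)."""
    overloaded = (
        queue_length > max_queue_length
        or utilization_pct > utilization_threshold
        or latency_ms > latency_threshold_ms
    )
    # Fall back only if overloaded and a fallback is actually configured.
    if overloaded and fallbacks:
        return fallbacks[0]
    return primary
```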

Model Caching

Cache responses to common requests:

"provider": {
  "caching": {
    "enabled": true,
    "storageLimit": "1GB",
    "ttl": 3600,  // seconds
    "similarityThreshold": 0.95,
    "excludedPrompts": ["pattern1", "pattern2"]
  }
}
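
The sketch below shows how ttl, similarityThreshold, and excludedPrompts could work together in a response cache. It uses difflib.SequenceMatcher from the Python standard library as a simple stand-in for whatever similarity measure the provider actually applies, and it does not model storageLimit. The class and parameter names are hypothetical.

```python
import re
import time
from difflib import SequenceMatcher


class ResponseCache:
    """TTL cache with fuzzy prompt matching (illustrative only)."""

    def __init__(self, ttl=3600, similarity_threshold=0.95,
                 excluded_patterns=(), clock=time.monotonic):
        self.ttl = ttl
        self.similarity_threshold = similarity_threshold
        self.excluded = [re.compile(p) for p in excluded_patterns]
        self.clock = clock
        self._entries = []  # (prompt, response, stored_at)

    def _is_excluded(self, prompt):
        return any(p.search(prompt) for p in self.excluded)

    def put(self, prompt, response):
        """Store a response unless the prompt matches an excluded pattern."""
        if not self._is_excluded(prompt):
            self._entries.append((prompt, response, self.clock()))

    def get(self, prompt):
        """Return a cached response for a similar prompt, or None."""
        if self._is_excluded(prompt):
            return None
        now = self.clock()
        # Drop entries older than the TTL before searching.
        self._entries = [e for e in self._entries if now - e[2] < self.ttl]
        for cached_prompt, response, _ in self._entries:
            ratio = SequenceMatcher(None, prompt, cached_prompt).ratio()
            if ratio >= self.similarity_threshold:
                return response
        return None
```

A linear scan like this is only practical for small caches; a production cache would need an index and a size bound.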

Network Optimization

Request Batching

Batch similar requests for improved efficiency:

"provider": {
  "requestBatching": {
    "enabled": true,
    "maxBatchSize": 5,
    "maxWaitTime": 100,  // ms
    "similarityThreshold": 0.8
  }
}
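
To show how maxBatchSize and maxWaitTime bound a batch, the sketch below groups request arrival times into batches. A batch is closed when it is full or when a new arrival comes more than the wait limit after the first request in the batch. The similarityThreshold setting is not modeled, and form_batches is a hypothetical helper.

```python
def form_batches(arrivals_ms, max_batch_size=5, max_wait_ms=100):
    """Group sorted arrival times (ms) into batches (illustrative only)."""
    batches = []
    current = []
    for t in arrivals_ms:
        full = len(current) >= max_batch_size
        # The wait limit is measured from the first request in the batch.
        expired = bool(current) and t - current[0] > max_wait_ms
        if full or expired:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches
```

Larger batches improve throughput, while a shorter wait limit keeps the latency added to the first request in each batch small.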

Inference Optimization

Configure inference parameters for optimal performance:

"provider": {
  "inference": {
    "precision": "fp16",  // or "fp32", "int8", "int4"
    "batchSize": 4,
    "prefillChunkSize": 2048,
    "beamSize": 1,
    "cudaMemoryFraction": 0.9
  }
}

Security Features

Rate Limiting

Protect your service from abuse:

"provider": {
  "rateLimiting": {
    "enabled": true,
    "perIp": {
      "requestsPerMinute": 30,
      "tokensPerHour": 100000
    },
    "perUser": {
      "requestsPerMinute": 60,
      "tokensPerDay": 1000000
    },
    "global": {
      "maxConcurrentRequests": 100
    }
  }
}
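
A requestsPerMinute limit can be enforced in several ways. The sketch below uses a sliding-window counter keyed by IP address or user ID; this is one common technique, chosen here for illustration, and is not necessarily the one the provider software uses. The class name is hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per key within a moving time window."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Forget requests that have left the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

For the configuration above, one limiter with max_requests=30 would be keyed by IP address and a second with max_requests=60 would be keyed by user.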

Content Filtering

Implement content filtering for safety:

"provider": {
  "contentFiltering": {
    "enabled": true,
    "level": "medium",  // "low", "medium", "high"
    "filterPII": true,
    "blockedCategories": ["illegal", "harmful"],
    "customPatterns": [
      "regex1", "regex2"
    ]
  }
}
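
The customPatterns entries are regular expressions. As a rough sketch of how such patterns might be applied to incoming text, the helpers below compile a list of patterns and report whether any of them matches. The helper names and the sample patterns are hypothetical, and case-insensitive matching is an assumption.

```python
import re


def compile_filters(patterns):
    """Compile pattern strings once so they can be reused per request."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def is_blocked(text, filters) -> bool:
    """Return True if any compiled pattern matches anywhere in the text."""
    return any(f.search(text) for f in filters)


# Example patterns: a US-SSN-like number and a phrase with flexible spacing.
filters = compile_filters([r"\b\d{3}-\d{2}-\d{4}\b", r"secret\s+key"])
```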

Analytics and Monitoring

Prometheus Integration

Export metrics to Prometheus:

"monitoring": {
  "prometheus": {
    "enabled": true,
    "port": 9090,  // the Prometheus server also defaults to 9090; choose another port if both run on the same host
    "path": "/metrics",
    "labels": {
      "environment": "production",
      "provider": "your-provider-name"
    }
  }
}
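
For reference, Prometheus scrapes metrics in a plain-text exposition format, with the configured labels attached to each sample. The sketch below renders gauge values in that format; the metric name used in the example is hypothetical, since the names the provider actually exports are not listed here.

```python
def render_gauges(values: dict, labels: dict) -> str:
    """Render gauge samples in the Prometheus text exposition format."""

    def escape(value: str) -> str:
        # Label values must escape backslash, newline, and double quote.
        return value.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')

    label_str = ",".join(f'{k}="{escape(v)}"' for k, v in sorted(labels.items()))
    lines = []
    for name, value in values.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric name, with the labels from the configuration above.
text = render_gauges(
    {"routstr_queue_length": 3},
    {"environment": "production", "provider": "your-provider-name"},
)
```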

Grafana Dashboard

Set up a Grafana dashboard from the default template or from a custom dashboard file, with optional alerting:

"monitoring": {
  "grafana": {
    "enabled": true,
    "dashboardTemplate": "default",  // or "custom"
    "customDashboardPath": "/path/to/dashboard.json",
    "alerting": {
      "enabled": true,
      "contactPoints": ["email:admin@example.com"]
    }
  }
}

Reliability Features

Circuit Breakers

Implement circuit breakers to prevent cascading failures:

"provider": {
  "circuitBreakers": {
    "enabled": true,
    "failureThreshold": 5,
    "resetTimeout": 30,  // seconds
    "halfOpenRequests": 3
  }
}
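
The three settings correspond to the usual circuit breaker states: closed (normal), open (rejecting requests), and half-open (letting a few trial requests through). The sketch below is a minimal version of that state machine under the assumption that halfOpenRequests is the number of consecutive successful trial requests needed to close the circuit again; the class and method names are hypothetical.

```python
import time


class CircuitBreaker:
    """Minimal closed / open / half-open state machine (illustrative only)."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0,
                 half_open_requests=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.half_open_requests = half_open_requests
        self.clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0
        self._trials_left = 0
        self._trial_successes = 0

    def allow_request(self) -> bool:
        if self.state == "open":
            if self.clock() - self._opened_at < self.reset_timeout:
                return False
            # Timeout elapsed: admit a limited number of trial requests.
            self.state = "half-open"
            self._trials_left = self.half_open_requests
            self._trial_successes = 0
        if self.state == "half-open":
            if self._trials_left <= 0:
                return False
            self._trials_left -= 1
        return True

    def record_success(self):
        if self.state == "half-open":
            self._trial_successes += 1
            if self._trial_successes >= self.half_open_requests:
                self.state = "closed"
                self._failures = 0
        else:
            self._failures = 0

    def record_failure(self):
        self._failures += 1
        # Any failure during a trial reopens the circuit immediately.
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self.clock()
```

Callers check allow_request() before forwarding work to the model backend and report the outcome with record_success() or record_failure().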

Health Checks

Configure comprehensive health checks:

"provider": {
  "healthChecks": {
    "enabled": true,
    "interval": 60,  // seconds
    "timeout": 5,    // seconds
    "checks": [
      {
        "type": "model",
        "endpoint": "endpoint1",
        "testPrompt": "Hello, world"
      },
      {
        "type": "system",
        "metric": "memory",
        "threshold": 90  // percentage
      }
    ]
  }
}

Multi-Region Deployment

Deploy your provider across multiple regions:

"provider": {
  "regions": {
    "enabled": true,
    "primary": "us-east",
    "secondary": ["eu-west", "ap-east"],
    "routingStrategy": "latency",  // or "geo", "load"
    "syncSettings": {
      "enabled": true,
      "interval": 300  // seconds
    }
  }
}
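
With routingStrategy set to "latency", requests go to the region that currently responds fastest. The sketch below illustrates that selection, assuming that unreachable regions are reported as None and that ties are broken in favor of the primary region; both rules and the function name are assumptions.

```python
def pick_region(latencies_ms: dict, primary: str = "us-east") -> str:
    """Return the reachable region with the lowest measured latency."""
    reachable = {r: ms for r, ms in latencies_ms.items() if ms is not None}
    if not reachable:
        raise RuntimeError("no reachable regions")
    # Sort key: latency first, then prefer the primary region on ties.
    return min(reachable, key=lambda r: (reachable[r], r != primary))
```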

Administration Features

Command-Line Management

Advanced CLI commands for provider management:

# Performance tuning
routstr-provider tune --optimize-for throughput

# Model management
routstr-provider models add /path/to/local/model --name "your-namespace/model-name"

# Backup and restore
routstr-provider backup --output backup.zip
routstr-provider restore --input backup.zip

# Network testing
routstr-provider network test --target user123 --simulate-load

API Management

Configure the admin API for your provider:

"admin": {
  "api": {
    "enabled": true,
    "port": 8080,
    "authToken": "your-secure-token",
    "allowedIPs": ["127.0.0.1", "192.168.1.0/24"],
    "tlsEnabled": true
  }
}

Custom Templates

Create custom templates for provider configurations:

# Save current configuration as template
routstr-provider template save --name "high-performance"

# Apply template
routstr-provider template apply --name "high-performance"

# Share template (exports a shareable JSON file)
routstr-provider template export --name "high-performance" --output template.json

Best Practices

  1. Start Small: Begin with basic settings and gradually enable advanced features
  2. Monitor Performance: Keep track of key metrics before and after changes
  3. Test Thoroughly: Test advanced features in a staging environment first
  4. Incremental Optimization: Make one change at a time and measure impact
  5. Regular Updates: Keep your models and software up to date

Next Steps